
Character splitter Node

Overview

The Character Splitter Node splits text or documents into smaller chunks using a recursive character-based approach. It's particularly useful for preparing text for LLMs that have token limits or when you need to process long documents in smaller segments.

Usage cost: 1 credit

Configuration

Settings

  1. Chunk Configuration

    • Chunk Size*: Number of characters per chunk

    • Chunk Overlap*: Number of overlapping characters between chunks

    • Separators: Comma-separated list of strings that define where the text can be split, tried in order

  2. Input Selection

    • Documents/Text to Split*: Select one or more inputs to process

    • Supports:

      • Single text strings

      • Document objects

      • Arrays of text strings

      • Arrays of documents
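
The recursive splitting behaviour these settings control can be sketched in Python. This is an illustrative approximation of how a recursive character splitter typically works, not Waterflai's actual implementation; the function name and the overlap scheme are assumptions.

```python
def recursive_split(text, chunk_size, chunk_overlap, separators):
    """Illustrative recursive character splitter (not Waterflai's code).

    Splits on the first separator found in the text; pieces that are
    still too large are re-split with the remaining (more general)
    separators, falling back to character-level splitting.
    """
    if len(text) <= chunk_size:
        return [text]

    # Pick the first (most specific) separator that occurs in the text.
    sep, rest = "", []
    for i, s in enumerate(separators):
        if s in text:
            sep, rest = s, separators[i + 1:]
            break
    pieces = text.split(sep) if sep else list(text)

    # Greedily merge pieces back together up to chunk_size characters.
    chunks, current = [], ""
    for piece in pieces:
        candidate = f"{current}{sep}{piece}" if current else piece
        if len(candidate) <= chunk_size:
            current = candidate
            continue
        if current:
            chunks.append(current)
        if len(piece) > chunk_size:  # still too big: recurse with next separators
            chunks.extend(recursive_split(piece, chunk_size, chunk_overlap, rest))
            current = ""
        else:
            current = piece
    if current:
        chunks.append(current)

    # One simple overlap scheme: prefix each chunk with the tail of the
    # previous chunk (real implementations may handle this differently).
    if chunk_overlap:
        chunks = [chunks[0]] + [
            prev[-chunk_overlap:] + cur for prev, cur in zip(chunks, chunks[1:])
        ]
    return chunks

text = "Para one.\n\nPara two.\n\nPara three."
print(recursive_split(text, 15, 0, ["\n\n", "\n", ". ", " "]))
# ['Para one.', 'Para two.', 'Para three.']
```

Because the paragraph separator "\n\n" matches first, the sketch splits on paragraph boundaries and only falls back to sentences or words when a paragraph exceeds the chunk size.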

Output Ports

  • split_documents (Document[]): Array of split documents

    • Each document maintains original metadata

    • Chunks respect natural text boundaries based on separators
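
For illustration, metadata preservation might look like the following; the field names (`content`, `metadata`) are assumptions for the sketch, as the actual document schema is not specified on this page.

```python
# Hypothetical document shapes; the real field names used by the
# Character Splitter Node may differ.
source_doc = {
    "content": "First paragraph.\n\nSecond paragraph.",
    "metadata": {"source": "report.pdf", "page": 3},
}

# After splitting, each chunk carries a copy of the original metadata.
split_documents = [
    {"content": "First paragraph.", "metadata": dict(source_doc["metadata"])},
    {"content": "Second paragraph.", "metadata": dict(source_doc["metadata"])},
]
print(all(d["metadata"] == source_doc["metadata"] for d in split_documents))  # True
```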

Best Practices

  1. Chunk Size Selection

    • Consider model token limits

    • Balance information density

    • Account for desired context window

    • Test with representative content

  2. Overlap Configuration

    • Use overlap to maintain context

    • Consider semantic boundaries

    • Avoid overly large overlaps, which duplicate content across chunks and waste storage

  3. Separator Usage

    • Use natural text boundaries

    • Consider document structure

    • Common separators: "\n\n", "\n", ".", "!", "?"

    • Order separators from specific to general
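
The "specific to general" ordering matters because the splitter uses the first separator it finds: paragraph breaks win over line breaks, which win over sentence endings. A minimal sketch (the function name is illustrative):

```python
def pick_separator(text, separators):
    """Return the first separator, in priority order, that occurs in text."""
    for sep in separators:
        if sep in text:
            return sep
    return ""  # nothing matched: fall back to character-level splitting

separators = ["\n\n", "\n", ". ", " "]
print(repr(pick_separator("Intro.\n\nBody text here.", separators)))   # '\n\n'
print(repr(pick_separator("One sentence. Another one.", separators)))  # '. '
```

If "." were listed before "\n\n", the splitter would break paragraphs mid-sentence boundary first, producing less coherent chunks.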

Common Issues

  • Memory issues when splitting very large documents

  • Inconsistent chunk sizes caused by poorly chosen or poorly ordered separators

  • Improper handling of special characters in custom separators

