Character Splitter Node

Overview

The Character Splitter Node splits text or documents into smaller chunks using a recursive character-based approach. It's particularly useful for preparing text for LLMs with token limits, or for processing long documents in smaller segments.

Usage cost: 1 credit
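
Recursive character splitting tries the most specific separator first and falls back to more general ones until every piece fits within the chunk size. The node's internals aren't exposed, but the behavior matches widely used splitters such as LangChain's RecursiveCharacterTextSplitter; the sketch below uses that library as a stand-in, not as the node's confirmed implementation:

```python
# Sketch of the recursive splitting idea, assuming LangChain's
# RecursiveCharacterTextSplitter as a stand-in for the node's internals.
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=200,    # maximum characters per chunk
    chunk_overlap=20,  # characters shared between consecutive chunks
    separators=["\n\n", "\n", " ", ""],  # tried in order, most to least specific
)

text = "First paragraph of a long document.\n\nSecond paragraph.\n\nThird paragraph."
for i, chunk in enumerate(splitter.split_text(text)):
    print(f"chunk {i}: {len(chunk)} chars")
```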

Configuration

Settings

  1. Chunk Configuration

    • Chunk Size*: Maximum number of characters per chunk

    • Chunk Overlap*: Number of characters shared between consecutive chunks to preserve context

    • Separators: Comma-separated list of strings that define where the text may be split

  2. Input Selection

    • Documents/Text to Split*: Select one or more inputs to process

    • Supports:

      • Single text strings

      • Document objects

      • Arrays of text strings

      • Arrays of documents
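
Whatever shape the input takes, it is normalized to a list of documents before splitting. A sketch of how each supported shape maps onto a splitter call, again using LangChain types as assumed stand-ins for the node's internal representation:

```python
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)

# Single text string: wrap it in a one-element list
docs = splitter.create_documents(["one long string of text"])

# Array of text strings: one Document is created per string
docs = splitter.create_documents(["first text", "second text"])

# Document objects (single or array): split directly, metadata is kept
inputs = [Document(page_content="a long document", metadata={"source": "a.txt"})]
docs = splitter.split_documents(inputs)
```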

Output Ports

  • split_documents (Document[]): Array of split documents

    • Each document maintains original metadata

    • Chunks respect natural text boundaries based on separators
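
The metadata guarantee can be sanity-checked with the same stand-in splitter: every chunk produced from a document carries a copy of that document's metadata (this mirrors LangChain's behavior; whether the node adds extra fields of its own is not documented here):

```python
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=10)

source = Document(
    page_content="One section of a long report.\n\n" * 8,  # long enough for several chunks
    metadata={"source": "report.pdf", "page": 3},
)

for chunk in splitter.split_documents([source]):
    # every split document keeps a copy of the original metadata
    assert chunk.metadata == {"source": "report.pdf", "page": 3}
```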

Best Practices

  1. Chunk Size Selection

    • Consider model token limits

    • Balance information density: larger chunks keep more context together, while smaller chunks fit more easily into prompts

    • Account for desired context window

    • Test with representative content (a token-budget sketch follows this list)

  2. Overlap Configuration

    • Use overlap to maintain context

    • Consider semantic boundaries

    • Avoid overly large overlaps, which mostly duplicate content and waste tokens (see the overlap demonstration after this list)

  3. Separator Usage

    • Use natural text boundaries

    • Consider document structure

    • Common separators: "\n\n", "\n", ".", "!", "?"

    • Order separators from specific to general
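
For chunk size selection, a common rule of thumb is roughly 4 characters per token in English text, so a character budget can be derived from a model's token limit. A rough sketch; the 4:1 ratio and the safety margin are heuristics, not guarantees, so verify against the tokenizer of the model you actually use:

```python
def chars_for_token_budget(max_tokens: int, chars_per_token: float = 4.0,
                           safety_margin: float = 0.9) -> int:
    """Approximate a character chunk size that fits a token budget."""
    return int(max_tokens * chars_per_token * safety_margin)

# Leave headroom for the prompt template wrapped around each chunk
print(chars_for_token_budget(512))  # -> 1843
```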
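
Overlap and separators interact: separators decide where cuts may fall, and overlap repeats trailing text (aligned to those boundaries) at the start of the next chunk. A small demonstration with the same stand-in splitter:

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=80,
    chunk_overlap=20,
    # ordered from most specific (paragraph break) to least specific (any character)
    separators=["\n\n", "\n", ". ", " ", ""],
)

text = (
    "Chunking works best at natural boundaries. "
    "Paragraph breaks beat sentence breaks. "
    "Sentence breaks beat arbitrary cuts in the middle of a word."
)

chunks = splitter.split_text(text)
for prev, nxt in zip(chunks, chunks[1:]):
    # trailing text from one chunk reappears at the start of the next
    print(repr(prev[-20:]), "->", repr(nxt[:20]))
```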

Common Issues

  • Memory pressure when splitting very large documents in a single pass (a batching sketch follows this list)

  • Inconsistent chunk sizes when separators are sparse or ordered from general to specific

  • Improper handling of special characters, e.g., non-breaking spaces or control characters that never match the configured separators
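
For the memory issue in particular, one workflow-level mitigation (an assumption about pipeline design, not a documented node feature) is to split large document sets in batches rather than in a single pass:

```python
from typing import Iterable, Iterator, List

from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter

def split_in_batches(docs: Iterable[Document],
                     batch_size: int = 50) -> Iterator[List[Document]]:
    """Yield split chunks batch by batch to bound peak memory use."""
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    batch: List[Document] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            yield splitter.split_documents(batch)
            batch = []
    if batch:
        yield splitter.split_documents(batch)
```

Each yielded batch can be handed downstream immediately, so peak memory stays proportional to the batch size rather than to the whole corpus.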
