Character Splitter Node
Overview
The Character Splitter Node splits text or documents into smaller chunks using a recursive character-based approach. It's particularly useful for preparing text for LLMs that have token limits or when you need to process long documents in smaller segments.
Usage cost: 1 credit
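To make the recursive approach concrete, here is a minimal Python sketch of the core idea (not the node's actual implementation; overlap and chunk merging are omitted): try the most specific separator first, and recurse with the remaining separators on any piece that is still too long.

```python
def recursive_split(text: str, chunk_size: int, separators: list[str]) -> list[str]:
    """Split text by trying each separator in order; hard-cut as a last resort."""
    if len(text) <= chunk_size:
        return [text]
    if not separators:
        # No separators left: fall back to a hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    first, rest = separators[0], separators[1:]
    chunks = []
    for piece in text.split(first):
        if len(piece) <= chunk_size:
            chunks.append(piece)
        else:
            # Still too long: retry this piece with the next, more general separator.
            chunks.extend(recursive_split(piece, chunk_size, rest))
    return chunks

text = "First paragraph.\n\nA much longer second paragraph that overruns the limit."
print(recursive_split(text, chunk_size=40, separators=["\n\n", "\n", ". "]))
```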
Configuration
Settings
Chunk Configuration
Chunk Size*: Number of characters per chunk
Chunk Overlap*: Number of overlapping characters between chunks
Separators: List of strings that define where to split the text (comma-separated)
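These three settings correspond one-to-one to the parameters of LangChain's RecursiveCharacterTextSplitter, which implements the same strategy and can serve as a stand-in outside the platform (a sketch with placeholder values, not a claim about the node's internals):

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,                  # characters per chunk
    chunk_overlap=200,                # characters shared between adjacent chunks
    separators=["\n\n", "\n", ". "],  # tried in order, most specific first
)
long_text = "Lorem ipsum. " * 500     # stand-in for a long document
chunks = splitter.split_text(long_text)
```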
Input Selection
Documents/Text to Split*: Select one or more inputs to process
Supports:
Single text strings
Document objects
Arrays of text strings
Arrays of documents
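A hypothetical sketch of how those four shapes collapse into one list of documents before splitting (the page_content/metadata field names are illustrative, not the node's internal schema):

```python
def normalize(inputs):
    """Coerce a string, a document, or a list of either into a list of documents."""
    items = inputs if isinstance(inputs, list) else [inputs]
    return [
        {"page_content": item, "metadata": {}} if isinstance(item, str) else item
        for item in items
    ]

normalize("plain text")                               # single text string
normalize([{"page_content": "doc", "metadata": {}}])  # array of documents
```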
Output Ports
split_documents
(Document[]): Array of split documents. Each document maintains its original metadata.
Chunks respect natural text boundaries based on separators
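For example (hypothetical document shape), one source document split into three chunks yields three output documents, each carrying the source's metadata:

```python
source = {"page_content": "<about 3,000 characters>", "metadata": {"source": "report.pdf"}}

# split_documents output, assuming chunk_size=1000 and chunk_overlap=0:
# [{"page_content": "<chars 0-999>",     "metadata": {"source": "report.pdf"}},
#  {"page_content": "<chars 1000-1999>", "metadata": {"source": "report.pdf"}},
#  {"page_content": "<chars 2000-2999>", "metadata": {"source": "report.pdf"}}]
```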
Best Practices
Chunk Size Selection
Consider model token limits
Balance information density
Account for desired context window
Test with representative content
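To put numbers on the first point: English prose averages roughly four characters per token (an approximation; verify with your model's tokenizer), so a character budget can be derived from a token budget:

```python
def chunk_chars_for_tokens(target_tokens: int, chars_per_token: float = 4.0) -> int:
    """Approximate character budget for a token budget (~4 chars/token in English)."""
    return int(target_tokens * chars_per_token)

print(chunk_chars_for_tokens(250))   # -> 1000 characters
print(chunk_chars_for_tokens(2000))  # -> 8000 characters
```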
Overlap Configuration
Use overlap to maintain context
Consider semantic boundaries
Avoid overly large overlaps (wasted storage and tokens; see the estimate below)
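As a rough estimate of that waste: each chunk advances the window by chunk_size - chunk_overlap characters, so the total stored text grows by a factor of about chunk_size / (chunk_size - chunk_overlap):

```python
def overlap_overhead(chunk_size: int, chunk_overlap: int) -> float:
    """Approximate factor by which overlap inflates total stored text."""
    return chunk_size / (chunk_size - chunk_overlap)

print(overlap_overhead(1000, 100))  # -> ~1.11 (about 11% extra text)
print(overlap_overhead(1000, 500))  # -> 2.0  (every character stored twice)
```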
Separator Usage
Use natural text boundaries
Consider document structure
Common separators: "\n\n", "\n", ".", "!", "?"
Order separators from specific to general (e.g., paragraph breaks before line breaks before sentence endings), as the sketch below shows
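To see why order matters, again using LangChain's splitter as a stand-in for the same strategy: with paragraph breaks listed first, chunks end on paragraph boundaries whenever possible, and the sentence separator is only used for paragraphs longer than chunk_size.

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

text = "Short intro.\n\nSecond paragraph with two sentences. Both fit together.\n\nEnd."

splitter = RecursiveCharacterTextSplitter(
    chunk_size=70, chunk_overlap=0, separators=["\n\n", "\n", ". "]
)
for chunk in splitter.split_text(text):
    print(repr(chunk))  # chunks break at paragraph boundaries, not mid-sentence
```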
Common Issues
Memory issues with large documents (see the sketch after this list)
Inconsistent chunk sizes
Improper handling of special characters
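For the memory issue, one mitigation (a plain-Python sketch using a fixed-size sliding window rather than the node's recursive strategy) is to generate chunks lazily instead of materializing them all at once:

```python
from typing import Iterator

def iter_chunks(text: str, chunk_size: int, chunk_overlap: int) -> Iterator[str]:
    """Yield fixed-size, overlapping chunks one at a time."""
    step = chunk_size - chunk_overlap
    for start in range(0, len(text), step):
        yield text[start:start + chunk_size]

# Each chunk can be processed and discarded before the next is produced.
for chunk in iter_chunks("x" * 10_000, chunk_size=1000, chunk_overlap=100):
    pass  # e.g., embed or index the chunk here
```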