HTML Splitter Node

Overview

The HTML Splitter Node divides HTML content into segments based on header tags (h1, h2, etc.). It's particularly useful for structuring content hierarchically and maintaining the semantic relationship between different sections of HTML documents.

Usage cost: 1 credit

Configuration

Settings

Document Selection
- Documents to Split (required): Select the input content to process
- Supports:
  - HTML strings
  - Document objects with HTML content
  - Arrays of HTML content
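
The three supported input shapes can be normalized into one list of documents before splitting. The following is a minimal sketch under stated assumptions. The `Document` class (with `page_content` and `metadata` fields) and `normalize_inputs` are hypothetical names for illustration. They are not the node's documented internals.

```python
from dataclasses import dataclass, field
from typing import Any, Union


# Hypothetical document type, used here for illustration only.
@dataclass
class Document:
    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


InputItem = Union[str, Document]


def normalize_inputs(value: Union[InputItem, list[InputItem]]) -> list[Document]:
    """Turn an HTML string, a Document, or a list of either into a list of Documents."""
    items = value if isinstance(value, list) else [value]
    docs: list[Document] = []
    for item in items:
        if isinstance(item, Document):
            # Document objects pass through unchanged, keeping their metadata.
            docs.append(item)
        elif isinstance(item, str):
            # Raw HTML strings start with empty metadata.
            docs.append(Document(page_content=item))
        else:
            raise TypeError(f"Unsupported input type: {type(item).__name__}")
    return docs
```

For example, `normalize_inputs(["<h1>A</h1>", Document("<h2>B</h2>", {"source": "b.html"})])` yields two documents. The second document keeps its `source` metadata.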

Header Configuration
- Headers to Split On: Define which header tags to split on and the metadata key for each
  - Header Tag: HTML tag to split on (e.g., h1, h2)
  - Metadata Key: Key under which the header's text is stored (e.g., Header 1, Header 2)
- Return Each Element: Toggle to control output granularity
  - When enabled: Returns each element as a separate document, tagged with the headers it falls under
  - When disabled: Groups all content between consecutive headers into a single document
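
The following minimal sketch, which uses only Python's standard library, shows how the two Return Each Element modes behave. It is not the node's implementation. Several details are assumptions made for illustration:

- The names `split_html` and `BLOCK_TAGS`.
- The choice of which tags count as element boundaries.
- Keeping header text out of chunk content, so that it appears only in metadata.

Each configured header replaces any open header at the same level or deeper. As a result, the metadata always reflects the chunk's current position in the hierarchy.

```python
from html.parser import HTMLParser

# Tags whose boundaries delimit one "element" (an assumption for this sketch).
BLOCK_TAGS = {"p", "li", "div", "pre", "blockquote", "td", "section", "article"}


class _HeaderSplitParser(HTMLParser):
    def __init__(self, headers_to_split_on):
        super().__init__()
        self.header_keys = dict(headers_to_split_on)  # e.g. {"h1": "Header 1"}
        self.active = {}        # header tag -> header text currently in scope
        self.in_header = None   # configured header tag being read, if any
        self.header_buf = []
        self.text_buf = []
        self.section = 0        # incremented at every configured header
        self.elements = []      # (section, text, metadata) triples

    def _flush_text(self):
        # Collapse whitespace and record the buffered text as one element.
        text = " ".join("".join(self.text_buf).split())
        self.text_buf = []
        if text:
            meta = {self.header_keys[t]: v for t, v in self.active.items()}
            self.elements.append((self.section, text, meta))

    def handle_starttag(self, tag, attrs):
        if tag in self.header_keys:
            self._flush_text()
            self.in_header, self.header_buf = tag, []
        elif tag in BLOCK_TAGS:
            self._flush_text()

    def handle_endtag(self, tag):
        if tag == self.in_header:
            level = int(tag[1])  # assumes configured tags are h1..h6
            # A header closes every open header at the same or a deeper level.
            self.active = {t: v for t, v in self.active.items() if int(t[1]) < level}
            self.active[tag] = " ".join("".join(self.header_buf).split())
            self.in_header = None
            self.section += 1
        elif tag in BLOCK_TAGS:
            self._flush_text()

    def handle_data(self, data):
        (self.header_buf if self.in_header else self.text_buf).append(data)


def split_html(html, headers_to_split_on, return_each_element=False):
    """Return (text, metadata) pairs, per element or grouped between headers."""
    parser = _HeaderSplitParser(headers_to_split_on)
    parser.feed(html)
    parser.close()
    parser._flush_text()  # trailing text after the last block tag
    if return_each_element:
        return [(text, meta) for _, text, meta in parser.elements]
    grouped = []
    last_section = None
    for section, text, meta in parser.elements:
        if grouped and section == last_section:
            # Same section as the previous element: merge into one chunk.
            grouped[-1] = (grouped[-1][0] + "\n" + text, grouped[-1][1])
        else:
            grouped.append((text, meta))
        last_section = section
    return grouped
```

Consider the input `<h1>Intro</h1><p>Welcome.</p><p>More.</p><h2>Setup</h2><p>Install it.</p>` with `[("h1", "Header 1"), ("h2", "Header 2")]`. Grouping merges the two paragraphs under "Intro" into one chunk, and "Install it." carries both `Header 1` and `Header 2`. With `return_each_element=True`, each paragraph becomes its own chunk instead.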

Output Ports

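split_documents (Document[]): Array of split documents
- Each document contains the content between headers (or a single element when Return Each Element is enabled)
- Metadata includes the text of each enclosing header, stored under its configured metadata key
- Preserves the original document's metadata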
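
The sketch below shows one way an output document could combine the source document's metadata with header metadata. It is a hypothetical illustration, not the node's code. The `Document` class and `to_split_documents` are assumed names. The choice that header keys win when a key collides with existing metadata is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Any


# Hypothetical document type, used here for illustration only.
@dataclass
class Document:
    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def to_split_documents(
    source: Document, chunks: list[tuple[str, dict[str, str]]]
) -> list[Document]:
    """Build output documents that keep the source metadata and add header metadata."""
    return [
        # Copy the source metadata, then layer header keys on top (headers win on conflict).
        Document(page_content=text, metadata={**source.metadata, **header_meta})
        for text, header_meta in chunks
    ]
```

Because each output gets a fresh dictionary, the source document's metadata is left unchanged.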

Best Practices

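Header Selection
- Split only on header levels that mark meaningful section boundaries
- Maintain a logical hierarchy (e.g., h2 sections nested under h1)
- Use a consistent header structure across documents

Metadata Keys
- Use descriptive key names
- Follow a consistent naming convention
- Choose keys that reflect the header hierarchy (e.g., Header 1 for h1, Header 2 for h2)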
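
One way to keep metadata keys consistent is to derive them from the header level. The following is a small hypothetical helper, not part of the node. It assumes the configuration is a list of (header tag, metadata key) pairs.

```python
def make_header_config(levels: list[int]) -> list[tuple[str, str]]:
    """Build (header tag, metadata key) pairs with one naming convention: hN -> "Header N"."""
    if any(not 1 <= n <= 6 for n in levels):
        raise ValueError("HTML header levels run from 1 to 6")
    # Deduplicate and sort so parents come before children (h1 before h2, ...).
    return [(f"h{n}", f"Header {n}") for n in sorted(set(levels))]
```

For example, `make_header_config([2, 1])` produces `[("h1", "Header 1"), ("h2", "Header 2")]`.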

Common Issues

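- Inconsistent HTML structure (e.g., skipped header levels)
- Missing header tags (none of the configured tags appear in the input)
- Invalid HTML formatting (e.g., unclosed header tags)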
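
The issues above can be caught with a quick pre-check before the node runs. The sketch below is a hypothetical helper built on Python's standard `html.parser`, not a feature of the node. It flags three problems:

- Input that contains none of the configured header tags.
- Adjacent headers that skip a level.
- Header start and end tags that do not balance.

Because `HTMLParser` is lenient, the check is only a heuristic and will not detect every malformed document.

```python
from html.parser import HTMLParser

HEADER_TAGS = {f"h{i}" for i in range(1, 7)}


class _HeaderScan(HTMLParser):
    """Records the level of every h1-h6 start tag and counts header end tags."""

    def __init__(self):
        super().__init__()
        self.levels: list[int] = []
        self.closed = 0

    def handle_starttag(self, tag, attrs):
        if tag in HEADER_TAGS:
            self.levels.append(int(tag[1]))

    def handle_endtag(self, tag):
        if tag in HEADER_TAGS:
            self.closed += 1


def check_header_structure(html: str, split_on: list[str]) -> list[str]:
    """Return warnings for the common issues listed above (empty list if none found)."""
    scan = _HeaderScan()
    scan.feed(html)
    scan.close()
    warnings = []
    if not set(split_on) & {f"h{lvl}" for lvl in scan.levels}:
        warnings.append("missing header tags: none of " + ", ".join(split_on) + " found")
    for prev, cur in zip(scan.levels, scan.levels[1:]):
        if cur > prev + 1:
            warnings.append(f"inconsistent structure: h{prev} followed by h{cur}")
    if scan.closed != len(scan.levels):
        warnings.append(
            f"invalid formatting: header tag mismatch ({len(scan.levels)} opened, {scan.closed} closed)"
        )
    return warnings
```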
