HTML Splitter Node

Overview

The HTML Splitter Node divides HTML content into segments at header tags (h1, h2, etc.). It is particularly useful for structuring content hierarchically: each segment records the headers it sits under, so the semantic relationship between sections of an HTML document is preserved.

Usage cost: 1 credit

Configuration

Settings

  1. Documents Selection

    • Documents to Split*: Select input content to process

    • Supports:

      • HTML strings

      • Document objects with HTML content

      • Arrays of HTML content

  2. Header Configuration

    • Headers to Split On: Define header tags and their metadata keys

      • Header Tag: HTML tag to split on (e.g., h1, h2)

      • Metadata Key: Key under which the header's text is stored in the output metadata (e.g., Header 1, Header 2)

    • Return Each Element: Toggle to control output granularity

      • When enabled: Returns each element as its own document, with its associated headers

      • When disabled: Groups the content between headers into a single document
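The header settings above can be illustrated with a minimal sketch built on Python's standard html.parser module. This is not the node's implementation. The names HeaderSplitter and split_html are hypothetical, the code assumes header tags of the form h1 to h6, and it approximates Return Each Element by treating every closing tag as an element boundary.

```python
from html.parser import HTMLParser


class HeaderSplitter(HTMLParser):
    """Hypothetical sketch of header-based splitting (not the node's code)."""

    def __init__(self, headers_to_split_on, return_each_element=False):
        super().__init__()
        # Header tag -> metadata key, e.g. {"h1": "Header 1"}
        self.headers = dict(headers_to_split_on)
        self.return_each_element = return_each_element
        self.active = {}         # metadata key -> text of the current header
        self.open_header = None  # header tag currently being read, if any
        self.header_text = []
        self.buffer = []         # text collected since the last flush
        self.sections = []

    def _flush(self):
        text = " ".join(self.buffer).strip()
        if text:
            self.sections.append({"content": text, "metadata": dict(self.active)})
        self.buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in self.headers:
            self._flush()  # a new header closes the previous section
            self.open_header = tag
            self.header_text = []

    def handle_endtag(self, tag):
        if tag == self.open_header:
            level = int(tag[1])  # assumes tags of the form h1..h6
            # A new header replaces headers at the same or deeper levels
            for other, key in self.headers.items():
                if int(other[1]) >= level:
                    self.active.pop(key, None)
            self.active[self.headers[tag]] = "".join(self.header_text).strip()
            self.open_header = None
        elif self.return_each_element:
            self._flush()  # simplification: every closing tag ends an element

    def handle_data(self, data):
        if self.open_header:
            self.header_text.append(data)
        elif data.strip():
            self.buffer.append(data.strip())

    def close(self):
        super().close()
        self._flush()  # emit whatever follows the last header


def split_html(html, headers_to_split_on, return_each_element=False):
    parser = HeaderSplitter(headers_to_split_on, return_each_element)
    parser.feed(html)
    parser.close()
    return parser.sections


html = "<h1>Intro</h1><p>Hello</p><p>World</p><h2>Details</h2><p>More</p>"
headers = [("h1", "Header 1"), ("h2", "Header 2")]
grouped = split_html(html, headers)
per_element = split_html(html, headers, return_each_element=True)
```

In this sketch, grouped holds two sections (the two paragraphs under Intro are merged), while per_element holds three, one per paragraph. Each section carries the headers that were active where its text appeared.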

Output Ports

  • split_documents (Document[]): Array of split documents

    • Each document contains the content between headers

    • Metadata includes hierarchical header information

    • Maintains original document metadata
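As a rough illustration of how header information and original metadata could coexist on each output document, the sketch below merges the two dictionaries. The Document class and the attach_metadata helper are hypothetical stand-ins, not the platform's types, and letting header keys win on a name collision is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    # Hypothetical stand-in for the platform's Document type
    page_content: str
    metadata: dict = field(default_factory=dict)


def attach_metadata(source, sections):
    """Build output documents from (content, header_metadata) pairs.

    The source document's metadata is copied first, then header keys are
    added. Assumption: header keys overwrite source keys on a collision.
    """
    return [
        Document(page_content=content, metadata={**source.metadata, **headers})
        for content, headers in sections
    ]


source = Document("<h1>Intro</h1><p>Hello</p>", {"source": "page.html"})
split_documents = attach_metadata(source, [("Hello", {"Header 1": "Intro"})])
```

Because the merge builds a new dictionary for every output document, the source document's metadata is left unchanged.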

Best Practices

  1. Header Selection

    • Choose header levels that match the section granularity you need

    • Maintain a logical hierarchy (h1 above h2, h2 above h3)

    • Use a consistent header structure

  2. Metadata Keys

    • Use descriptive key names

    • Follow a consistent naming convention

    • Choose keys that reflect the header hierarchy (e.g., Header 1, Header 2)

Common Issues

  • Inconsistent HTML structure

  • Missing header tags

  • Invalid HTML formatting
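A quick pre-check can surface the "missing header tags" case before content reaches the node. The sketch below uses Python's standard html.parser module. The TagCollector class and the missing_headers helper are hypothetical names, and the check only reports configured tags that never appear in the input; it does not validate the HTML.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records every tag name that opens in the input."""

    def __init__(self):
        super().__init__()
        self.seen = set()

    def handle_starttag(self, tag, attrs):
        self.seen.add(tag)  # html.parser lowercases tag names


def missing_headers(html, header_tags):
    """Return the configured header tags that never occur in html."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return [tag for tag in header_tags if tag not in collector.seen]


missing = missing_headers("<H1>Title</H1><p>Body</p>", ["h1", "h2"])
```

An empty result means every configured header tag occurs at least once. A non-empty result suggests either adjusting the Headers to Split On setting or fixing the source HTML.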
