Waterflai

HTML splitter Node

Overview

The HTML Splitter Node divides HTML content into segments at header tags (h1, h2, and so on). It is useful when you want chunks that follow the document's hierarchy: each segment records the headers it sits under, so the relationship between sections is preserved.

Usage cost: 1 credit

Configuration

Settings

  1. Documents Selection

    • Documents to Split*: Select input content to process

    • Supports:

      • HTML strings

      • Document objects with HTML content

      • Arrays of HTML content

  2. Header Configuration

    • Headers to Split On: Define header tags and their metadata keys

      • Header Tag: HTML tag to split on (e.g., h1, h2)

      • Metadata Key: Key under which the header's text is stored in the output metadata (e.g., Header 1, Header 2)

    • Return Each Element: Toggle to control output granularity

      • When enabled: Returns each element as a separate document, with its associated headers in the metadata

      • When disabled: Groups the content between headers into a single document
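
To make the two header settings concrete, here is a minimal Python sketch of header-based splitting. It is not Waterflai's implementation: the HeaderSplitter class, the split_html function, and the dictionary shape of each chunk are hypothetical, and the handling of Return Each Element is one plausible reading of the toggle. It uses only Python's standard html.parser module.

```python
from html.parser import HTMLParser


class HeaderSplitter(HTMLParser):
    """Illustrative splitter: groups text under the most recent headers."""

    def __init__(self, headers_to_split_on, return_each_element=False):
        super().__init__()
        # Ordered (tag, metadata key) pairs, shallowest level first.
        self.order = [tag for tag, _ in headers_to_split_on]
        self.header_keys = dict(headers_to_split_on)
        self.return_each_element = return_each_element
        self.active_headers = {}  # metadata key -> current header text
        self.current_tag = None   # header tag currently being read
        self.header_text = []
        self.body_text = []
        self.chunks = []

    def _flush(self):
        # Emit the text collected so far as one chunk, if there is any.
        content = " ".join(self.body_text).strip()
        if content:
            self.chunks.append(
                {"content": content, "metadata": dict(self.active_headers)}
            )
        self.body_text = []

    def handle_starttag(self, tag, attrs):
        if tag in self.header_keys:
            self._flush()  # a header closes the previous section
            self.current_tag = tag
            self.header_text = []

    def handle_endtag(self, tag):
        if tag == self.current_tag:
            level = self.order.index(tag)
            # A new header replaces headers at its own level and deeper.
            for deeper in self.order[level:]:
                self.active_headers.pop(self.header_keys[deeper], None)
            text = "".join(self.header_text).strip()
            self.active_headers[self.header_keys[tag]] = text
            self.current_tag = None
        elif self.return_each_element:
            self._flush()  # one chunk per closed element

    def handle_data(self, data):
        if self.current_tag is not None:
            self.header_text.append(data)
        elif data.strip():
            self.body_text.append(data.strip())


def split_html(html, headers_to_split_on, return_each_element=False):
    parser = HeaderSplitter(headers_to_split_on, return_each_element)
    parser.feed(html)
    parser.close()
    parser._flush()  # emit whatever follows the last header
    return parser.chunks


sample = (
    "<h1>Guide</h1><p>Intro text.</p>"
    "<h2>Setup</h2><p>Install it.</p><p>Run it.</p>"
    "<h1>FAQ</h1><p>Answers.</p>"
)
headers = [("h1", "Header 1"), ("h2", "Header 2")]

for chunk in split_html(sample, headers):
    print(chunk)
```

With this sample, the grouped mode yields three chunks, while passing return_each_element=True yields four, because the two paragraphs under "Setup" are returned separately with the same header metadata.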

Output Ports

  • split_documents (Document[]): Array of split documents

    • Each document contains the content found under a header

    • Metadata includes the headers above that content, stored under the configured metadata keys

    • Metadata from the original document is preserved
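
As an illustration of how a downstream step might use this metadata, the sketch below builds a breadcrumb string from the header keys. The Document dataclass, the header_path helper, and the "source" metadata field are hypothetical stand-ins, not Waterflai's actual object model; the key names assume the Header 1 / Header 2 convention mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for one item of split_documents."""
    content: str
    metadata: dict = field(default_factory=dict)


def header_path(doc, header_keys=("Header 1", "Header 2")):
    """Join the header values present on a document into a breadcrumb."""
    return " > ".join(doc.metadata[k] for k in header_keys if k in doc.metadata)


# Example output: header keys sit alongside the original metadata ("source").
split_documents = [
    Document("Intro text.", {"source": "guide.html", "Header 1": "Guide"}),
    Document(
        "Install it.",
        {"source": "guide.html", "Header 1": "Guide", "Header 2": "Setup"},
    ),
]

for doc in split_documents:
    print(f"[{header_path(doc)}] {doc.content}")
```

Prefixing each chunk with its header path like this is one way to keep section context visible when the chunks are later embedded or shown to a model.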

Best Practices

  1. Header Selection

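    • Split only on the header levels that mark meaningful section boundaries in your content

    • Keep the configured levels in a logical hierarchy (for example, h1, then h2, then h3)

    • Check that the source HTML uses headers consistently before splitting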

  2. Metadata Keys

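    • Use descriptive key names that say what each header represents

    • Follow one naming convention across all keys (e.g., Header 1, Header 2)

    • Choose names that make the relationship between header levels clear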

Common Issues

  • Inconsistent HTML structure

  • Missing header tags

  • Invalid HTML formatting
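
A quick pre-check can catch the "missing header tags" case before content reaches the node. The sketch below is an assumption about how you might do this outside Waterflai; HeaderCounter and missing_headers are hypothetical names, and only Python's standard library is used.

```python
from collections import Counter
from html.parser import HTMLParser

HEADER_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeaderCounter(HTMLParser):
    """Counts how often each header tag opens in a piece of HTML."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag in HEADER_TAGS:
            self.counts[tag] += 1


def missing_headers(html, expected_tags):
    """Return the expected header tags that never appear in the HTML."""
    counter = HeaderCounter()
    counter.feed(html)
    counter.close()
    return [tag for tag in expected_tags if counter.counts[tag] == 0]


page = "<h1>Guide</h1><p>Intro</p><h3>Details</h3>"
print(missing_headers(page, ["h1", "h2", "h3"]))
```

If the returned list is not empty, the configured headers do not all occur in the input, so some metadata keys would never be filled.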


Last updated 3 months ago