Markdown Splitter

Overview

The Markdown Splitter Node divides Markdown content into segments at the header levels you configure. Each segment records the headers it falls under in its metadata, which preserves the hierarchical relationship between sections. This makes the node well suited to documentation, articles, and other Markdown-formatted content.

Usage cost: 1 credit

Configuration

Settings

  1. Documents Selection

    • Documents to Split (required): Select the input content to process

    • Supported inputs:

      • Markdown strings

      • Document objects with Markdown content

      • Arrays of Markdown content

  2. Header Configuration

    • Headers to Split On: The header levels that start a new segment, each defined by a header syntax and a metadata key

      • Header Syntax: Markdown header symbols (e.g., #, ##)

      • Metadata Key: Key under which the header's text is stored in each segment's metadata (e.g., Header 1, Header 2)

  3. Processing Options

    • Return Each Line: Split content line by line

      • When enabled: Each line of content becomes a separate document

      • When disabled: Content between consecutive headers is grouped into one document

    • Strip Headers: Remove header lines from the segment content

      • When enabled: Header lines are removed from the output content (the header text is still recorded in metadata)

      • When disabled: Header lines are kept in the output content
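The splitting behavior configured by these settings can be pictured with a short plain-Python sketch. It is a minimal illustration under stated assumptions, not the node's implementation: the function name split_markdown and its dict-based output are hypothetical, header syntax is assumed to consist only of # characters (so a header's level is the length of its syntax), and fenced code blocks are detected with a simplified toggle so that # comments inside them are not mistaken for headers. Similarly named options (headers_to_split_on, return_each_line, strip_headers) exist on LangChain's MarkdownHeaderTextSplitter; this page does not say whether the node uses it.

```python
from typing import Dict, List, Tuple


def split_markdown(
    text: str,
    headers_to_split_on: List[Tuple[str, str]],
    return_each_line: bool = False,
    strip_headers: bool = True,
) -> List[Dict]:
    """Hypothetical sketch: split Markdown into segments tagged with header metadata.

    headers_to_split_on pairs header syntax with a metadata key,
    e.g. [("#", "Header 1"), ("##", "Header 2")].
    """
    chunks: List[Dict] = []
    active: Dict[str, str] = {}    # metadata key -> header text currently in scope
    level_of: Dict[str, int] = {}  # metadata key -> header level (count of '#')
    buffer: List[str] = []         # lines of the segment being built
    in_fence = False               # simplified: toggled by ``` or ~~~ lines

    def emit(content: str) -> None:
        # Store a segment with a snapshot of the headers in scope; skip blanks.
        content = content.strip()
        if content:
            chunks.append({"content": content, "metadata": dict(active)})

    def add_line(line: str) -> None:
        if return_each_line:
            emit(line)           # every non-blank line becomes its own segment
        else:
            buffer.append(line)  # group lines until the next configured header

    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(("```", "~~~")):
            in_fence = not in_fence
        # Requiring a space after the syntax keeps "## x" from matching "#".
        match = None if in_fence else next(
            ((syntax, key) for syntax, key in headers_to_split_on
             if stripped.startswith(syntax + " ")),
            None,
        )
        if match is None:
            add_line(line)
            continue

        syntax, key = match
        level = len(syntax)
        # Close the current segment before the header metadata changes.
        emit("\n".join(buffer))
        buffer.clear()
        # A new header ends the scope of headers at the same or a deeper level.
        for old_key in [k for k, lvl in level_of.items() if lvl >= level]:
            del active[old_key]
            del level_of[old_key]
        active[key] = stripped[len(syntax):].strip()
        level_of[key] = level
        if not strip_headers:
            add_line(stripped)

    emit("\n".join(buffer))
    return chunks
```

With the default options, a document containing # Intro, ## Setup, and # Usage sections yields three segments. The ## Setup segment carries both Header 1 and Header 2 in its metadata, while the # Usage segment carries only Header 1, because a new level-1 header closes the scope of the earlier ## header. Enabling return_each_line instead yields one segment per non-blank line, each with the same header metadata.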

Output Ports

  • split_documents (Document[]): Array of split documents

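    • Each document contains one section of content (or one line, when Return Each Line is enabled)

    • Metadata records the text of each configured header the segment falls under, keyed by its metadata key

    • Metadata from the original document is preserved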
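The input kinds listed under Documents Selection and the output contract above can be modeled with a small sketch. The Document class, its page_content and metadata fields, and the helpers to_documents and make_split are hypothetical names chosen for illustration; the node's actual types are not specified on this page. The sketch also assumes that header keys take precedence when they collide with a key already present in the source metadata, which this page does not document.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Union


@dataclass
class Document:
    """Hypothetical stand-in for the node's Document type."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def to_documents(value: Union[str, Document, List[Any]]) -> List[Document]:
    """Normalize a Markdown string, a Document, or an array of them into a list."""
    if isinstance(value, str):
        return [Document(value)]
    if isinstance(value, Document):
        return [value]
    # Arrays: normalize each element and flatten the results.
    return [doc for item in value for doc in to_documents(item)]


def make_split(source: Document, content: str, headers: Dict[str, str]) -> Document:
    """Build one output document from a segment of `source`."""
    # Start from a copy of the original metadata so the source is not mutated,
    # then add header metadata (assumption: header keys win on a name clash).
    return Document(content, {**source.metadata, **headers})
```

For example, splitting a document whose metadata is {"source": "guide.md"} would, under this sketch, produce segments whose metadata looks like {"source": "guide.md", "Header 1": "Title"}, leaving the original document untouched.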

Best Practices

  1. Header Definition

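    • Use header levels consistently across documents

    • Start with the highest header level you need (e.g., #)

    • Keep the hierarchy logical, without skipping levels

    • Use descriptive metadata keys (e.g., Header 1, Header 2)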

  2. Content Processing

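    • Review how your documents are structured before choosing header levels

    • Decide which metadata keys downstream steps will rely on

    • Test the configuration on sample content before processing full datasets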

  3. Options Selection

    • Use Return Each Line for granular analysis

    • Enable Strip Headers when header text should appear only in metadata, not in the content

    • Consider downstream processing needs

    • Balance granularity against context: smaller segments are more targeted but carry less surrounding information

Common Issues

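  • Inconsistent header formatting, such as #Title with no space after the # (not a valid header in CommonMark)

  • Missing header levels, such as a # header followed directly by ###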

  • Special character handling
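Inconsistent formatting and missing levels can be caught before splitting with a small pre-flight check. This is a hypothetical helper, not a feature of the node: lint_headers assumes CommonMark ATX headers (one to six # characters followed by whitespace), flags lines such as #Title that lack the space, and reports jumps of more than one level between consecutive headers. Fenced code blocks are skipped with a simplified toggle so that # comments in code are not flagged.

```python
import re
from typing import List

# Hypothetical pre-flight check; not part of the node itself.
ATX_HEADER = re.compile(r"^(#{1,6})\s+\S")  # hashes, whitespace, then text
NO_SPACE = re.compile(r"^#{1,6}[^#\s]")     # e.g. "#Title": not a header in CommonMark


def lint_headers(text: str) -> List[str]:
    """Report header lines that are likely to split unexpectedly."""
    problems: List[str] = []
    previous_level = 0
    in_fence = False  # simplified: toggled by ``` or ~~~ lines
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith(("```", "~~~")):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        if NO_SPACE.match(stripped):
            problems.append(f"line {number}: missing space after '#'")
            continue
        match = ATX_HEADER.match(stripped)
        if match:
            level = len(match.group(1))
            # Only compare against an earlier header; the first one sets the baseline.
            if previous_level and level > previous_level + 1:
                problems.append(
                    f"line {number}: jumps from level {previous_level} to {level}"
                )
            previous_level = level
    return problems
```

Running it on a document containing # Guide, #Setup, and ### Details reports the missing space on the second line and the jump from level 1 to level 3 on the third.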
