
Text-to-Speech (TTS) Node

Overview

The Text-to-Speech Node converts text in your flow into natural-sounding speech. It supports OpenAI and Azure OpenAI providers and suits applications that need voice synthesis, audio content creation, or accessibility features.

Usage cost: 2 credits

Configuration Settings

  1. Model Selection

    • Model*: Select from available OpenAI or Azure OpenAI TTS models

    • Note: Only OpenAI and Azure OpenAI providers are currently supported

  2. Voice Configuration

    • Input Text*: The text to convert to speech (supports variable interpolation)

    • Voice*: Select from available voice options:

      • Alloy: Neutral, versatile voice

      • Echo: Deep, resonant voice

      • Fable: Warm, narrative-focused voice

      • Onyx: Authoritative, professional voice

      • Nova: Energetic, youthful voice

      • Shimmer: Clear, bright voice

  3. Audio Settings

    • Audio Format*: Choose output format:

      • MP3: Standard compressed audio

      • Opus: High-quality compressed format

      • AAC: Advanced Audio Coding

      • FLAC: Lossless audio compression

      • WAV: Uncompressed audio

      • PCM: Raw audio data

    • Speed: Adjust speech rate (0.25x to 4.0x, default: 1.0x)
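The settings above can be checked before a flow runs. The following is a minimal sketch, not a Waterflai API: the `TTSSettings` class and its field names are hypothetical, and only the voice names, format names, and speed range come from the lists above. The model name `tts-1` in the usage note is illustrative.

```python
from dataclasses import dataclass

# Allowed values copied from the settings above; everything else is hypothetical.
VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}
MIN_SPEED, MAX_SPEED = 0.25, 4.0


@dataclass
class TTSSettings:
    """Hypothetical container mirroring the node's fields (not a Waterflai API)."""

    model: str
    input_text: str
    voice: str
    audio_format: str
    speed: float = 1.0  # default speech rate

    def validate(self) -> list:
        """Return a list of human-readable problems; an empty list means valid."""
        problems = []
        if not self.model:
            problems.append("model is required")
        if not self.input_text.strip():
            problems.append("input_text is required")
        if self.voice.lower() not in VOICES:
            problems.append(f"unknown voice: {self.voice}")
        if self.audio_format.lower() not in FORMATS:
            problems.append(f"unknown audio format: {self.audio_format}")
        if not MIN_SPEED <= self.speed <= MAX_SPEED:
            problems.append(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
        return problems
```

For example, `TTSSettings("tts-1", "Hello", "Nova", "mp3").validate()` returns an empty list, while a speed of 5.0 or an unlisted voice produces one message per problem.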

Outputs

  • audio (Audio): Audio object for use in subsequent nodes

  • base64_audio (string): Base64-encoded audio data with format prefix (e.g., "data:audio/mp3;base64,...")
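When `base64_audio` is consumed outside the flow (for example, from an API response), it usually needs to be turned back into raw bytes. The sketch below assumes the string follows the `data:audio/<format>;base64,<payload>` shape shown above. The function name is hypothetical.

```python
import base64


def decode_base64_audio(data_uri: str):
    """Split a 'data:audio/<format>;base64,<payload>' string into (format, bytes)."""
    header, sep, payload = data_uri.partition(",")
    prefix, suffix = "data:audio/", ";base64"
    if not sep or not header.startswith(prefix) or not header.endswith(suffix):
        raise ValueError("expected a data:audio/<format>;base64,<payload> string")
    # The format name sits between the prefix and the ';base64' marker.
    audio_format = header[len(prefix):-len(suffix)]
    return audio_format, base64.b64decode(payload)
```

The returned bytes can then be written to disk, for instance with `open(f"speech.{fmt}", "wb").write(raw)`.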

Best Practices

  1. Text Preparation

    • Keep sentences clear and well-punctuated

    • Use appropriate breaks and pauses for natural speech flow

    • Consider phonetic spelling for complex words or names

    • Test with smaller text segments before processing long content

  2. Voice Selection

    • Choose voices based on your content's tone and purpose:

      • Alloy: General-purpose applications

      • Echo: Narratives requiring authority

      • Fable: Storytelling and educational content

      • Onyx: Business and professional applications

      • Nova: Dynamic, engaging content

      • Shimmer: Clear instructional content

  3. Format Selection

    • Choose MP3 for:

      • General-purpose use

      • Web streaming

      • Smaller file sizes

    • Choose FLAC or WAV for:

      • High-quality requirements

      • Professional audio production

      • Further audio processing

    • Choose Opus for:

      • Real-time applications

      • Efficient streaming

      • Low-latency requirements
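To follow the advice on testing with smaller text segments, long content can be split at sentence boundaries before it reaches the node. This is a minimal sketch: the function name is hypothetical, the 500-character default is an arbitrary assumption rather than a documented limit, and the sentence splitter is a simple punctuation heuristic.

```python
import re


def split_into_segments(text: str, max_chars: int = 500) -> list:
    """Group whole sentences into segments of at most max_chars characters.

    A single sentence longer than max_chars is kept intact as its own segment.
    """
    # Split after '.', '!' or '?' when followed by whitespace (simple heuristic).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    segments, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow: close the current segment.
            segments.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        segments.append(current)
    return segments
```

Each segment can be passed as Input Text on its own run, which keeps sentences whole so pauses fall at natural points.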

Common Issues

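  1. API Limitations

    • Providers may rate-limit requests, so high-volume flows can slow down or fail

    • Providers restrict the maximum length of the input text, so long content may need to be split into segments

  2. Audio Quality

    • Not every target platform can play every audio format, so confirm compatibility before choosing one

    • Long or complex sentences may be spoken less clearly; shorter, well-punctuated sentences help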
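Rate limiting is commonly handled on the calling side by retrying with exponential backoff. The sketch below is generic and makes no claim about how Waterflai or any provider reports rate limits: `RateLimitError` is a hypothetical stand-in for that error, and `func` is any callable that triggers the TTS request.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for whatever error signals a provider rate limit."""


def call_with_backoff(func, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on RateLimitError with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Delay doubles each attempt (base, 2x, 4x, ...) plus up to 10% jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
```

The `sleep` parameter exists so the wait can be replaced in tests. In normal use it can be left at its default.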


Last updated 3 months ago