Text-to-Speech (TTS) Node

Overview

The Text-to-Speech Node converts text in your flow into natural-sounding speech. It supports TTS models from the OpenAI and Azure OpenAI providers and is suited to voice synthesis, audio content creation, and accessibility features.

Usage cost: 2 credits

Configuration Settings

Model Selection
- Model*: Select from available OpenAI or Azure OpenAI TTS models
- Note: Only OpenAI and Azure OpenAI providers are currently supported
Voice Configuration
- Input Text*: The text to convert to speech (supports variable interpolation)
- Voice*: Select from available voice options:
  - Alloy: Neutral, versatile voice
  - Echo: Deep, resonant voice
  - Fable: Warm, narrative-focused voice
  - Onyx: Authoritative, professional voice
  - Nova: Energetic, youthful voice
  - Shimmer: Clear, bright voice
Audio Settings
- Audio Format*: Choose output format:
  - MP3: Standard compressed audio
  - Opus: High-quality compressed format
  - AAC: Advanced Audio Coding
  - FLAC: Lossless audio compression
  - WAV: Uncompressed audio
  - PCM: Raw audio data
- Speed: Adjust speech rate (0.25x to 4.0x, default: 1.0x)
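The settings above can be checked before a flow runs. The following is a minimal sketch, not part of the node itself: the `TTSSettings` class and `validate` function are hypothetical names introduced for illustration, and the lowercase identifiers for voices and formats are an assumption. Only the allowed values and the speed range come from the lists above.

```python
from dataclasses import dataclass
from typing import List

# Allowed values taken from the configuration lists above.
# The lowercase spelling is an assumption made for this sketch.
VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}
MIN_SPEED, MAX_SPEED = 0.25, 4.0


@dataclass(frozen=True)
class TTSSettings:
    """Hypothetical container mirroring the node's fields."""
    input_text: str
    voice: str
    audio_format: str
    speed: float = 1.0  # documented default


def validate(settings: TTSSettings) -> List[str]:
    """Return a list of problems; an empty list means the settings look valid."""
    errors = []
    if not settings.input_text.strip():
        errors.append("input_text must not be empty")
    if settings.voice.lower() not in VOICES:
        errors.append(f"unknown voice: {settings.voice!r}")
    if settings.audio_format.lower() not in FORMATS:
        errors.append(f"unknown audio format: {settings.audio_format!r}")
    if not MIN_SPEED <= settings.speed <= MAX_SPEED:
        errors.append(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
    return errors


if __name__ == "__main__":
    print(validate(TTSSettings("Hello there.", "nova", "mp3")))
    print(validate(TTSSettings("Hello there.", "nova", "mp3", speed=5.0)))
```

Collecting all problems in one list, rather than raising on the first, lets a caller report every invalid field at once.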

Outputs

- audio (Audio): Audio object for use in subsequent nodes
- base64_audio (string): Base64-encoded audio data with format prefix (e.g., "data:audio/mp3;base64,...")
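If the base64_audio output is consumed outside the flow, the prefix has to be removed before decoding. The sketch below assumes the string always has the shape shown in the example ("data:audio/<format>;base64,<payload>"); the function name `decode_audio_data_uri` is hypothetical.

```python
import base64
from typing import Tuple

PREFIX = "data:audio/"
SUFFIX = ";base64"


def decode_audio_data_uri(data_uri: str) -> Tuple[str, bytes]:
    """Split a 'data:audio/<format>;base64,<payload>' string into (format, raw bytes)."""
    header, separator, payload = data_uri.partition(",")
    if not separator or not header.startswith(PREFIX) or not header.endswith(SUFFIX):
        raise ValueError("expected a string of the form data:audio/<format>;base64,<payload>")
    audio_format = header[len(PREFIX):-len(SUFFIX)]
    return audio_format, base64.b64decode(payload)


if __name__ == "__main__":
    # Stand-in payload; a real value would come from the node's base64_audio output.
    sample = "data:audio/mp3;base64," + base64.b64encode(b"fake audio bytes").decode("ascii")
    audio_format, raw = decode_audio_data_uri(sample)
    # Write the decoded bytes to a file named after the detected format.
    with open(f"speech.{audio_format}", "wb") as handle:
        handle.write(raw)
```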

Best Practices

Text Preparation
- Keep sentences clear and well-punctuated
- Use appropriate breaks and pauses for natural speech flow
- Consider phonetic spelling for complex words or names
- Test with smaller text segments before processing long content
Voice Selection
- Choose voices based on your content's tone and purpose:
  - Alloy: General-purpose applications
  - Echo: Narratives requiring authority
  - Fable: Storytelling and educational content
  - Onyx: Business and professional applications
  - Nova: Dynamic, engaging content
  - Shimmer: Clear instructional content
Format Selection
- Choose MP3 for:
  - General-purpose use
  - Web streaming
  - Smaller file sizes
- Choose FLAC or WAV for:
  - High-quality requirements
  - Professional audio production
  - Further audio processing
- Choose Opus for:
  - Real-time applications
  - Efficient streaming
  - Low-latency requirements

Common Issues

API Limitations
- Rate limiting from providers
- Maximum text length restrictions
Audio Quality
- Format compatibility with target platforms
- Speech clarity in complex sentences
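One way to work within a maximum text length is to split long input at sentence boundaries and send each piece through the node separately. This is a rough sketch under stated assumptions: the actual limit depends on the provider and is not given here, so `max_chars` is a placeholder parameter, and `split_text` is a hypothetical helper.

```python
import re
from typing import List


def split_text(text: str, max_chars: int) -> List[str]:
    """Group sentences into chunks no longer than max_chars characters."""
    if max_chars < 1:
        raise ValueError("max_chars must be at least 1")
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > max_chars:
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for chunk in split_text("One. Two. Three.", max_chars=10):
        print(repr(chunk))
```

Splitting at sentence boundaries keeps each chunk well-punctuated, which matches the text preparation advice above; the hard cut for oversized sentences is a fallback and may break words.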

Last updated 11 months ago