Text-to-Speech (TTS) Node
Overview
The Text-to-Speech Node enables your flow to convert text into natural-sounding speech using advanced AI models. This node supports both OpenAI and Azure OpenAI providers, making it ideal for applications requiring voice synthesis, audio content creation, or accessibility features.
Usage cost: 2 credits
Configuration Settings
Model Selection
Model*: Select from available OpenAI or Azure OpenAI TTS models
Note: Only OpenAI and Azure OpenAI providers are currently supported
Voice Configuration
Input Text*: The text to convert to speech (supports variable interpolation)
Voice*: Select from available voice options:
Alloy: Neutral, versatile voice
Echo: Deep, resonant voice
Fable: Warm, narrative-focused voice
Onyx: Authoritative, professional voice
Nova: Energetic, youthful voice
Shimmer: Clear, bright voice
Audio Settings
Audio Format*: Choose output format:
MP3: Standard compressed audio
Opus: High-quality compressed format
AAC: Advanced Audio Coding
FLAC: Lossless audio compression
WAV: Uncompressed audio
PCM: Raw audio data
Speed: Adjust speech rate (0.25x to 4.0x, default: 1.0x)
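The options above can be captured in a small validation sketch. This is an illustration only: the names TTSSettings, VOICES, and FORMATS are hypothetical and not part of the node's API, and the lowercase identifiers are an assumption about how the options might be spelled in code.

```python
from dataclasses import dataclass

# Option values taken from the settings listed above.
# The lowercase spellings are an assumption made for this sketch.
VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}
MIN_SPEED, MAX_SPEED = 0.25, 4.0


@dataclass
class TTSSettings:
    """Hypothetical container mirroring the node's required and optional fields."""
    input_text: str
    voice: str
    audio_format: str
    speed: float = 1.0  # documented default

    def validate(self) -> None:
        """Raise ValueError if a setting falls outside the documented options."""
        if not self.input_text.strip():
            raise ValueError("input_text must not be empty")
        if self.voice.lower() not in VOICES:
            raise ValueError(f"unknown voice: {self.voice!r}")
        if self.audio_format.lower() not in FORMATS:
            raise ValueError(f"unknown audio format: {self.audio_format!r}")
        if not MIN_SPEED <= self.speed <= MAX_SPEED:
            raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
```

Checking values like this before the node runs can surface a bad speed or a misspelled voice early, rather than as a provider error.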
Outputs
audio (Audio): Audio object for use in subsequent nodes
base64_audio (string): Base64-encoded audio data with format prefix (e.g., "data:audio/mp3;base64,...")
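If a downstream step needs raw bytes rather than the prefixed string, the base64_audio value can be split and decoded. The sketch below assumes the value always follows the "data:<mime>;base64,<payload>" shape shown in the example above; the function name decode_data_uri is hypothetical.

```python
import base64
from typing import Tuple


def decode_data_uri(data_uri: str) -> Tuple[str, bytes]:
    """Split a 'data:<mime>;base64,<payload>' string into (mime_type, raw_bytes)."""
    header, sep, payload = data_uri.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected 'data:<mime>;base64,<payload>'")
    # Strip the 'data:' prefix and the ';base64' suffix to leave the MIME type.
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)
```

The returned bytes can then be written to a file opened in binary mode, with the extension chosen to match the selected audio format.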
Best Practices
Text Preparation
Keep sentences clear and well-punctuated
Use appropriate breaks and pauses for natural speech flow
Consider phonetic spelling for complex words or names
Test with smaller text segments before processing long content
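One way to test with smaller segments is to split long text at sentence boundaries before sending it to the node. The following is a minimal sketch: split_into_chunks and max_chars are hypothetical names, and the right limit depends on the provider and model in use.

```python
import re
from typing import List


def split_into_chunks(text: str, max_chars: int) -> List[str]:
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Splitting on sentence boundaries rather than at a fixed character count keeps each segment well-punctuated, which tends to preserve natural pauses in the generated speech.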
Voice Selection
Choose voices based on your content's tone and purpose:
Alloy: General-purpose applications
Echo: Narratives requiring authority
Fable: Storytelling and educational content
Onyx: Business and professional applications
Nova: Dynamic, engaging content
Shimmer: Clear instructional content
Format Selection
Choose MP3 for:
General-purpose use
Web streaming
Smaller file sizes
Choose FLAC or WAV for:
High-quality requirements
Professional audio production
Further audio processing
Choose Opus for:
Real-time applications
Efficient streaming
Low-latency requirements
Common Issues
API Limitations
Rate limiting from providers
Maximum text length restrictions
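When calling a provider directly, rate limiting is commonly handled by retrying with exponential backoff. The sketch below is generic: RateLimitError is a hypothetical stand-in for whatever error the provider's client raises, and call_with_backoff is not part of this node.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's rate-limit error."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimitError with delays of base_delay * 2**attempt."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            # Re-raise once the final attempt has failed.
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")
```

The sleep parameter is injectable so the retry logic can be exercised without real waiting. For maximum text length restrictions, splitting the input into shorter segments before synthesis is the usual workaround.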
Audio Quality
Format compatibility with target platforms
Speech clarity in complex sentences