# Text-to-Speech (TTS) Node

### Overview

The Text-to-Speech Node converts text in your flow into natural-sounding speech using TTS models from **OpenAI** or **Azure OpenAI**. It suits applications that need voice synthesis, audio content creation, or accessibility features.

Usage cost: 2 credits

### Configuration Settings

1. **Model Selection**
   * Model\*: Select from available OpenAI or Azure OpenAI TTS models
   * Note: Only OpenAI and Azure OpenAI providers are currently supported
2. **Voice Configuration**
   * Input Text\*: The text to convert to speech (supports variable interpolation)
   * Voice\*: Select from available voice options:
     * Alloy: Neutral, versatile voice
     * Echo: Deep, resonant voice
     * Fable: Warm, narrative-focused voice
     * Onyx: Authoritative, professional voice
     * Nova: Energetic, youthful voice
     * Shimmer: Clear, bright voice
3. **Audio Settings**
   * Audio Format\*: Choose output format:
     * MP3: Standard compressed audio
     * Opus: High-quality compressed format
     * AAC: Advanced Audio Coding
     * FLAC: Lossless audio compression
     * WAV: Uncompressed audio
     * PCM: Raw audio data
   * Speed: Adjust speech rate (0.25x to 4.0x, default: 1.0x)
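
The constraints above (required fields, six voices, six formats, and a speed range of 0.25x to 4.0x) can be checked before a flow runs. The following is a minimal sketch only: `TTSSettings`, `validate_settings`, and the field names are hypothetical and do not reflect the node's actual schema or its own validation.

```python
from dataclasses import dataclass

# Allowed values copied from the settings listed above.
VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}
MIN_SPEED, MAX_SPEED = 0.25, 4.0


@dataclass(frozen=True)
class TTSSettings:
    """Hypothetical container for the node's settings (not the real schema)."""
    model: str
    input_text: str
    voice: str
    audio_format: str
    speed: float = 1.0  # documented default


def validate_settings(s: TTSSettings) -> list[str]:
    """Return a list of problems; an empty list means the settings look valid."""
    problems = []
    if not s.model.strip():
        problems.append("model is required")
    if not s.input_text.strip():
        problems.append("input_text is required")
    if s.voice.lower() not in VOICES:
        problems.append(f"unknown voice: {s.voice!r}")
    if s.audio_format.lower() not in FORMATS:
        problems.append(f"unknown audio format: {s.audio_format!r}")
    if not (MIN_SPEED <= s.speed <= MAX_SPEED):
        problems.append(f"speed {s.speed} is outside {MIN_SPEED} to {MAX_SPEED}")
    return problems
```

For example, a settings object with voice `"robot"`, format `"ogg"`, and speed `5.0` would produce three problems, while one using `"Nova"`, `"mp3"`, and the default speed would produce none.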

### Outputs

* `audio` (Audio): Audio object for use in subsequent nodes
* `base64_audio` (string): Base64-encoded audio data with format prefix (e.g., "data:audio/mp3;base64,...")
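
If a downstream step needs raw bytes rather than the prefixed string, the `base64_audio` value can be split at the comma and decoded. This is a small sketch assuming the string follows the `data:<mime>;base64,<payload>` shape shown in the example above; `decode_data_uri` is a hypothetical helper, not part of the node.

```python
import base64


def decode_data_uri(data_uri: str) -> tuple[str, bytes]:
    """Split a 'data:<mime>;base64,<payload>' string into (mime_type, raw_bytes)."""
    header, sep, payload = data_uri.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a data URI of the form data:<mime>;base64,<payload>")
    # Strip the leading 'data:' and trailing ';base64' to leave only the MIME type.
    mime_type = header[len("data:"):-len(";base64")]
    # validate=True rejects characters outside the base64 alphabet instead of skipping them.
    return mime_type, base64.b64decode(payload, validate=True)
```

The returned bytes can then be written to a file whose extension matches the chosen audio format.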

### Best Practices

1. **Text Preparation**
   * Keep sentences clear and well-punctuated
   * Use appropriate breaks and pauses for natural speech flow
   * Consider phonetic spelling for complex words or names
   * Test with smaller text segments before processing long content
2. **Voice Selection**
   * Choose voices based on your content's tone and purpose:
     * Alloy: General-purpose applications
     * Echo: Narratives requiring authority
     * Fable: Storytelling and educational content
     * Onyx: Business and professional applications
     * Nova: Dynamic, engaging content
     * Shimmer: Clear instructional content
3. **Format Selection**
   * Choose MP3 for:
     * General-purpose use
     * Web streaming
     * Smaller file sizes
   * Choose FLAC or WAV for:
     * High-quality requirements
     * Professional audio production
     * Further audio processing
   * Choose Opus for:
     * Real-time applications
     * Efficient streaming
     * Low-latency requirements
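
The text preparation advice above suggests testing with smaller segments before processing long content. One way to do that is to group whole sentences into chunks and send each chunk through the node separately. The sketch below is an illustration only: `split_into_chunks` is a hypothetical helper, and the default `max_chars` is an arbitrary placeholder rather than a provider limit.

```python
import re


def split_into_chunks(text: str, max_chars: int = 1000) -> list[str]:
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept intact as its own chunk.
    """
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Splitting on sentence boundaries keeps each chunk well-punctuated, which matches the first recommendation in the list. The simple regular expression will also split after abbreviations such as "Dr.", so review the chunks for text where that matters.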

### Common Issues

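1. **API Limitations**
   * Providers rate-limit requests, so bursts of calls may fail or be delayed
   * Providers cap the length of the input text; split long content into smaller segments
2. **Audio Quality**
   * Not every target platform plays every format; confirm compatibility before choosing one
   * Long or complex sentences can reduce speech clarity; shorter, well-punctuated sentences help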
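
For provider rate limiting, a common client-side mitigation is to retry with exponential backoff. The sketch below shows the general pattern under stated assumptions: `RateLimitError` is a hypothetical stand-in for whatever error your integration surfaces, and `with_backoff` is an illustrative helper; whether the node retries on its own is not covered here.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for the error that signals a rate limit."""


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on RateLimitError with exponentially growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error to the caller
            # Delay doubles each attempt; up to 10% jitter spreads out retries.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")  # the loop always returns or raises
```

The `sleep` parameter is injectable so the retry logic can be exercised in tests without waiting in real time.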
