# Speech-to-Text (STT) Node

### Overview

The Speech-to-Text (STT) Node transcribes audio into text within your flow. It supports models from **OpenAI**, **Azure OpenAI**, and **Groq**, which makes it suitable for audio transcription, voice command processing, and content accessibility features.

Usage cost: 2 credits

### Configuration Settings

1. **Model Selection**
   * Model\*: Select from available OpenAI, Azure OpenAI, or Groq STT models
   * Note: Only OpenAI, Azure OpenAI, and Groq providers are currently supported
2. **Audio Input**
   * Audio File\*: Select an audio input from available audio variables
   * Supported Formats:
     * MP3 (.mp3)
     * MP4 (.mp4)
     * MPEG (.mpeg, .mpga)
     * M4A (.m4a)
     * WAV (.wav)
     * WebM (.webm)
     * FLAC (.flac)
     * OGG (.ogg, .oga)
3. **Transcription Settings**
   * Language: Optional language specification
     * Auto-detect (default)
     * 60+ supported languages including English, Spanish, French, German, etc.
   * Temperature: Controls transcription variability (0.0 to 1.0)
     * Lower values (0.0): More focused and deterministic
     * Higher values (1.0): More variety in word choice
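
The settings above can be checked before a file ever reaches the node. The following is a minimal sketch, not part of the node itself: the function name `validate_stt_settings` and its parameters are hypothetical, and only the extension list and the 0.0 to 1.0 temperature range come from this page.

```python
from pathlib import Path

# Extensions taken from the "Supported Formats" list on this page.
SUPPORTED_EXTENSIONS = {
    ".mp3", ".mp4", ".mpeg", ".mpga", ".m4a",
    ".wav", ".webm", ".flac", ".ogg", ".oga",
}


def validate_stt_settings(audio_path, temperature=0.0, language=None):
    """Hypothetical helper: return a list of problems (empty if none found)."""
    problems = []

    # Compare the extension case-insensitively against the supported set.
    extension = Path(audio_path).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported audio format: {extension or '(none)'}")

    # Temperature must stay within the documented 0.0 to 1.0 range.
    if not 0.0 <= temperature <= 1.0:
        problems.append(f"temperature {temperature} is outside 0.0 to 1.0")

    # language=None stands in for the default auto-detect option.
    if language is not None and not language.strip():
        problems.append("language is empty; omit it to use auto-detect")

    return problems
```

For example, `validate_stt_settings("meeting.txt", temperature=0.5)` reports the unsupported extension, while a supported file with default settings yields an empty list.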

### Outputs

* `text` (string): Transcribed text from the audio input

### Best Practices

1. **Audio Preparation**
   * Use clear, high-quality audio recordings
   * Minimize background noise and interference
   * Confirm the file uses one of the supported formats
   * Keep files within the size limit of your selected provider
2. **Language Selection**
   * Use auto-detect for general transcription
   * Specify the language when:
     * The spoken language is known in advance, which can improve accuracy
     * Speakers have a regional accent that could affect detection
   * Consider target audience when selecting language
3. **Temperature Optimization**
   * Use 0.0 for:
     * Technical content
     * Legal documents
     * Precise transcriptions
   * Use higher values for:
     * Creative content
     * Casual conversations
     * Audio that allows more than one plausible interpretation
4. **Format Selection**
   * Use MP3 for:
     * General-purpose transcription
     * Balanced quality and file size
     * Wide compatibility
   * Use WAV or FLAC for:
     * High-fidelity requirements
     * Professional audio
     * Critical accuracy needs
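
The temperature guidance above could be captured as a small lookup so that flows choose a value consistently. This is only an illustrative sketch: the content-type names, the `pick_temperature` helper, and the 0.4 value for "higher" temperatures are assumptions made for the example, not values defined by the node.

```python
# Hypothetical presets. The keys and the 0.4 value are illustrative
# assumptions; only "use 0.0 for precise content" comes from this page.
TEMPERATURE_PRESETS = {
    "technical": 0.0,
    "legal": 0.0,
    "creative": 0.4,
    "casual": 0.4,
}


def pick_temperature(content_type, default=0.0):
    """Return a preset temperature, clamped to the 0.0 to 1.0 range."""
    value = TEMPERATURE_PRESETS.get(content_type.lower(), default)
    return min(1.0, max(0.0, value))
```

Unknown content types fall back to `default`, which is 0.0 here so that the most deterministic setting applies unless a flow opts into more variability.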

### Common Issues

1. **File Format Issues**
   * Unsupported audio formats
   * Corrupted audio files
   * Invalid file extensions
2. **Transcription Quality**
   * Poor audio quality affecting accuracy
   * Background noise interference
   * Overlapping speech from multiple speakers
   * Heavy accents or dialects
3. **API Limitations**
   * File size restrictions
   * Rate limiting
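
Rate limiting is commonly handled by retrying with exponential backoff. The sketch below shows the general pattern under stated assumptions: `RateLimitError` is a hypothetical stand-in for whatever error your provider or client raises, and `with_backoff` is not a function of the node.

```python
import time


class RateLimitError(Exception):
    """Hypothetical error standing in for a provider's rate-limit response."""


def with_backoff(call, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Run call(); on RateLimitError, wait base_delay * 2**attempt and retry."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            # Give up after the final attempt and surface the error.
            if attempt == max_attempts - 1:
                raise
            # Delays double each time: 1s, 2s, 4s with the defaults.
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter is injectable so the retry logic can be exercised without real waiting. File size restrictions are not addressed by retries; an oversized file needs to be compressed or split before it is submitted.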
