Speech-to-Text (STT) Node

Overview

The Speech-to-Text Node enables your flow to convert audio into text using advanced AI models. This node supports OpenAI, Azure OpenAI, and Groq providers, making it ideal for applications requiring audio transcription, voice command processing, or content accessibility features.

Usage cost: 2 credits

Configuration Settings

  1. Model Selection

    • Model*: Select from available OpenAI, Azure OpenAI, or Groq STT models

    • Note: Only OpenAI, Azure OpenAI, and Groq providers are currently supported

  2. Audio Input

    • Audio File*: Select an audio input from available audio variables

    • Supported Formats:

      • MP3 (.mp3)

      • MP4 (.mp4)

      • MPEG (.mpeg, .mpga)

      • M4A (.m4a)

      • WAV (.wav)

      • WebM (.webm)

      • FLAC (.flac)

      • OGG (.ogg, .oga)

  3. Transcription Settings

    • Language: Optional language specification

      • Auto-detect (default)

      • More than 60 supported languages, including English, Spanish, French, and German

    • Temperature: Controls transcription variability (0.0 to 1.0); the sketch after this list shows how these settings map onto a provider call

      • Lower values (0.0): More focused and deterministic

      • Higher values (1.0): More variety in word choice
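
The node performs the provider call for you, but it can help to see how the settings above correspond to a typical transcription request. The sketch below is illustrative only, not Waterflai code; it uses the OpenAI Python SDK and a hypothetical file name as assumptions:

```python
# Illustrative sketch: how the node's settings (model, audio file, language,
# temperature) map onto a provider transcription call via the OpenAI Python SDK.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

with open("meeting.mp3", "rb") as audio_file:       # Audio File* input
    transcription = client.audio.transcriptions.create(
        model="whisper-1",        # Model* selection (example model id)
        file=audio_file,
        language="en",            # optional; omit to auto-detect
        temperature=0.0,          # 0.0 = most deterministic transcription
    )

print(transcription.text)         # corresponds to the node's `text` output
```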

Outputs

  • text (string): Transcribed text from the audio input

Best Practices

  1. Audio Preparation

    • Use clear, high-quality audio recordings

    • Minimize background noise and interference

    • Ensure the recording uses a supported format (a conversion sketch appears at the end of this section)

    • Keep file sizes reasonable for processing

  2. Language Selection

    • Use auto-detect for general transcription

    • Specify language for:

      • Improved accuracy with known language

      • Regional accent consideration

    • Consider target audience when selecting language

  3. Temperature Optimization

    • Use 0.0 for:

      • Technical content

      • Legal documents

      • Precise transcriptions

    • Use higher values for:

      • Creative content

      • Casual conversations

      • Multiple interpretation scenarios

  4. Format Selection

    • Use MP3 for:

      • General-purpose transcription

      • Balanced quality and file size

      • Wide compatibility

    • Use WAV or FLAC for:

      • High-fidelity requirements

      • Professional audio

      • Critical accuracy needs
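
If a recording arrives in an unusual container or at a very high bitrate, converting it before upload is often easier than troubleshooting afterwards. A minimal sketch, assuming pydub (which relies on ffmpeg) is available; neither tool is part of Waterflai:

```python
# Illustrative sketch: normalize an arbitrary recording to a supported format
# before passing it to the STT Node.
from pydub import AudioSegment

audio = AudioSegment.from_file("raw_recording.m4a")

# Mono at 16 kHz is usually sufficient for speech and keeps files small.
audio = audio.set_channels(1).set_frame_rate(16000)

# Export as MP3 for general use, or "wav"/"flac" when fidelity matters.
audio.export("prepared_recording.mp3", format="mp3", bitrate="64k")
```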

Common Issues

  1. File Format Issues

    • Unsupported audio formats

    • Corrupted audio files

    • Invalid file extensions

  2. Transcription Quality

    • Poor audio quality affecting accuracy

    • Background noise interference

    • Overlapping speech from multiple speakers

    • Heavy accents or dialects

  3. API Limitations

    • File size restrictions

    • Rate limiting (a pre-flight check and retry sketch follows below)
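
When you run into provider limits, a pre-flight size check and a simple retry with backoff usually resolve the problem. A minimal sketch, not Waterflai code, assuming a 25 MB ceiling (OpenAI's documented audio upload limit; other providers may differ):

```python
# Illustrative sketch: pre-check file size and retry a transcription call with
# exponential backoff when the provider rejects a request (e.g. rate limiting).
import os
import time

MAX_BYTES = 25 * 1024 * 1024  # assumed ceiling; adjust to your provider's limit


def transcribe_with_retry(path: str, transcribe, retries: int = 3):
    if os.path.getsize(path) > MAX_BYTES:
        raise ValueError("Audio file exceeds the upload limit; split or re-encode it first.")
    for attempt in range(retries):
        try:
            return transcribe(path)     # any callable that performs the STT request
        except Exception:               # e.g. a rate-limit or transient API error
            if attempt == retries - 1:
                raise
            time.sleep(2 ** attempt)    # back off: 1 s, 2 s, 4 s, ...
```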
