Speech-to-Text (STT) Node
Overview
The Speech-to-Text Node converts audio into text within your flow. It supports models from OpenAI, Azure OpenAI, and Groq, and is suited to audio transcription, voice command processing, and content accessibility features.
Usage cost: 2 credits
Configuration Settings
Model Selection
Model*: Select from available OpenAI, Azure OpenAI, or Groq STT models
Note: Only OpenAI, Azure OpenAI, and Groq providers are currently supported
Audio Input
Audio File*: Select an audio input from available audio variables
Supported Formats:
MP3 (.mp3)
MP4 (.mp4)
MPEG (.mpeg, .mpga)
M4A (.m4a)
WAV (.wav)
WebM (.webm)
FLAC (.flac)
OGG (.ogg, .oga)
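A quick pre-check of the file extension can catch unsupported formats before the node runs. The following is a minimal sketch, not part of the node itself: the helper name `is_supported_audio` is hypothetical, and the extension set is copied from the list above.

```python
from pathlib import Path

# Extensions copied from the Supported Formats list above.
SUPPORTED_EXTENSIONS = {
    ".mp3", ".mp4", ".mpeg", ".mpga", ".m4a",
    ".wav", ".webm", ".flac", ".ogg", ".oga",
}


def is_supported_audio(filename: str) -> bool:
    """Return True if the file extension is in the supported list.

    The comparison is case-insensitive. Only the extension is checked,
    so a corrupted or mislabeled file can still pass this test.
    """
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

For example, `is_supported_audio("interview.MP3")` is true, while `is_supported_audio("interview.aac")` is false.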
Transcription Settings
Language: Optional language specification
Auto-detect (default)
60+ supported languages, including English, Spanish, French, and German
Temperature: Controls transcription variability (0.0 to 1.0)
Lower values (0.0): More focused and deterministic
Higher values (1.0): More variety in word choice
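The two settings above can be modeled as a small validated structure. This is an illustrative sketch only: the class `TranscriptionSettings`, its field names, and the default temperature of 0.0 are assumptions, not the node's actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TranscriptionSettings:
    # Hypothetical container; field names are illustrative only.
    language: Optional[str] = None  # None stands for auto-detect
    temperature: float = 0.0        # assumed default, not documented

    def __post_init__(self) -> None:
        # Enforce the documented range of 0.0 to 1.0.
        if not 0.0 <= self.temperature <= 1.0:
            raise ValueError("temperature must be between 0.0 and 1.0")

    def to_params(self) -> dict:
        """Build request parameters, omitting language when auto-detecting."""
        params = {"temperature": self.temperature}
        if self.language is not None:
            params["language"] = self.language
        return params
```

Leaving `language` unset keeps auto-detection, so the parameter is simply left out of the resulting dictionary.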
Outputs
text (string): Transcribed text from the audio input
Best Practices
Audio Preparation
Use clear, high-quality audio recordings
Minimize background noise and interference
Confirm the file is in one of the supported formats
Keep file sizes within the limits of the selected provider
Language Selection
Use auto-detect for general transcription
Specify language for:
Improved accuracy when the spoken language is known
Audio with regional accents
Consider target audience when selecting language
Temperature Optimization
Use 0.0 for:
Technical content
Legal documents
Precise transcriptions
Use higher values for:
Creative content
Casual conversations
Audio that allows more than one plausible interpretation
Format Selection
Use MP3 for:
General-purpose transcription
Balanced quality and file size
Wide compatibility
Use WAV or FLAC for:
High-fidelity requirements
Professional audio
Critical accuracy needs
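If a recording is not already in the preferred format, it can be converted before upload. The sketch below only builds an ffmpeg command line and does not run it. It assumes ffmpeg is installed, and the helper name `build_convert_command` is hypothetical. ffmpeg selects the output codec from the destination file extension.

```python
from pathlib import Path
from typing import List

# Target formats discussed in the Format Selection guidance above.
TARGET_EXTENSIONS = {".mp3", ".wav", ".flac"}


def build_convert_command(src: str, target_ext: str = ".flac") -> List[str]:
    """Return an ffmpeg command that converts src to the target format.

    The output file keeps the source name with a new extension.
    "-y" overwrites an existing output file without prompting.
    """
    if target_ext not in TARGET_EXTENSIONS:
        raise ValueError(f"unsupported target format: {target_ext}")
    dst = str(Path(src).with_suffix(target_ext))
    return ["ffmpeg", "-y", "-i", src, dst]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to perform the conversion.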
Common Issues
File Format Issues
Unsupported audio formats
Corrupted audio files
Invalid file extensions
Transcription Quality
Poor audio quality affecting accuracy
Background noise interference
Overlapping speech from multiple speakers
Heavy accents or dialects
API Limitations
File size restrictions
Rate limiting
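When transcription is called from your own code, rate limiting is commonly handled by retrying with exponential backoff. The following is a generic sketch under stated assumptions: `RateLimitError` is a hypothetical stand-in for whatever error the provider raises, and the attempt count and delays are arbitrary illustrative values.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's rate-limit error."""


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on RateLimitError with doubling delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            # Give up after the final attempt and surface the error.
            if attempt == max_attempts - 1:
                raise
            # Wait base_delay, then 2x, then 4x, and so on.
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("unreachable")
```

The `sleep` parameter is injectable so the retry logic can be exercised in tests without real waiting.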