Speech-to-Text (STT) Node

Overview

The Speech-to-Text Node converts audio into text within your flow, using speech recognition models from OpenAI, Azure OpenAI, or Groq. Typical uses include audio transcription, voice command processing, and content accessibility features.

Usage cost: 2 credits

Configuration Settings

  1. Model Selection

    • Model*: Select from available OpenAI, Azure OpenAI, or Groq STT models

    • Note: Only OpenAI, Azure OpenAI, and Groq providers are currently supported

  2. Audio Input

    • Audio File*: Select an audio input from available audio variables

    • Supported Formats:

      • MP3 (.mp3)

      • MP4 (.mp4)

      • MPEG (.mpeg, .mpga)

      • M4A (.m4a)

      • WAV (.wav)

      • WebM (.webm)

      • FLAC (.flac)

      • OGG (.ogg, .oga)

  3. Transcription Settings

    • Language: Optional language specification

      • Auto-detect (default)

      • 60+ supported languages, including English, Spanish, French, and German

    • Temperature: Controls transcription variability (0.0 to 1.0)

      • Lower values (0.0): More focused and deterministic

      • Higher values (1.0): More variety in word choice
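The constraints in the settings above (a supported file extension, an optional language, and a temperature between 0.0 and 1.0) can be checked before a file ever reaches the node. The following is a minimal sketch only: `validate_stt_inputs` and the shape of the returned dictionary are hypothetical names chosen for illustration, not part of the node's actual interface.

```python
from pathlib import Path
from typing import Optional

# Extensions copied from the "Supported Formats" list above.
SUPPORTED_EXTENSIONS = {
    ".mp3", ".mp4", ".mpeg", ".mpga", ".m4a",
    ".wav", ".webm", ".flac", ".ogg", ".oga",
}


def validate_stt_inputs(audio_path: str, temperature: float,
                        language: Optional[str] = None) -> dict:
    """Return a settings dict, or raise ValueError on invalid input.

    Hypothetical helper: the function name and dict keys are illustrative.
    """
    extension = Path(audio_path).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported audio format: {extension or '(none)'}")
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("Temperature must be between 0.0 and 1.0")
    # language=None stands in for the node's auto-detect default.
    return {
        "audio_file": audio_path,
        "language": language,
        "temperature": temperature,
    }
```

The extension check is case-insensitive, so `meeting.MP3` passes. It only inspects the file name, so a corrupted or mislabeled file would still need to be caught downstream.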

Outputs

  • text (string): Transcribed text from the audio input
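For readers who want to see how these settings and the `text` output might map onto a provider call, here is a sketch written against the OpenAI Python SDK's `audio.transcriptions.create` method. Whether the node calls this endpoint internally is an assumption; the `transcribe` wrapper and the `whisper-1` default are illustrative choices, and the client is passed in so the function works with any object exposing the same interface.

```python
from typing import Any, Optional


def transcribe(client: Any, audio_path: str, model: str = "whisper-1",
               language: Optional[str] = None,
               temperature: float = 0.0) -> str:
    """Send an audio file for transcription and return the text.

    `client` is expected to look like an `openai.OpenAI()` instance.
    This wrapper is a hypothetical stand-in for what the node does.
    """
    request = {"model": model, "temperature": temperature}
    if language is not None:
        # Omitting the language leaves detection to the model.
        request["language"] = language  # ISO-639-1 code such as "en"
    with open(audio_path, "rb") as audio_file:
        result = client.audio.transcriptions.create(file=audio_file, **request)
    return result.text


# Usage (requires the `openai` package and an API key in the environment):
#   from openai import OpenAI
#   print(transcribe(OpenAI(), "meeting.mp3", language="en"))
```

Passing the client as an argument keeps the sketch easy to exercise with a stub object, without network access or credentials.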

Best Practices

  1. Audio Preparation

    • Use clear, high-quality audio recordings

    • Minimize background noise and interference

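    • Confirm that the file uses one of the supported formats listed under Audio Input

    • Keep files small enough to stay within the provider's upload limit; compress or split long recordings if necessary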

  2. Language Selection

    • Use auto-detect for general transcription

    • Specify language for:

      • Improved accuracy with known language

      • Regional accent consideration

    • Consider target audience when selecting language

  3. Temperature Optimization

    • Use 0.0 for:

      • Technical content

      • Legal documents

      • Precise transcriptions

    • Use higher values for:

      • Creative content

      • Casual conversations

      • Audio that is open to more than one interpretation

  4. Format Selection

    • Use MP3 for:

      • General-purpose transcription

      • Balanced quality and file size

      • Wide compatibility

    • Use WAV or FLAC for:

      • High-fidelity requirements

      • Professional audio

      • Critical accuracy needs
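When WAV is chosen for fidelity, it can help to inspect a recording before sending it, since uncompressed files grow quickly with duration. The sketch below uses Python's standard `wave` module to read a file's basic properties; `describe_wav` is a hypothetical helper name, and the `wave` module only handles uncompressed PCM WAV files.

```python
import wave


def describe_wav(path: str) -> dict:
    """Read basic properties of a PCM WAV file without loading the samples."""
    with wave.open(path, "rb") as wav_file:
        frame_rate = wav_file.getframerate()
        frame_count = wav_file.getnframes()
        return {
            "channels": wav_file.getnchannels(),
            "sample_rate_hz": frame_rate,
            # Duration follows directly from frame count and frame rate.
            "duration_seconds": frame_count / frame_rate,
        }
```

A long duration is a signal that converting to MP3 or splitting the recording may be worth considering before transcription.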

Common Issues

  1. File Format Issues

    • Unsupported audio formats

    • Corrupted audio files

    • Invalid file extensions

  2. Transcription Quality

    • Poor audio quality affecting accuracy

    • Background noise interference

    • Overlapping speech from multiple speakers

    • Heavy accents or dialects

  3. API Limitations

    • File size restrictions

    • Rate limiting
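Rate limiting is usually handled by retrying with increasing delays. The sketch below shows one common pattern, exponential backoff, under stated assumptions: `RateLimitError` is a locally defined placeholder for whatever error your provider or flow surfaces, and `call_with_backoff` is a hypothetical helper, not something the node exposes.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Placeholder for the rate-limit error raised by the provider."""


def call_with_backoff(call: Callable[[], T], max_attempts: int = 4,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(), retrying on RateLimitError with doubling delays."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Delay doubles each time: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter is injectable so the retry schedule can be exercised without real waiting. Retrying does not help with file size restrictions; an oversized file needs to be compressed or split instead.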
