# Speech-to-Text (STT) Node

### Overview

The Speech-to-Text (STT) Node transcribes audio into text within your flow. It supports models from **OpenAI**, **Azure OpenAI**, and **Groq**, and is suited to audio transcription, voice command processing, and content accessibility features.

Usage cost: 2 credits

### Configuration Settings

1. **Model Selection**
   * Model\*: Select from available OpenAI, Azure OpenAI, or Groq STT models
   * Note: Only OpenAI, Azure OpenAI, and Groq providers are currently supported
2. **Audio Input**
   * Audio File\*: Select an audio input from available audio variables
   * Supported Formats:
     * MP3 (.mp3)
     * MP4 (.mp4)
     * MPEG (.mpeg, .mpga)
     * M4A (.m4a)
     * WAV (.wav)
     * WebM (.webm)
     * FLAC (.flac)
     * OGG (.ogg, .oga)
3. **Transcription Settings**
   * Language: Optional; the language spoken in the audio
     * Auto-detect (default)
     * More than 60 languages are available, including English, Spanish, French, and German
   * Temperature: Controls how much the transcription may vary between runs (0.0 to 1.0)
     * Lower values (toward 0.0): More focused and deterministic output
     * Higher values (toward 1.0): More variety in word choice
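
The settings above can be checked before a flow runs. The following is a minimal sketch, not the node's actual schema: the function name `build_stt_settings`, the dictionary keys, and the default temperature of 0.0 are assumptions made for illustration. Only the extension list and the 0.0 to 1.0 range come from this page.

```python
from pathlib import Path
from typing import Optional

# Extensions taken from the Supported Formats list on this page.
SUPPORTED_EXTENSIONS = {
    ".mp3", ".mp4", ".mpeg", ".mpga", ".m4a",
    ".wav", ".webm", ".flac", ".ogg", ".oga",
}


def build_stt_settings(
    audio_path: str,
    language: Optional[str] = None,
    temperature: float = 0.0,  # assumed default, not documented
) -> dict:
    """Validate inputs and return a settings dict (hypothetical shape)."""
    ext = Path(audio_path).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported audio format: {ext or '(none)'}")
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")

    settings = {"audio_file": audio_path, "temperature": temperature}
    # Leaving the language out corresponds to the Auto-detect option.
    if language is not None:
        settings["language"] = language
    return settings
```

Calling `build_stt_settings("call.mp3", language="en", temperature=0.2)` returns a dictionary with all three keys, while `build_stt_settings("notes.txt")` raises a `ValueError` because `.txt` is not a supported format.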

### Outputs

* `text` (string): Transcribed text from the audio input

### Best Practices

1. **Audio Preparation**
   * Use clear, high-quality audio recordings
   * Minimize background noise and interference
   * Confirm that the file is in one of the supported formats
   * Keep files small enough to stay within the provider's upload size limit
2. **Language Selection**
   * Use auto-detect for general transcription
   * Specify the language when:
     * The spoken language is known in advance, which can improve accuracy
     * Speakers have regional accents that auto-detect may handle poorly
   * Consider target audience when selecting language
3. **Temperature Optimization**
   * Use 0.0 for:
     * Technical content
     * Legal documents
     * Precise transcriptions
   * Use higher values for:
     * Creative content
     * Casual conversations
     * Audio that is open to more than one interpretation
4. **Format Selection**
   * Use MP3 for:
     * General-purpose transcription
     * Balanced quality and file size
     * Wide compatibility
   * Use WAV or FLAC for:
     * High-fidelity requirements
     * Professional audio
     * Critical accuracy needs
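
As an illustration of the file size advice, here is a small pre-flight check. It is a sketch under stated assumptions: the helper name `is_within_size_limit` is hypothetical, and the 25 MB value is a placeholder based on the limit OpenAI documents for its transcription endpoint. Other providers, and this node, may enforce a different limit.

```python
import os

# Assumed placeholder: OpenAI documents a 25 MB upload limit for its
# transcription endpoint. Check your provider for the actual value.
MAX_BYTES = 25 * 1024 * 1024


def is_within_size_limit(path: str, max_bytes: int = MAX_BYTES) -> bool:
    """Return True if the file at `path` is no larger than `max_bytes`."""
    return os.path.getsize(path) <= max_bytes
```

If a recording exceeds the limit, re-encoding it as MP3 or splitting it into shorter segments before passing it to the node are two common ways to reduce its size.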

### Common Issues

1. **File Format Issues**
   * Unsupported audio formats
   * Corrupted audio files
   * Invalid file extensions
2. **Transcription Quality**
   * Poor audio quality affecting accuracy
   * Background noise interference
   * Overlapping speech from multiple speakers
   * Heavy accents or dialects
3. **API Limitations**
   * File size restrictions
   * Rate limiting
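
When a transcription is triggered from your own code, rate limiting is commonly handled by retrying with exponential backoff. The sketch below shows the general pattern only. `RateLimitError` is a hypothetical stand-in for whatever error your client raises when rate limited, and `with_backoff` is not part of any provider SDK or of this product.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's rate-limit error."""


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on RateLimitError with doubling delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            # Give up after the final attempt and surface the error.
            if attempt == max_attempts - 1:
                raise
            # Wait base_delay, then 2x, 4x, ... before retrying.
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("unreachable")
```

The `sleep` parameter exists so the delay can be replaced in tests. In normal use, pass only the callable that performs the request.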

