Overview
Destined Voice includes tools to evaluate Speech-to-Text (STT) providers. Test multiple providers against your audio and analyze accuracy with WER/CER metrics.
Supported Providers
| Provider | Model | Description |
|---|---|---|
| Deepgram Nova-3 | deepgram-nova-3 | Latest Deepgram model |
| Deepgram Flux | deepgram-flux | Real-time optimized |
| AssemblyAI | assemblyai | High accuracy |
| OpenAI Whisper | openai | Multilingual |
| Google Speech | google | Google Cloud STT |
| Azure Speech | azure | Microsoft Azure |
| Amazon Transcribe | amazon | AWS Transcribe |
| Soniox | soniox | Low latency |
| Play.ht | playht | Specialized model |
Transcribing Audio
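As a rough illustration of this step, the sketch below fans one audio file out to several providers in parallel and collects either a transcript or an error for each. It does not use the Destined Voice API: `transcribe_with_all`, the `Transcriber` type, and the two stand-in provider functions are hypothetical names introduced here, and a real setup would wrap each vendor's own client instead.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# Hypothetical shape: a provider is any function from an audio path to a transcript.
Transcriber = Callable[[str], str]


def transcribe_with_all(audio_path: str, providers: Dict[str, Transcriber]) -> Dict[str, dict]:
    """Run every provider on the same file in parallel; keep text or error per provider."""

    def run(item):
        name, fn = item
        try:
            return name, {"text": fn(audio_path), "error": None}
        except Exception as exc:  # one failing provider should not abort the comparison
            return name, {"text": None, "error": str(exc)}

    with ThreadPoolExecutor(max_workers=max(1, len(providers))) as pool:
        return dict(pool.map(run, providers.items()))


# Stand-in providers for illustration only; they do not call any real service.
def fake_provider_ok(path: str) -> str:
    return "hello world"


def fake_provider_failing(path: str) -> str:
    raise RuntimeError("quota exceeded")


results = transcribe_with_all(
    "sample.wav",
    {"deepgram-nova-3": fake_provider_ok, "assemblyai": fake_provider_failing},
)
```

Recording failures per provider, rather than raising, keeps a single outage from invalidating the whole comparison run.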
Send the same audio to multiple providers so their transcripts can be compared.
Calculating Accuracy
Compare each provider's transcription against a ground-truth reference transcript.
Metrics Explained
Word Error Rate (WER)
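Measures word-level accuracy as the number of word substitutions, deletions, and insertions divided by the number of words in the reference transcript:
- Lower is better (0.0 = perfect; 1.0 = as many errors as reference words, and values can exceed 1.0 when the transcript contains many extra words)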
- Industry standard for STT evaluation
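WER is conventionally computed as (S + D + I) / N, where S, D, and I are the word substitutions, deletions, and insertions needed to turn the transcript into the reference, and N is the number of reference words. The sketch below is one minimal way to compute it with a word-level edit distance; the function name `word_error_rate` is chosen here for illustration, and it assumes both strings have already been normalized.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For the reference "the cat sat on the mat" and the transcript "the cat sit on mat", there is one substitution and one deletion over six reference words, so the function returns 2/6, roughly 0.33.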
Character Error Rate (CER)
Measures character-level accuracy using the same calculation as WER, applied to characters instead of words:
- More granular than WER
- Useful for detecting minor transcription errors
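A character-level version can be sketched the same way, by running the edit distance over characters rather than words. The names `edit_distance` and `character_error_rate` are illustrative, and this sketch assumes spaces count as characters; some implementations strip whitespace first.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_item in enumerate(hyp, start=1):
            cost = 0 if ref_item == hyp_item else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Character-level edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

For "recognize speech" transcribed as "recognise speech", a single wrong character out of 16 gives a CER of 0.0625, whereas at the word level the same mistake counts as one wrong word out of two.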
Demographic Bias Analysis
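One way to approach this is to pool word errors per speaker group and compare the resulting WER values. The sketch below is an assumption about how such an analysis could be structured, not the Destined Voice implementation: the record fields `group`, `errors`, and `ref_words` and both function names are hypothetical, and the sample numbers are invented purely to show the arithmetic.

```python
from collections import defaultdict
from typing import Dict, Iterable


def wer_by_group(samples: Iterable[dict]) -> Dict[str, float]:
    """Pool errors and reference words per group, then divide (corpus-level WER)."""
    errors = defaultdict(int)
    words = defaultdict(int)
    for sample in samples:
        errors[sample["group"]] += sample["errors"]
        words[sample["group"]] += sample["ref_words"]
    return {group: errors[group] / words[group] for group in words if words[group] > 0}


def wer_gap(per_group: Dict[str, float]) -> float:
    """Difference between the worst and best group; larger gaps suggest bias."""
    return max(per_group.values()) - min(per_group.values())


# Invented example records: per-utterance error counts, labeled by speaker group.
samples = [
    {"group": "group_a", "errors": 2, "ref_words": 40},
    {"group": "group_a", "errors": 3, "ref_words": 60},
    {"group": "group_b", "errors": 6, "ref_words": 50},
    {"group": "group_b", "errors": 4, "ref_words": 50},
]
per_group = wer_by_group(samples)
```

Pooling errors before dividing weights long utterances more heavily than short ones; averaging per-utterance WER instead is a reasonable alternative when utterance lengths differ widely between groups.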
Analyze STT accuracy across demographics to check whether error rates differ between speaker groups.
Best Practices
Use consistent audio quality
Test with audio sampled at 16 kHz or higher. Lower-quality audio degrades accuracy for every provider, so keep quality consistent across test files to make the comparison fair.
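The 16 kHz guideline can be checked before a test run. The sketch below uses Python's standard `wave` module, so it only applies to uncompressed WAV files; the function name and constant are illustrative.

```python
import wave

MIN_SAMPLE_RATE_HZ = 16_000  # threshold taken from the guideline above


def sample_rate_ok(path: str, minimum_hz: int = MIN_SAMPLE_RATE_HZ) -> bool:
    """Return True if the WAV file's sample rate meets the minimum."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate() >= minimum_hz
```

Running this over the whole test set and excluding or flagging files that fail keeps every provider evaluated on comparable input.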
Normalize text before comparison
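Remove punctuation, lowercase the text, and expand numbers (for example, "2" to "two") in both the reference and the transcript so that formatting differences are not counted as errors.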
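A minimal normalization sketch is shown below. It handles ASCII punctuation and casing only; number expansion is reduced to an optional lookup table passed in by the caller (`number_words`, a hypothetical parameter), since full expansion usually needs a dedicated library.

```python
import string
from typing import Dict, Optional

# Translation table that deletes every ASCII punctuation character.
_PUNCTUATION_TABLE = str.maketrans("", "", string.punctuation)


def normalize(text: str, number_words: Optional[Dict[str, str]] = None) -> str:
    """Lowercase, strip ASCII punctuation, collapse whitespace, map listed numbers."""
    tokens = text.lower().translate(_PUNCTUATION_TABLE).split()
    if number_words:
        tokens = [number_words.get(token, token) for token in tokens]
    return " ".join(tokens)
```

For example, `normalize("Hello, World! I have 2 cats.", {"2": "two"})` gives `"hello world i have two cats"`. Apply the same function to the reference and to every provider's transcript; note that stripping punctuation also turns "don't" into "dont", which is harmless as long as both sides are treated identically.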
Test multiple accents
STT accuracy varies by accent. Test with speakers matching your user base.
Consider latency vs accuracy
Some providers trade accuracy for speed. Choose based on your use case.
Provider Comparison (Typical Performance)
| Provider | Avg WER | Avg Latency | Best For |
|---|---|---|---|
| Deepgram Nova-3 | 5-8% | 400ms | General use |
| AssemblyAI | 4-7% | 600ms | High accuracy |
| OpenAI Whisper | 5-10% | 800ms | Multilingual |
| | 6-10% | 500ms | Integration |
| Azure | 6-10% | 550ms | Enterprise |
Actual performance varies by audio quality, accent, and domain vocabulary.
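To make the latency-versus-accuracy trade-off concrete, the sketch below filters the typical figures from the table by a latency budget and ranks what remains by the midpoint of the WER range. The function name and data layout are illustrative, the row without a provider name is left out, and the numbers should be replaced with measurements from your own audio.

```python
from typing import Dict, List, Tuple

# (WER low %, WER high %, average latency in ms), copied from the table above.
TYPICAL_PERFORMANCE: Dict[str, Tuple[float, float, int]] = {
    "Deepgram Nova-3": (5, 8, 400),
    "AssemblyAI": (4, 7, 600),
    "OpenAI Whisper": (5, 10, 800),
    "Azure": (6, 10, 550),
}


def rank_within_budget(max_latency_ms: int,
                       table: Dict[str, Tuple[float, float, int]] = TYPICAL_PERFORMANCE) -> List[str]:
    """Providers whose latency fits the budget, ordered by WER-range midpoint."""
    eligible = [
        (name, (low + high) / 2)
        for name, (low, high, latency_ms) in table.items()
        if latency_ms <= max_latency_ms
    ]
    return [name for name, _ in sorted(eligible, key=lambda pair: pair[1])]
```

With a 600 ms budget this ranks AssemblyAI first, then Deepgram Nova-3, then Azure; tightening the budget to 500 ms leaves only Deepgram Nova-3.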