Benchmark

Transcription Accuracy: How Thoth Models Compare

Studio benchmarks are measured on clean, native-English audio. Real meetings have accented speakers, background noise, and domain vocabulary. We tested on all three. The numbers below are from actual recordings, not marketing copy.

Re-transcription · 42-min recording, M2 MacBook Pro

Model                    Speed (× real time)   WER · accented EN   WER · clean EN   Languages
Whisper Large V3 Turbo   12.7×                 32.3%               7.8%             99
Parakeet TDT V3 Pro      180×                  38.9%               8.7%             25
Whisper Small            17.7×                 45.9%               9.2%             99
Whisper Base             59.6×                 51.1%               9.2%             99
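The speed figures in the table are real-time factors: seconds of audio transcribed per second of wall-clock processing time. A minimal sketch of the arithmetic (the helper function is illustrative, not part of Thoth):

```python
def realtime_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Real-time factor: seconds of audio transcribed per second of wall-clock time."""
    return audio_seconds / processing_seconds

# The 42-minute benchmark recording
audio = 42 * 60  # 2520 s

# A model that processes it in 252 s runs at 10x real time
print(realtime_factor(audio, 252))  # 10.0

# Conversely, Whisper Large V3 Turbo at 12.7x real time would need
# roughly 2520 / 12.7 ≈ 198 s, a little over 3 minutes
print(round(audio / 12.7))  # 198
```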

Live transcription · French-accented English

Engine                        WER     Latency
Parakeet EOU 120M Pro         38.4%   ~160 ms
Parakeet Sliding Window Pro   56.8%   ~11 s
WhisperKit Base+Small         65.9%   ~12 s
Diarization · 7.72 s to diarize the 42-min recording with 2 speakers. Up to 8 speakers, fully on-device, via Pyannote CoreML.

Large V3 Turbo wins on accuracy. Best on all three scripts: 32.3% on French-accented English, 7.8% on clean audio. If the transcript needs to be right, this is the one.

Parakeet is 14× faster on the same file. Near-identical WER on clean speech (8.7% vs 7.8%). Falls behind on accented speech and code-switching. Worth it when speed matters and audio is clean.

Parakeet EOU is a different category. Word-by-word output at ~160 ms latency. Comparing its 38.4% WER to batch models isn't fair: it's a streaming engine optimised for real-time, not accuracy.

Published benchmarks are optimistic. Every model ran 10-30× worse on accented or foreign-language speech than studio numbers suggest. Real meetings are harder than LibriSpeech.

Full methodology: three test scripts (French-accented English, native French with code-switching, clean studio audio), WER computed via edit distance, hardware details, and model notes including the Whisper Medium silent-skip issue. Read the full benchmark writeup →
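WER via edit distance means counting the minimum number of word-level substitutions, insertions, and deletions needed to turn the hypothesis transcript into the reference, divided by the reference length. A minimal sketch of that standard computation (not Thoth's exact implementation):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return d[len(ref)][len(hyp)] / len(ref)

# One dropped word out of six -> WER of 1/6
print(wer("the cat sat on the mat", "the cat sat on mat"))
```

Note that WER can exceed 100% when the hypothesis contains many insertions, which is why heavily garbled accented-speech transcripts can post numbers far above the clean-audio baseline.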

All models run on your Mac.

Pick the model that fits your use case. No cloud, no account, no data leaving your device.

Download on the Mac App Store