Parakeet translates French audio
2026-05-19
Parakeet TDT v3 is the fastest transcription model in Thoth. On clean English audio it nearly matches Whisper Large V3 Turbo in accuracy while completing in a fraction of the time. When I added it, I expected it to be a strong option for everyone.
Then I noticed something on a French recording. Not noise. Not garbled output. Clean, grammatical English, in the middle of a French transcript.
I assumed drift. Then I looked more carefully.
It's not drift. It's translation.
A 1981 INA documentary about the French language. Both models received the same audio. Here is the same passage, side by side:
Whisper:
> "Alors, à différentes reprises dans l'histoire, on a des documents écrits jusqu'au 19ème siècle. Après le 19ème siècle, ça va devenir bien plus important. Mais jusque-là, ça reste le langage de la cour de miracles."
Parakeet:
> "At different representations in the history we have the documents just in 19e cycle. After the 19th century, it will be more important. But just later, it rests the language of the court of miracles."
(Archive INA, 1981)
Three consecutive sentences. Fully in English. The speaker never switched languages. But look at it closely: this isn't random English. It's a word-for-word translation of what was said.
The same pattern shows up in every recording where Parakeet slipped into English:
| Whisper (transcription) | Parakeet (translation) |
|---|---|
| "Il mange le morceau" | "He manges the morceau" |
| "Et en mangeant le morceau" | "And in manging the morceau" |
| "ça m'est revenu très fort" | "and that's revenue très fort" |
| "C'est le message de terroirs qu'il souhaite rendre ainsi à notre compréhension" | "It's the message of terroirs that sought render ainsi to our compréhension" |
| "Aussi va-t-il régulièrement de chaumières en village pour recueillir les expressions" | "Also regularly de Chaumière en village to recueillir les expressions" |
| "À 87 ans... et s'il a de moins en moins" | "At 87 ans... and s'il a de moins en moins" |
| "Et bien évidemment le club civil" | "And the club civil" |
| "dans ce sport" | "in the sport" |
(Archive INA, 1975, 1981, 1982)
Parakeet is hearing the French, understanding it, and choosing to output English. Not corrupted audio. Not hallucination. Involuntary simultaneous translation.
Why this happens
Whisper uses explicit language conditioning. Before decoding each audio chunk, it places a language token in the decoder prompt: <|fr|> for French, <|en|> for English. The decoder is conditioned on that language from the first word.
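To make the conditioning concrete, here is a minimal sketch of the special-token prefix Whisper's decoder starts from. The token strings follow Whisper's published prompt format; the function name `whisper_prompt` and its parameters are illustrative assumptions, not part of any library.

```python
# Sketch of the prefix a Whisper decoder is conditioned on before it emits text.
# whisper_prompt is a hypothetical helper, written only to show the token order.

def whisper_prompt(language: str, task: str = "transcribe",
                   timestamps: bool = False) -> list[str]:
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unknown task: {task}")
    # Language comes right after the start token, before any text is decoded.
    tokens = ["<|startoftranscript|>", f"<|{language}|>", f"<|{task}|>"]
    if not timestamps:
        tokens.append("<|notimestamps|>")
    return tokens


print(whisper_prompt("fr"))
```

Every text token the decoder produces is generated after this prefix, which is why the language choice shapes the output from the very first word.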
Parakeet has no equivalent mechanism. It learns to transcribe purely from acoustic patterns, with no language identity signal. When the input is unambiguous standard French, it stays in French. When the speech gets harder to parse, whether from regional vocabulary, spontaneous pacing, or argot, the model falls back to English, the language that dominated its training data. It doesn't get confused and output noise. It gets confused and outputs a translation.
This isn't a bug in the integration. I checked FluidAudio's source. The language parameter in the Swift API controls a script filter (Latin versus Cyrillic). It does not condition the decoder. There is no way to force the model into a language at the inference level.
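A script filter cannot separate French from English, because both are written in Latin script. The sketch below illustrates that point using Unicode character names. It is not FluidAudio's implementation; `scripts_used` and `passes_latin_filter` are hypothetical helpers.

```python
import unicodedata


def scripts_used(text: str) -> set[str]:
    """Return coarse script labels for the letters in text."""
    scripts = set()
    for ch in text:
        if ch.isalpha():
            # Names look like "LATIN SMALL LETTER E WITH GRAVE"
            # or "CYRILLIC SMALL LETTER A"; the first word is the script.
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def passes_latin_filter(text: str) -> bool:
    """True when every letter in text belongs to the Latin script."""
    return scripts_used(text) <= {"LATIN"}


print(passes_latin_filter("ça m'est revenu très fort"))  # French
print(passes_latin_filter("He manges the morceau"))      # mixed output
print(passes_latin_filter("Он ест кусок"))               # Russian
```

The French sentence and the mixed English output both pass the Latin check, so a filter of this kind has nothing to reject when the model drifts into English.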
How bad does it get
I tested Parakeet and Whisper on six French recordings of increasing linguistic complexity and measured English-language intrusions automatically.
| Recording | Duration | English intrusions (Parakeet) | English intrusions (Whisper) |
|---|---|---|---|
| Children's weather segment (1980, INA) | 2:18 | 0% | 0% |
| Archival recording, 1912 (INA) | 3:46 | 0% | 0% |
| Weightlifting documentary (1975, INA) | 7:26 | 7.1% | 0% |
| French slang documentary (1981, INA) | 4:58 | 18.2% | 0% |
| Picard dialect documentary (1982, INA) | 3:38 | 16.7% | 0% |
| Private French-language interview | 40 min | 31.3% | 0% |
The 1912 archival recording is worth noting. The audio quality is genuinely poor by any modern standard, yet Parakeet stays in French throughout. The speaker is a craftsman recorded in a deliberate studio session, speaking clearly and intentionally. Audio degradation alone doesn't trigger the behavior. What matters is whether speech is scripted and clear, or spontaneous and unstructured.
Whisper produced no genuine English intrusions in any of the six recordings.
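The exact counting method isn't spelled out here, so the following is only one possible heuristic, not the procedure behind the table: split the transcript into sentences and flag those where common English function words outnumber common French ones. The word lists and function names are illustrative assumptions, and a real measurement would need larger lists or a proper language-identification model.

```python
import re

# Small illustrative word lists (assumptions, far from exhaustive).
EN_WORDS = {"the", "and", "of", "to", "in", "is", "it", "that", "we",
            "have", "will", "be", "but", "he", "after", "at"}
FR_WORDS = {"le", "la", "les", "de", "des", "du", "et", "un", "une", "il",
            "est", "que", "on", "ça", "en", "à", "dans", "mais"}


def is_english_leaning(sentence: str) -> bool:
    """True when English function words outnumber French ones."""
    words = re.findall(r"[^\W\d_]+", sentence.lower())
    en = sum(w in EN_WORDS for w in words)
    fr = sum(w in FR_WORDS for w in words)
    return en > fr


def intrusion_rate(transcript: str) -> float:
    """Percentage of sentences flagged as English-leaning."""
    parts = re.split(r"(?<=[.!?])\s+", transcript.strip())
    sentences = [s for s in parts if s]
    if not sentences:
        return 0.0
    flagged = sum(is_english_leaning(s) for s in sentences)
    return 100.0 * flagged / len(sentences)


print(intrusion_rate("Il mange le morceau. He manges the morceau."))
```

A sentence-level score like this is coarse: it counts a half-translated sentence such as "He manges the morceau" as fully English, and it can miss sentences that mix both languages evenly.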
How Thoth handles it
Thoth shows a warning before re-transcription with Parakeet when you have a non-English language selected. The message is direct: "Parakeet does not support language selection and may produce mixed-language output on non-English recordings. For best accuracy in your language, use a Whisper model instead."
The warning only fires when you explicitly select a non-English language. If you leave the setting on Auto, no warning appears: on English recordings, Parakeet performs well and the alert would just be noise.
Parakeet remains available and unblocked for non-English recordings. The warning is informational, not a lock.
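The rule described above fits in a few lines. This is a sketch of the decision logic only, not Thoth's actual code: the model identifiers, the `"auto"` sentinel, and the function name are assumptions made for illustration.

```python
from typing import Optional

AUTO = "auto"  # hypothetical value for the "Auto" language setting

WARNING = (
    "Parakeet does not support language selection and may produce "
    "mixed-language output on non-English recordings. "
    "For best accuracy in your language, use a Whisper model instead."
)


def parakeet_warning(model: str, language: str) -> Optional[str]:
    """Return the warning text, or None when no warning applies."""
    is_parakeet = model.lower().startswith("parakeet")
    # Only an explicit non-English choice triggers the warning;
    # Auto and English stay silent.
    explicit_non_english = language not in (AUTO, "en")
    if is_parakeet and explicit_non_english:
        return WARNING
    return None


print(parakeet_warning("parakeet-tdt-v3", "fr") is not None)
print(parakeet_warning("parakeet-tdt-v3", AUTO) is None)
```

The function only returns text to display. It never blocks the transcription, which matches the warning being informational rather than a lock.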
The recommendation
For English transcription, Parakeet TDT v3 is the right choice: fast, accurate, and nearly on par with Whisper Large V3 Turbo on clean audio.
For French and other non-English languages, use Whisper Large V3 Turbo. Every recording I tested confirms it. The speed gap is real. So is the difference between a transcript and an involuntary translation. The accuracy numbers are in the transcription benchmark.
Thoth is a private meeting recorder for Mac. All transcription runs on your device. Built by one person, no funding, no team. If you find it useful, upgrading to Pro is the best way to support development.