# Thoth: Private Meeting Recorder for Mac

> Record both sides of any meeting. Transcribe and summarize locally on your Mac. No cloud, no data ever leaving your machine.

Homepage: https://thoth-app.com/
Mac App Store: https://apps.apple.com/app/thoth-your-private-ai-scribe/id6756965313?mt=12
Developer: Matthieu Veinhard (solo, no funding, no team)
Version: 1.4
Requires: macOS Tahoe (macOS 26) or later. Apple Silicon recommended. Intel Macs supported for recording and transcription.

---

## What Thoth Does

Thoth is a native SwiftUI Mac app that records meetings privately. It captures your microphone and the system audio (Zoom, Teams, Meet, or any app) on two separate channels simultaneously, transcribes everything on-device using WhisperKit and CoreML, detects individual speakers, and generates AI summaries. Nothing leaves your machine unless you explicitly choose to use a cloud AI key for summaries.

Key distinction from cloud competitors: Thoth does not join your meetings as a bot, does not upload your audio to any server, and does not require an account or internet connection.

---

## Features

### Recording
Captures microphone and system audio independently. No virtual audio drivers, no screen capture, no bot joining the call. Mic channel is always Speaker 1 (you). System audio captures all remote participants. Both channels are recorded, transcribed, and diarized separately, making speaker attribution deterministic rather than estimated.
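
To illustrate why per-channel recording makes attribution deterministic, here is a minimal Python sketch of merging two per-channel transcripts into one timeline. It is not Thoth's implementation: the `Segment` type, the `merge_channels` helper, and the "Speaker 2"/"Speaker 3" labels are hypothetical names chosen for the example. The only rule taken from the text above is that the mic channel is always Speaker 1.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds from the start of the recording
    end: float
    text: str
    speaker: str = ""


def merge_channels(mic, system):
    """Interleave two per-channel transcripts into one timeline.

    Mic segments are always labelled "Speaker 1". System-audio segments keep
    whatever label diarization assigned to them (hypothetical labels here).
    """
    labelled = [Segment(s.start, s.end, s.text, "Speaker 1") for s in mic]
    labelled += list(system)
    return sorted(labelled, key=lambda s: (s.start, s.end))


mic = [Segment(0.0, 2.1, "Can everyone hear me?"),
       Segment(5.0, 6.4, "Great, let's start.")]
system = [Segment(2.3, 3.0, "Yes.", "Speaker 2"),
          Segment(3.1, 4.6, "Loud and clear.", "Speaker 3")]

for seg in merge_channels(mic, system):
    print(f"[{seg.start:5.1f}s] {seg.speaker}: {seg.text}")
```

Because the local speaker is identified by channel rather than by voice, no clustering step can confuse them with a remote participant.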

### Live Transcript
A floating panel displays your transcript word by word as you speak. Two models run in parallel: a fast streaming model (Parakeet EOU, ~160 ms latency) for instant preview, and a quality batch model for the final text. All on-device.
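
One way to picture the two-model setup is a buffer in which fast preview text is shown until the batch model's text for the same stretch of audio arrives. The sketch below assumes audio is split into numbered chunks and that final text simply overrides preview text. Both are assumptions for illustration; the `LiveTranscript` class is hypothetical and does not describe how Thoth actually reconciles the two outputs.

```python
class LiveTranscript:
    """Holds per-chunk text; final text overrides preview text."""

    def __init__(self):
        self._preview = {}  # chunk index -> text from the fast streaming model
        self._final = {}    # chunk index -> text from the quality batch model

    def add_preview(self, chunk, text):
        self._preview[chunk] = text

    def add_final(self, chunk, text):
        self._final[chunk] = text

    def render(self):
        # Prefer the final text for a chunk, fall back to the preview.
        chunks = sorted(set(self._preview) | set(self._final))
        return " ".join(self._final.get(c, self._preview.get(c, "")) for c in chunks)


t = LiveTranscript()
t.add_preview(0, "lets review the cue three numbers")
t.add_preview(1, "starting with revenue")
print(t.render())  # preview only

t.add_final(0, "Let's review the Q3 numbers,")
print(t.render())  # chunk 0 upgraded, chunk 1 still preview
```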

### Speaker Detection (Diarization)
An on-device engine using Pyannote CoreML detects individual speakers and color-codes the transcript automatically. Benchmarked at 7.72 seconds to diarize a 42-minute recording with 2 speakers. Supports up to 8 speakers. Works offline.

### AI Summary
Extracts notes, action items, and key decisions in one click. Choice of five local on-device models or your own cloud API key (OpenAI, Anthropic, Google). Audio never leaves your machine. If using cloud AI, only the text transcript is sent directly from your Mac to your chosen provider with your own key.

### On-Device AI Models
Five local models available, ranging from Ministral 3B (fast, small) to Gemma 3 12B (high quality, larger). No API key required, no internet needed.

### Export Formats
Free: WAV audio, TXT transcript.
Pro: M4A, AAC, Markdown, RTF, JSON (with timestamps and speaker colors), PDF. Direct sharing via Mail, Messages, AirDrop.
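
As an example of what a timestamped JSON export makes possible, the Python sketch below computes talk time per speaker. The schema is an assumption: the field names `segments`, `start`, `end`, `speaker`, `color`, and `text` are hypothetical and are not taken from Thoth's documentation, so adjust them to match a real export.

```python
import json
from collections import defaultdict

# Hypothetical export; the real schema may differ.
export = """
{"segments": [
  {"start": 0.0, "end": 2.0, "speaker": "Speaker 1", "color": "#4A90D9", "text": "Can everyone hear me?"},
  {"start": 2.5, "end": 3.0, "speaker": "Speaker 2", "color": "#E2725B", "text": "Yes."},
  {"start": 5.0, "end": 6.5, "speaker": "Speaker 1", "color": "#4A90D9", "text": "Great, let's start."}
]}
"""


def talk_time(export_json):
    """Sum segment durations (in seconds) for each speaker."""
    totals = defaultdict(float)
    for seg in json.loads(export_json)["segments"]:
        totals[seg["speaker"]] += seg["end"] - seg["start"]
    return dict(totals)


print(talk_time(export))
```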

### Language Support
Whisper models support 99 languages with auto-detection. Parakeet TDT v3 supports 25 European languages plus Japanese and Chinese for re-transcription (batch mode). Note: Parakeet has no language conditioning mechanism, so on non-English audio it can drift into English when speech is ambiguous, translating rather than transcribing (see the blog post on language drift).

### Menu Bar Mode
Thoth can run as a compact always-on-top menu bar item. One-click recording start/stop without switching windows.

---

## Transcription Models and Benchmarks

All benchmarks were run on an Apple M2 MacBook Pro using a 42-minute recording.

### Re-transcription (batch)

| Model | Speed | WER (accented EN) | WER (clean EN) | Languages |
|---|---|---|---|---|
| Whisper Large V3 Turbo | 12.7x real-time | 32.3% | 7.8% | 99 |
| Parakeet TDT V3 (Pro) | 180x real-time | 38.9% | 8.7% | 25 |
| Whisper Small | 17.7x real-time | 45.9% | 9.2% | 99 |
| Whisper Base | 59.6x real-time | 51.1% | 9.2% | 99 |

Whisper Large V3 Turbo wins on accuracy across all scripts. Parakeet is about 14x faster than Whisper Large V3 Turbo on the same file (180x vs 12.7x real-time) and stays within one percentage point on clean speech (8.7% vs 7.8% WER), but it falls behind on accented speech (38.9% vs 32.3%).
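
The speed column can be turned into wall-clock time for the 42-minute test file with simple arithmetic. The Python sketch below uses only the figures from the table above; the helper name `processing_seconds` is made up for the example.

```python
RECORDING_MINUTES = 42

# Speed figures copied from the batch table (multiples of real time).
speeds = {
    "Whisper Large V3 Turbo": 12.7,
    "Parakeet TDT V3": 180.0,
    "Whisper Small": 17.7,
    "Whisper Base": 59.6,
}


def processing_seconds(minutes, speed):
    """Seconds needed to transcribe `minutes` of audio at `speed` x real time."""
    return minutes * 60 / speed


for model, speed in speeds.items():
    print(f"{model:24s} {processing_seconds(RECORDING_MINUTES, speed):6.1f} s")

# Ratio behind the "14x faster" claim.
ratio = speeds["Parakeet TDT V3"] / speeds["Whisper Large V3 Turbo"]
print(f"Parakeet vs Large V3 Turbo: {ratio:.1f}x")
```

At these rates the 42-minute file takes roughly 14 seconds with Parakeet and a little over 3 minutes with Whisper Large V3 Turbo.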

### Live transcription

| Engine | WER | Latency |
|---|---|---|
| Parakeet EOU 120M (Pro) | 38.4% | ~160 ms |
| Parakeet Sliding Window (Pro) | 56.8% | ~11 s |
| WhisperKit Base+Small | 65.9% | ~12 s |

### Diarization
7.72 seconds to diarize 42 minutes of audio with 2 speakers. Up to 8 speakers. Fully on-device using Pyannote CoreML.

### AI Summary Quality (local vs cloud)

Tested on a real French-language interview transcript. Scored by Claude Opus across 6 criteria.

| Criterion | Local (Qwen 7B) | Cloud (Claude Sonnet BYOK) |
|---|---|---|
| Factual accuracy | 7/10 | 9.5/10 |
| Completeness | 5/10 | 9/10 |
| Decision capture | 2/10 | 8.5/10 |
| Action items | 5/10 | 8/10 |
| Quote selection | 4/10 | 8.5/10 |
| Language quality | 7/10 | 9/10 |
| Overall | ~5/10 | ~8.7/10 |
| Privacy | Zero data leaves | Text sent to provider |
| Cost | Free | ~$0.01/hour |
| Internet required | No | Yes |
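
The Overall row is consistent with an unweighted mean of the six criterion scores. That weighting is an assumption, since the source does not state how the overall figure was computed. The Python sketch below recomputes it from the table and ranks the criteria by the size of the local-vs-cloud gap.

```python
# (local, cloud) scores out of 10, copied from the table above.
scores = {
    "Factual accuracy": (7, 9.5),
    "Completeness": (5, 9),
    "Decision capture": (2, 8.5),
    "Action items": (5, 8),
    "Quote selection": (4, 8.5),
    "Language quality": (7, 9),
}

# Assumption: "Overall" is a plain average of the six criteria.
local = sum(s[0] for s in scores.values()) / len(scores)
cloud = sum(s[1] for s in scores.values()) / len(scores)
print(f"Local overall: {local:.2f}/10, cloud overall: {cloud:.2f}/10")

# Criteria sorted by how far the local model trails the cloud model.
gaps = sorted(scores, key=lambda c: scores[c][1] - scores[c][0], reverse=True)
for criterion in gaps:
    lo, hi = scores[criterion]
    print(f"{criterion:18s} gap {hi - lo:.1f}")
```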

---

## Privacy Architecture

- Audio transcription: WhisperKit + CoreML, runs entirely on-device. Audio never leaves the Mac.
- Speaker detection: Pyannote CoreML, fully on-device.
- On-device AI summaries: local models (Ministral 3B, Gemma 3 12B, and others), no internet needed.
- Cloud AI (optional, BYOK): only the text transcript is sent, directly from the user's Mac to their chosen provider (OpenAI, Anthropic, Google) using the user's own API key. No Thoth server ever receives or stores the key or the transcript.
- API keys stored in Apple Keychain, never in plaintext.
- No account required. No data collection by Thoth.

---

## Pricing

### Free
- $0
- Unlimited recordings
- 30 min per mic recording, 15 min per system audio recording
- 10 AI enhancements per month (local or cloud)
- WAV audio export, TXT transcript export

### Pro
- $9.99/month
- $49.99/year
- $99.99 lifetime (one-time purchase, also available at EUR 99.99)
- Unlimited recording duration
- System audio and mixed recording
- All export formats: M4A, AAC, Markdown, RTF, JSON, PDF
- Unlimited AI enhancements (local or cloud with own API key)
- Large transcription model (Whisper Large V3 Turbo)
- Parakeet TDT v3 and Parakeet EOU models
- Remove branding from exports and shares

Free trial included with Pro subscription.

Note: Cloud AI features (OpenAI, Anthropic, Google) may not be available in all countries due to local regulations. On-device AI is available everywhere.

---

## Comparison with Cloud Competitors

|  | Thoth | Otter | Fireflies | Granola |
|---|---|---|---|---|
| Audio stays on your Mac | Yes | No | No | No |
| No bot joins your call | Yes | No | No | Yes |
| Works fully offline | Yes | No | No | No |
| Dual-channel recording | Yes | No | No | No |
| On-device AI summaries | Yes | No | No | No |
| Native Mac app | Yes | No | No | Yes |

Cloud recorders (Otter, Fireflies) upload audio to US servers. This can create legal exposure for everyone on the call, including participants who did not consent to cloud storage. Their meeting bots are visible to all participants and require permission from the meeting host.

Granola is native to Mac and does not use a bot, but it does upload audio for transcription. It does not offer dual-channel recording or on-device transcription.

---

## Use Cases

### Lawyers
Client meetings, depositions, and internal strategy sessions involve privileged and confidential information. Cloud recorders create legal risk by uploading audio to third-party servers. Thoth keeps everything on the lawyer's machine, with no third party ever handling the audio.

### Journalists
Source interviews often involve sensitive information, off-the-record conversations, and whistleblowers. Uploading these to cloud services can violate source protection obligations. Thoth provides a local record with no upload risk.

### Doctors
Patient consultations contain protected health information (PHI). Cloud recording tools used without patient consent and a business associate agreement (BAA) create HIPAA liability. Thoth never transmits audio, eliminating cloud compliance concerns for clinical notes.

### Researchers
Qualitative research interviews generate transcripts that may include sensitive participant disclosures. Local transcription removes the data governance burden of cloud processing.

---

## Blog

Full blog: https://thoth-app.com/blog/

### Parakeet translates French audio (May 19, 2026)
URL: https://thoth-app.com/blog/2026-05-19-parakeet-language-drift/

Parakeet TDT v3 was added to Thoth as the fastest transcription model. On clean English audio it nearly matches Whisper Large V3 Turbo while completing 14x faster. On non-English audio, it does something unexpected: it translates rather than transcribes. The model has no language conditioning tokens (unlike Whisper's language prefix tokens). When acoustic signal becomes ambiguous (spontaneous speech, regional vocabulary, argot), Parakeet falls back to the language that dominated its training data and outputs English. This is not random noise. It is word-for-word involuntary translation.

Measured English-language intrusion rates on French recordings:
- Children's weather segment (scripted, clear): 0%
- Archival 1912 recording (deliberate speech): 0%
- Weightlifting documentary 1975: 7.1%
- French slang documentary 1981: 18.2%
- Picard dialect documentary 1982: 16.7%
- Private French-language interview (spontaneous): 31.3%

Whisper produced zero English output across all six recordings.

Thoth shows a warning before re-transcription with Parakeet when a non-English language is selected. Parakeet remains available and unblocked. For non-English audio, Whisper Large V3 Turbo is the recommended model.
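
The post does not say how the intrusion rate is computed. One plausible reading is the share of transcript segments that came out in English. The Python sketch below follows that assumption and also encodes the warning rule described above. The function names, the `"fr"`/`"en"` language codes, and the model-name matching are hypothetical and not taken from Thoth's code.

```python
def english_intrusion_rate(segment_languages):
    """Share of transcript segments labelled English ("en"), as a percentage."""
    if not segment_languages:
        return 0.0
    english = sum(1 for lang in segment_languages if lang == "en")
    return 100 * english / len(segment_languages)


def should_warn(model, language):
    """Warn only when Parakeet is combined with a non-English selection."""
    return model.lower().startswith("parakeet") and language.lower() != "en"


# Hypothetical labels for a 12-segment French recording: 3 came out in English.
labels = ["fr"] * 9 + ["en"] * 3
print(f"{english_intrusion_rate(labels):.1f}%")

print(should_warn("Parakeet TDT v3", "fr"))         # warning shown
print(should_warn("Whisper Large V3 Turbo", "fr"))  # no warning
```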

### Your meeting audio belongs on your Mac (May 13, 2026)
URL: https://thoth-app.com/blog/2026-05-13-why-your-meeting-recorder-shouldnt-upload-your-audio/

Cloud meeting recorders upload audio to US servers. Under GDPR (EU), PIPEDA (Canada), and similar frameworks, recording a conversation and transmitting it to a third-party cloud service may require explicit consent from all participants. In practice, the other people on the call rarely know their audio is being uploaded. Thoth eliminates this exposure by keeping everything local.

### Local vs cloud AI summaries (May 13, 2026)
URL: https://thoth-app.com/blog/2026-05-13-local-vs-cloud-ai-summaries/

Detailed comparison of on-device model quality (Qwen 7B) versus cloud AI (Claude Sonnet via BYOK) for meeting summarization. The local model scores ~5/10 overall; Claude Sonnet scores ~8.7/10. The gap is largest on decision capture (2/10 vs 8.5/10), followed by quote selection (4/10 vs 8.5/10) and completeness (5/10 vs 9/10). The audio privacy guarantee is identical for both options: audio never leaves the machine regardless of which AI is used for summaries.

### How I benchmark transcription models (May 10, 2026)
URL: https://thoth-app.com/blog/2026-05-10-how-we-benchmark-transcription/

Methodology behind the WER benchmarks published on the site. Three test scripts: clean English, French-accented English, and a French-to-English code-switching script. Ground truth created manually. All models tested on the same 42-minute recording on an M2 MacBook Pro. Published benchmarks (LibriSpeech WER) are consistently 10-30x more optimistic than real-world meeting audio results.
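
For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between the model output and the ground truth, divided by the number of ground-truth words. The Python sketch below is a standard textbook formulation, not the benchmark's actual script. The normalization step (lowercasing and whitespace splitting) is an assumption, since the post's exact text normalization is not given here.

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    # Assumed normalization: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only one previous row.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


truth = "the quarterly numbers look good"
output = "the quarterly number looks good"
print(f"WER: {wer(truth, output):.1%}")
```

Here two of the five reference words are substituted, giving a WER of 40%. Note that WER can exceed 100% when the output contains many inserted words.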

### Menu bar mode (May 18, 2026)
URL: https://thoth-app.com/blog/2026-05-18-menu-bar-mode/

How Thoth's menu bar mode works. The app runs as a compact always-on-top item in the menu bar, enabling one-click recording without switching windows or keeping the main app visible. Designed for users who record frequently and want the smallest possible interruption to their workflow.

### iOS exploration (May 18, 2026)
URL: https://thoth-app.com/blog/2026-05-18-thoth-ios-exploration/

Notes on exploring a Thoth port to iPhone and iPad. On-device transcription via WhisperKit is feasible. System audio capture (recording both sides of a call) is not possible on iOS due to platform restrictions. The post covers what would and would not work, and what a limited iOS version might look like.

---

## Technical Stack

- Language: Swift, SwiftUI
- Transcription: WhisperKit (Whisper models via CoreML), FluidAudio (Parakeet TDT v3 and Parakeet EOU via CoreML)
- Speaker diarization: Pyannote CoreML
- On-device AI summaries: MLX-based local models (Ministral 3B, Gemma 3 12B, and others)
- Audio capture: macOS AVAudioEngine, ScreenCaptureKit (system audio)
- Platform: macOS 26 (Tahoe) and later
- Distribution: Mac App Store

---

## FAQ

**Does Thoth record both sides of a Zoom or Teams meeting?**
Yes. Thoth captures your microphone on one channel and the meeting app's system audio on a separate channel, so every participant is recorded without a bot joining your call.

**Is Thoth free to use?**
Thoth is free to try with full transcription, up to 30 minutes per mic recording. Pro unlocks unlimited duration, system audio recording, and unlimited AI enhancements.

**Does my audio get sent to the cloud?**
No. Transcription runs entirely on your Mac using WhisperKit and CoreML. Your audio never leaves your device. If you use a cloud AI key for summaries, only the text transcript is sent directly from your Mac to your chosen provider.

**Does Thoth work offline?**
Yes. Recording and transcription work with no internet connection. On-device AI summaries also work offline. An internet connection is only needed if you choose to use a cloud AI key.

**What Mac models does Thoth support?**
Thoth requires macOS Tahoe (macOS 26) or later. Apple Silicon is recommended for on-device AI summaries. Intel Macs are supported for recording and transcription.

**What is the difference between Thoth and Granola?**
Both are native Mac apps that do not join your meeting as a bot. The key differences: Granola uploads audio to the cloud for transcription; Thoth transcribes entirely on-device. Thoth records both your mic and system audio on separate channels; Granola records mic only and uses your notes as a supplement. Thoth works fully offline; Granola requires internet for transcription.

**What is the difference between Thoth and Otter or Fireflies?**
Otter and Fireflies both upload your audio to cloud servers and join meetings as visible bots. Thoth does neither. Thoth transcribes on-device with no upload and no bot. Thoth also works offline; Otter and Fireflies require internet for all functionality.