Drop in a recording, get a clean editable transcript, then download it as TXT, Markdown, SRT, VTT, or Word.
Supported formats: .mp3, .wav, .m4a, .ogg, .webm, .flac, .aac, .opus. Maximum file size: 200 MB. Long recordings work but transcription will take longer.
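As a rough illustration of the limits above, a pre-flight check on the file name and size might look like the sketch below. The helper name `validateAudioFile`, the result type, and the reading of "200 MB" as 200 × 1024 × 1024 bytes are assumptions made for this example, not details taken from the tool itself.

```typescript
// Extensions listed in the text above.
const SUPPORTED_EXTENSIONS: string[] = [
  ".mp3", ".wav", ".m4a", ".ogg", ".webm", ".flac", ".aac", ".opus",
];

// Assumption: "200 MB" is read as 200 * 1024 * 1024 bytes; the page does not say.
const MAX_FILE_BYTES = 200 * 1024 * 1024;

type ValidationResult = { ok: true } | { ok: false; reason: string };

// Hypothetical helper: checks a file's name and size before transcription starts.
function validateAudioFile(name: string, sizeBytes: number): ValidationResult {
  const dot = name.lastIndexOf(".");
  // Compare extensions case-insensitively so "TALK.MP3" is accepted.
  const ext = dot >= 0 ? name.slice(dot).toLowerCase() : "";
  if (SUPPORTED_EXTENSIONS.indexOf(ext) === -1) {
    return { ok: false, reason: "Unsupported format: " + (ext || "(no extension)") };
  }
  if (sizeBytes > MAX_FILE_BYTES) {
    return { ok: false, reason: "File is larger than 200 MB" };
  }
  return { ok: true };
}
```

In a browser, the two arguments would typically come from a `File` object's `name` and `size` properties.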
Pick a model, then click Transcribe. The model is downloaded the first time you use it (then cached for offline use), so the first run is slower than subsequent ones.
Click a timestamp to seek the audio. Edit any line directly. Use the × to drop a line.
Pick a format. Plain text and Word strip timestamps; SRT, VTT, and Markdown keep them.
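To make the difference between the formats concrete, here is a minimal sketch of how timestamped lines could be serialized. The `Segment` shape and the function names are hypothetical; the only format facts relied on are that SRT cues are numbered and use a comma before the milliseconds, while WebVTT files start with a `WEBVTT` header and use a period.

```typescript
// Hypothetical shape for one transcript line; times are in seconds.
interface Segment {
  start: number;
  end: number;
  text: string;
}

// Left-pads a non-negative integer with zeros to at least `width` digits.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Formats seconds as HH:MM:SS<separator>mmm. SRT uses ",", WebVTT uses ".".
function formatTimestamp(seconds: number, separator: "," | "."): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSeconds = Math.floor(totalMs / 1000);
  const s = totalSeconds % 60;
  const m = Math.floor(totalSeconds / 60) % 60;
  const h = Math.floor(totalSeconds / 3600);
  return pad(h, 2) + ":" + pad(m, 2) + ":" + pad(s, 2) + separator + pad(ms, 3);
}

// SRT: numbered cues, comma before milliseconds, blank line between cues.
function toSrt(segments: Segment[]): string {
  return segments
    .map((seg, i) =>
      (i + 1) + "\n" +
      formatTimestamp(seg.start, ",") + " --> " + formatTimestamp(seg.end, ",") + "\n" +
      seg.text.trim() + "\n")
    .join("\n");
}

// WebVTT: "WEBVTT" header, period before milliseconds, cue numbers omitted.
function toVtt(segments: Segment[]): string {
  const cues = segments.map((seg) =>
    formatTimestamp(seg.start, ".") + " --> " + formatTimestamp(seg.end, ".") + "\n" +
    seg.text.trim() + "\n");
  return "WEBVTT\n\n" + cues.join("\n");
}

// Plain text: timestamps are dropped and only the words remain.
function toPlainText(segments: Segment[]): string {
  return segments.map((seg) => seg.text.trim()).join("\n");
}
```

Keeping the timestamp formatter separate from the per-format writers means the two subtitle formats differ only in the separator, the header, and the cue numbering.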
Speech recognition by transformers.js (Apache 2.0) running OpenAI’s Whisper (MIT) models.
Word export via JSZip (MIT).
Open-source acknowledgements
Audio Transcriber turns a recording into clean, editable text, then exports it as plain TXT, Markdown with timestamps, Word, SRT, VTT, or raw JSON. Drop in an audio file, pick a model, transcribe, edit any lines that need fixing, and download. Click any timestamp to jump the audio player to that exact moment.
Everything happens inside your browser using OpenAI's Whisper models running through transformers.js. Your audio never leaves your device. The first run downloads the open-source speech model (40–250 MB depending on your choice) and caches it in the browser. After that, transcription works fully offline, including for any future recordings.
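For readers curious how an in-browser result might become editable lines, the sketch below normalizes a Whisper-style result into a flat list. The `AsrOutput` and `AsrChunk` shapes reflect what transformers.js is understood to return when timestamps are requested, and should be treated as an assumption. The helper `toTranscriptLines`, the fallback for a missing end time, and the model and package names in the usage comment are likewise illustrative, not taken from this tool.

```typescript
// Assumed shape of a speech-recognition result with timestamps enabled.
interface AsrChunk {
  timestamp: [number, number | null]; // [start, end] in seconds; end may be missing
  text: string;
}
interface AsrOutput {
  text: string;
  chunks?: AsrChunk[];
}

// Hypothetical editable line, as an editor UI might store it.
interface TranscriptLine {
  start: number;
  end: number;
  text: string;
}

// Converts raw chunks into trimmed, non-empty transcript lines.
function toTranscriptLines(output: AsrOutput): TranscriptLine[] {
  const chunks = output.chunks || [];
  const lines: TranscriptLine[] = [];
  for (const chunk of chunks) {
    const text = chunk.text.trim();
    if (text === "") continue; // skip chunks that contain only whitespace
    const [start, end] = chunk.timestamp;
    // Assumption: when the end time is missing, fall back to the start time.
    lines.push({ start, end: end === null ? start : end, text });
  }
  return lines;
}

// Usage sketch (not executed here; it needs the transformers.js package and
// a model download, and the names below are assumptions):
//   import { pipeline } from "@huggingface/transformers";
//   const transcriber = await pipeline(
//     "automatic-speech-recognition", "Xenova/whisper-tiny.en");
//   const output = await transcriber(audioUrl, {
//     return_timestamps: true,
//     chunk_length_s: 30,
//   });
//   const lines = toTranscriptLines(output);
```

Keeping the normalization step separate from the model call means it can be checked without downloading a model.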
No accounts, no upload queue, no per-minute charges. Drop a file, pick a model, edit the result, and export. Everything stays on your device.