Auto Subtitle Generator: Add Captions to Video Free with Whisper

About the Subtitle Studio

Drop a video into Subtitle Studio and it auto-transcribes the speech, lets you fine-tune the captions on a timeline, and exports them as SRT, VTT, or burned right into a new MP4. Style fonts, colors, outlines, and positioning live: what you see in the preview is what gets baked into the final file.

Everything happens inside your browser, including the AI transcription. Your video never leaves your device. The first run downloads a generic, open-source speech model (~80 MB) from Hugging Face over HTTPS and caches it in your browser, so transcription runs offline from then on. The model file is the same for every user and every voice; nothing in it is tied to your audio.

What you can do

Drop & auto-transcribe: MP4, WebM, MOV — drop a video and the built-in speech model generates timed cues automatically.
Cue list editor: Edit, retime, or delete any individual cue. Add a new cue at the current playhead with one click.
Shift all cues: Bump every caption forward or backward by 0.5s — useful when transcription is off by a consistent offset.
Live preview overlay: Captions render on top of the video as you scrub, with the exact styling you've chosen.
Caption styling: Font (Inter, Montserrat, Helvetica, Georgia, Courier), size as % of video height, text color, outline color & width, background fill (none, black at 50%/75%/solid), and position (bottom, middle, top).
Export SRT or VTT: Save the captions as a sidecar file for YouTube, Vimeo, Premiere, Final Cut, or any video player that loads external subtitles.
Burn-in to MP4: Re-encode the video with the captions baked into the pixels — perfect for social platforms that don't render external caption files.
Fullscreen preview: Toggle fullscreen on the preview to check legibility on a bigger screen before exporting.
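
For readers curious what "shift all cues" and "export SRT" amount to, here is a minimal Python sketch. It is an illustration only, not Subtitle Studio's own code (which runs in the browser); the `Cue` type and the function names are hypothetical, and clamping shifted times at zero is an assumption about sensible behavior.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Cue:
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str


def shift_cues(cues, offset):
    """Return new cues moved by `offset` seconds (negative = earlier).

    Assumption: times are clamped so no cue starts before 0.
    """
    shifted = []
    for cue in cues:
        start = max(0.0, cue.start + offset)
        end = max(start, cue.end + offset)
        shifted.append(replace(cue, start=start, end=end))
    return shifted


def srt_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm (SRT puts a comma before the milliseconds)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues):
    """Serialize cues as numbered SRT blocks separated by blank lines."""
    blocks = []
    for index, cue in enumerate(cues, start=1):
        timing = f"{srt_timestamp(cue.start)} --> {srt_timestamp(cue.end)}"
        blocks.append(f"{index}\n{timing}\n{cue.text}\n")
    return "\n".join(blocks)
```

Calling `to_srt(shift_cues(cues, 0.5))` mirrors pressing the forward-shift button once and then exporting: every cue keeps its text and duration, and only the timestamps move.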

Common uses

Adding captions to Instagram Reels, TikToks, YouTube Shorts, and other social videos where most viewers watch on mute.
Making podcast episodes, interviews, and tutorials accessible.
Generating SRT files for delivery to clients or distribution platforms.
Localizing video content by editing the auto-transcribed text or replacing it with a translation.
Adding stylized "title-card" captions to vlogs and product videos.
Quickly transcribing a long meeting, lecture, or talk recording into editable text.

Tips for the best results

Use the cleanest audio you can. The model handles background noise, but speech that's clearly above the noise floor transcribes far more accurately.
For social videos shot vertically, keep caption size around 4–6% of video height and use a black-50% or black-75% background so the text stays legible over busy footage.
White text with a 2–4px black outline is the safest default — readable on any background.
Use the "+0.5s all" / "−0.5s all" buttons if the captions feel uniformly late or early after transcription.
Burn-in is the right choice for Instagram, TikTok, and any platform where the player won't load an external SRT. Use SRT/VTT export when uploading to YouTube or delivering to an editor.
Keep the tab active during burn-in. Browsers throttle background tabs, which slows the encoder significantly.
The model download only happens once. On later visits the cached model is reused, so transcribing a second video skips straight to processing.
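
To get a feel for the size tip above, the percentage can be turned into pixels with simple arithmetic. The sketch below assumes the size setting is applied directly to the video's pixel height, as the styling description suggests; the function name is hypothetical and the rounding is an assumption.

```python
def caption_font_px(video_height_px, size_percent):
    """Convert a caption size given as % of video height into whole pixels."""
    return round(video_height_px * size_percent / 100)


# Compare the suggested 4-6% range across a few common video heights.
for height in (1920, 1080, 720):
    low = caption_font_px(height, 4)
    high = caption_font_px(height, 6)
    print(f"{height}px tall video: captions roughly {low}-{high}px")
```

On a 1080×1920 vertical video, 5% works out to 96 px text, which puts a 2–4px outline in proportion.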

No accounts, no upload queue, no transcription credits: drop a video, edit the captions, then export or burn in. The output is yours to share, post, or hand off to an editor.

Common questions

How do I add subtitles to a video for free?

Load the video and Whisper transcribes it into timed captions right in your browser. Tidy any misheard words, style the text, and either export SRT/VTT files or burn the subtitles into a new MP4.

Should I export SRT files or burn in the captions?

SRT/VTT files suit platforms that accept caption uploads (viewers can toggle them); burned-in captions work everywhere, including platforms and players with no caption support. The studio does both from the same edit.
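
The two sidecar formats are close cousins. As a rough sketch (hypothetical function names, cues given as plain `(start, end, text)` tuples, not the studio's actual exporter), a WebVTT file starts with a `WEBVTT` header line and uses a dot rather than SRT's comma before the milliseconds:

```python
def vtt_timestamp(seconds):
    """Format seconds as HH:MM:SS.mmm (WebVTT uses a dot before the milliseconds)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_vtt(cues):
    """Serialize (start, end, text) tuples as a minimal WebVTT document."""
    lines = ["WEBVTT", ""]  # required header, then a blank line
    for start, end, text in cues:
        lines.append(f"{vtt_timestamp(start)} --> {vtt_timestamp(end)}")
        lines.append(text)
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)


print(to_vtt([(1.0, 2.5, "Hello"), (3.0, 4.0, "World")]))
```

Unlike SRT, the numeric index before each cue is optional in WebVTT, so this sketch leaves it out.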

How accurate are the auto-generated captions?

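Whisper is an open-source speech recognition model from OpenAI, trained on a large multilingual dataset, and it handles accents and imperfect audio well. It can still mishear names, jargon, or overlapping speech, and the editor exists for those cases: click a cue, fix a word, done.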

Why do videos need captions now?

Many viewers watch social video on mute, and captions are also an accessibility baseline for viewers who are deaf or hard of hearing. Without captions, a muted video loses everything that is said in it.

Is my video uploaded for transcription?

No. The Whisper model downloads to your device once and the transcription runs locally, so client videos and internal recordings stay private, with no per-minute transcription bill.
