Drop a video — auto-transcribe, edit captions on a timeline, export SRT/VTT, or burn them right into the file.
One-time model download (~80 MB). To keep the site lean, the AI speech model is fetched on first use from Hugging Face (the open-source model registry) and cached by your browser, so it is not downloaded again unless you clear your browser data.
Your video and audio stay on your device. Only the generic speech model file (the same file for every user, not tied to your voice) is transferred, over HTTPS. Once the model is cached, every subsequent run works fully offline.
Drop a video here
or
MP4, WebM, MOV · up to ~30 min works best
Encoding runs entirely in your browser. Keep this tab active — switching tabs can slow the encoder down significantly.
With Subtitle Studio, you drop a video in, auto-transcribe the speech, fine-tune the captions on a timeline, and export them as SRT, VTT, or burned right into a new MP4. Style fonts, colors, outlines, and positioning live: what you see in the preview is what gets baked into the final file.
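For readers curious what the SRT and VTT exports contain, here is a minimal sketch of how timed captions can be written out in each format. It is an illustration only, not Subtitle Studio's actual code: the `Cue` shape and the `formatTime`, `toSrt`, and `toVtt` names are assumptions made for this example. The main difference it shows is that SRT numbers each cue and uses a comma before the milliseconds, while WebVTT starts with a `WEBVTT` header and uses a period.

```typescript
// Hypothetical cue shape for this sketch; times are in seconds.
interface Cue {
  start: number;
  end: number;
  text: string;
}

// Left-pad a non-negative integer with zeros to at least `width` digits.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Format seconds as HH:MM:SS<sep>mmm. SRT uses "," and WebVTT uses ".".
function formatTime(seconds: number, sep: "," | "."): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  return pad(h, 2) + ":" + pad(m, 2) + ":" + pad(s, 2) + sep + pad(ms, 3);
}

// SRT: a 1-based index line, a timing line, the text, then a blank line.
function toSrt(cues: Cue[]): string {
  return cues
    .map((c, i) =>
      (i + 1) + "\n" +
      formatTime(c.start, ",") + " --> " + formatTime(c.end, ",") + "\n" +
      c.text + "\n")
    .join("\n");
}

// WebVTT: a "WEBVTT" header, a blank line, then timing line plus text per cue.
function toVtt(cues: Cue[]): string {
  const body = cues
    .map((c) =>
      formatTime(c.start, ".") + " --> " + formatTime(c.end, ".") + "\n" +
      c.text + "\n")
    .join("\n");
  return "WEBVTT\n\n" + body;
}
```

Calling `toSrt` or `toVtt` with a list of cues gives text you could save as a `.srt` or `.vtt` file. The sketch leaves out styling and positioning, which are outside what plain SRT can express.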
Everything happens inside your browser, including the AI transcription. Your video never leaves your device. The first run downloads a generic, open-source speech model (~80 MB) from Hugging Face over HTTPS and caches it in your browser, so later runs work fully offline. The model file is the same one used for every voice and is not tied to your audio.
No accounts, no upload queue, no transcription credits — drop a video, edit the captions, export or burn-in. The output is yours to share, post, or hand off to an editor.