Drop in a video to auto-transcribe it, edit the captions on a timeline, and export them as SRT/VTT or burn them right into the file.
One-time model download (~80 MB). To keep the site lean, the AI speech model is fetched from Hugging Face (a public model hub) on first use and cached by your browser, so it is not downloaded again unless that cache is cleared.
Your video and audio stay on your device. The only transfer is the download of the generic speech model file over HTTPS; the model is the same for every user and is not specific to your voice. Every subsequent run is fully offline.
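For the curious, the download-once behaviour described above can be sketched as a cache-first loader. This is only an illustration, not this app's actual code: the names `ModelStore`, `MemoryStore`, and `loadModel` are hypothetical. In a browser the store would typically be backed by the Cache API or IndexedDB and the downloader by `fetch`.

```typescript
// Minimal storage contract (hypothetical): any persistent key/value store fits.
interface ModelStore {
  get(key: string): Promise<Uint8Array | undefined>;
  put(key: string, bytes: Uint8Array): Promise<void>;
}

// Function that retrieves the model bytes over the network (hypothetical).
type Downloader = (url: string) => Promise<Uint8Array>;

// In-memory stand-in for a browser cache, used here only for demonstration.
class MemoryStore implements ModelStore {
  private entries = new Map<string, Uint8Array>();

  async get(key: string): Promise<Uint8Array | undefined> {
    return this.entries.get(key);
  }

  async put(key: string, bytes: Uint8Array): Promise<void> {
    this.entries.set(key, bytes);
  }
}

// Cache-first load: use the stored copy if present, otherwise download and store it.
async function loadModel(
  url: string,
  store: ModelStore,
  download: Downloader,
): Promise<{ bytes: Uint8Array; fromCache: boolean }> {
  const cached = await store.get(url);
  if (cached !== undefined) {
    return { bytes: cached, fromCache: true };
  }
  const bytes = await download(url);
  await store.put(url, bytes);
  return { bytes, fromCache: false };
}
```

With this shape, the first call goes to the network and every later call is served from the store, which is why later runs can work without a connection.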
Drop a video here
MP4, WebM, MOV · up to ~30 min works best
Encoding runs entirely in your browser. Keep this tab active: browsers throttle background tabs, so switching away can slow the encoder down significantly.