Auto Subtitles with Whisper - Video to Video
Upload a video and get it back with burned-in subtitles. Whisper transcribes the audio, then the text gets placed frame-by-frame with word-level timing.
subtitling
vid2vid
video generation
Nodes & Models
VHS_LoadVideo
VHS_VideoInfoLoaded
VHS_VideoCombine
Apply Whisper
PreviewAny
Add Subtitles To Frames
Description:
Auto-generate subtitles for any video using OpenAI Whisper.
Upload your video, and the workflow extracts the audio, runs it through Whisper's large model for transcription, and burns the text onto your frames with word-level timing. You get back an MP4 with subtitles baked in. Language detection is automatic, so it works across languages without extra setup.
How do you auto-generate subtitles on a video with Whisper?
Upload a video, and Whisper transcribes the audio with word-level timing. The workflow burns those words onto each frame at the right moment, then exports an MP4 with subtitles baked in. No SRT files, no separate editor. One upload, one output.
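The frame-by-frame placement described above can be sketched in plain Python. This is a minimal illustration, not the workflow's actual node code. It assumes the transcription arrives as a dict shaped like the output of the open-source openai-whisper package with word timestamps enabled (segments that contain words with "word", "start", and "end" fields). The function names flatten_words and words_per_frame are hypothetical.

```python
from bisect import bisect_right


def flatten_words(result):
    """Collect (text, start_sec, end_sec) tuples from a Whisper-style result dict.

    Assumed shape: result["segments"][i]["words"][j] has "word", "start", "end".
    """
    words = []
    for segment in result.get("segments", []):
        for w in segment.get("words", []):
            words.append((w["word"].strip(), float(w["start"]), float(w["end"])))
    return words


def words_per_frame(words, frame_count, fps):
    """Return one entry per frame: the word being spoken, or None during silence.

    Frame n is treated as being shown at time n / fps, so using the source
    video's FPS keeps the text aligned with the original audio.
    """
    starts = [start for _, start, _ in words]
    track = []
    for frame in range(frame_count):
        t = frame / fps
        # Index of the last word whose start time is <= t.
        i = bisect_right(starts, t) - 1
        active = i >= 0 and t < words[i][2]
        track.append(words[i][0] if active else None)
    return track


# Hypothetical two-word transcription for a 6-frame clip at 4 FPS.
example_result = {
    "segments": [
        {
            "words": [
                {"word": " Hello", "start": 0.0, "end": 0.5},
                {"word": " world", "start": 0.75, "end": 1.25},
            ]
        }
    ]
}
example_track = words_per_frame(flatten_words(example_result), frame_count=6, fps=4)
```

Each entry in example_track is the text a renderer would draw on that frame. A real renderer might show a short phrase and highlight the current word instead of showing one word at a time; that choice is outside this sketch.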
Video: Your source video. The workflow pulls both the frames and the audio from this file. FPS is detected from the original, so the output matches your input timing.
Whisper Model: Controls transcription accuracy. Default is "large," which gives you the best results for most languages. Smaller models (medium, small, base, tiny) run faster but miss more words, especially with accents or background noise. Stick with large unless speed matters more than accuracy.
Language: Set to "auto" by default. Whisper detects the spoken language on its own. If auto-detection picks the wrong language, set it manually.
Font Color: Default is white. You can enter any color name or hex code. If your video has bright backgrounds, try "black" or "yellow" for contrast.
Font Family: Default is Roboto-Bold. Bold fonts read better as subtitles, especially at smaller sizes or on busy backgrounds.
Font Size: Default is 40. For 1080p video, 36 to 48 works well. For 720p or lower, try 28 to 36. For 4K, go higher: 56 to 72.
Position (X, Y, Center): Center X is on by default, which keeps your subtitles horizontally centered. Y position defaults to 400. Want classic bottom-of-screen subtitles? Set Y closer to your video height (e.g., 600 for 720p, 900 for 1080p). Want centered captions for social media reels? Turn on Center Y.
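The font size and Y position guidance above can be turned into a small helper for picking starting values. This is a rough sketch under stated assumptions, not part of the workflow: the function name suggest_layout is hypothetical, the resolution tier is chosen from the shorter side of the frame so that vertical video is handled (an assumption), and the bottom Y uses five sixths of the frame height, which reproduces the 600 for 720p and 900 for 1080p examples.

```python
def suggest_layout(width, height):
    """Suggest a font size range and a bottom-of-screen Y value for a frame size.

    The tiers follow the Font Size guidance: 720p or lower, 1080p, and 4K.
    Resolutions between tiers fall back to the lower tier (an assumption).
    """
    short_side = min(width, height)
    if short_side >= 2160:
        font_size_range = (56, 72)
    elif short_side > 720:
        font_size_range = (36, 48)
    else:
        font_size_range = (28, 36)

    # 5/6 of the height matches the examples: 600 for 720p, 900 for 1080p.
    bottom_y = (height * 5) // 6
    return {"font_size_range": font_size_range, "bottom_y": bottom_y}


landscape_1080p = suggest_layout(1920, 1080)
vertical_1080 = suggest_layout(1080, 1920)
```

Treat the result as a starting point and adjust by eye. Long lines or large fonts may need a smaller Y so the text stays inside the frame.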
What is Whisper auto-subtitling good for?
This workflow is for anyone who needs subtitles burned directly into a video file. Social media clips, short-form content, interview edits, explainer videos, foreign-language content. One upload, no manual transcription, no syncing in a separate editor.
Social media content is the obvious fit. Platforms like Instagram, TikTok, and YouTube Shorts reward captioned videos, and many viewers watch with the sound off. This gives you burned-in text without leaving your browser.
Interview and talking-head edits benefit too. Whisper handles conversational speech well, and word-level timing means individual words appear as they are spoken, not in big chunks.
For long-form content or videos where you need editable SRT files, this is not the right tool. The subtitles are burned into the pixels. There is no separate subtitle file to edit after the fact. If you need to fix a transcription error, you re-run the workflow.
Whisper's accuracy is strong but not perfect. Heavy background music, overlapping speakers, or thick accents can cause errors. For high-stakes content, review the preview text output before committing to the final render.
FAQ
What Whisper model size should I use for video subtitles?
Use "large" for the best accuracy across languages. It handles accents, background noise, and fast speech better than smaller models. If your video is in clear English and you want faster processing, "medium" is a reasonable tradeoff. Smaller sizes (small, base, tiny) drop accuracy noticeably.
Can Whisper auto-detect the language in my video?
Yes. The default setting is "auto," and Whisper identifies the spoken language from the audio. It supports dozens of languages. If it picks the wrong one, you can override it manually in the language setting.
Can I change subtitle position for vertical video?
Yes. Turn on Center X to keep text centered horizontally. Set the Y position to control vertical placement. For 9:16 vertical video at 1080x1920, a Y value around 1600 to 1800 places subtitles near the bottom. For centered TikTok-style captions, turn on both Center X and Center Y.
Are the subtitles editable after running the workflow?
No. This workflow burns subtitles into the video frames, so the text becomes part of the image and there is no separate subtitle file. Check the Preview Text output for transcription errors before you rely on the final render. Fixing an error means re-running the workflow.
How do I run Whisper subtitles on a video online?
You can run Whisper auto-subtitles online through Floyo. No installation, no setup. Open the workflow in your browser, upload your video, and hit run. Free to try.


