Whisper Speech-to-Text and SRT Subtitle Generator

Upload any audio file and Whisper transcribes it into text with word-level and segment-level SRT subtitle files. Auto language detection included.

Tags: audio, speech to text, srt, STT, subtitles, transcription, whisper

floyoofficial

Nodes & Models

ComfyUI Official

LoadAudio

WorkflowGraphics

Apply Whisper

PreviewAny

Save SRT

Description:

Turn any audio file into a text transcription and ready-to-use SRT subtitle files.

Upload your audio, and Whisper (large model) transcribes it with automatic language detection. You get three outputs: a full text transcription, an SRT file with segment-level timestamps, and an SRT file with word-level timestamps. No config needed to get started.

How do you generate SRT subtitles from audio using Whisper?

Upload an audio file, optionally change the Whisper model size and language setting, then hit Run. The workflow transcribes your audio and exports the full text plus two SRT files: one split into sentence-like segments, one split into individual words. Language detection is automatic, but you can lock it to a specific language if you need to.

Audio File: Your input. Upload any audio format (MP3, WAV, etc.). Longer files take longer to process, but there's no hard limit on duration.

Model: Controls transcription accuracy and speed. Default is "large," which gives the best accuracy. Smaller models (medium, small, base, tiny) run faster but miss more words. Stick with large unless speed matters more than accuracy.

Language: Set to "auto" by default, which detects the spoken language for you. If your audio has a strong accent or mixes languages, locking this to a specific language can improve results.

Prompt: Optional. Feed Whisper a hint about what the audio contains. Useful for uncommon names, technical terms, or acronyms that Whisper might mishear. Leave it blank for most jobs.

SRT Output Mode: This workflow gives you both options at once. The "segments" SRT groups text into sentence-like chunks. The "words" SRT timestamps each individual word. Use segments for standard subtitles. Use words for karaoke-style effects or precise audio syncing.
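To make the difference between the two SRT files concrete, here is a minimal Python sketch of how timed cues can be rendered as SRT text. It is not the workflow's actual Save SRT implementation. The function names `srt_timestamp` and `to_srt`, the cue dictionary shape, and the sample data are all assumptions made for illustration. Only the SRT layout itself (a cue number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` line, then the text) is standard.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues: list[dict]) -> str:
    """Render cues (dicts with 'start', 'end', 'text') as SRT text."""
    blocks = []
    for index, cue in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(cue['start'])} --> {srt_timestamp(cue['end'])}\n"
            f"{cue['text'].strip()}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(blocks)


# Hypothetical transcription data, for illustration only.
segments = [{"start": 0.0, "end": 1.2, "text": "Hello there."}]
words = [
    {"start": 0.0, "end": 0.5, "text": "Hello"},
    {"start": 0.5, "end": 1.2, "text": "there."},
]

segment_srt = to_srt(segments)  # one cue for the whole sentence
word_srt = to_srt(words)        # one cue per word
print(segment_srt)
print(word_srt)
```

The same formatter serves both outputs. The only thing that changes is the granularity of the cues passed in, which is why the segment file suits standard subtitles and the word file suits per-word effects.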

What is Whisper speech-to-text good for?

Whisper handles most transcription jobs well: podcast episodes, voiceovers, interviews, meeting recordings, and video dialogue. It works across dozens of languages and outputs clean SRT files you can drop into any video editor or subtitle tool.

Subtitling video content is the most common use case. Export the segment SRT and import it into Premiere, DaVinci, CapCut, or any editor that reads SRT files. Word-level SRT is useful for motion graphics where text needs to appear one word at a time.

Transcription for written content works too. Pull the full text output and use it as a draft for blog posts, show notes, or documentation. The text won't be perfectly punctuated, but it gives you a solid starting point.

For heavily accented speech, noisy backgrounds, or overlapping speakers, accuracy drops. Whisper works best with clear, single-speaker audio. If your audio is messy, try setting the language manually and adding a prompt hint with key terms.
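If you want to try the same tuning outside the browser, the sketch below shows one way the Language and Prompt settings could map onto options for the open-source `openai-whisper` Python package. This is an assumption about an equivalent local setup, not a description of how the workflow's Apply Whisper node is wired. The helper name `build_transcribe_options` and the example prompt are hypothetical.

```python
def build_transcribe_options(language: str = "auto", prompt: str = "") -> dict:
    """Translate workflow-style settings into keyword options for transcription.

    Assumption: the options target openai-whisper's `transcribe`, where
    `language=None` means "detect the language automatically".
    """
    options = {"word_timestamps": True}  # needed for per-word timing
    cleaned = language.strip().lower()
    options["language"] = None if cleaned == "auto" else cleaned
    if prompt.strip():
        # A short hint with names or jargon the model might otherwise mishear.
        options["initial_prompt"] = prompt.strip()
    return options


# Defaults: automatic language detection, no prompt hint.
default_options = build_transcribe_options()

# Messy audio: lock the language and add key terms (hypothetical values).
tuned_options = build_transcribe_options(
    language="de",
    prompt="Floyo, ComfyUI, SRT",
)

# With openai-whisper installed, these options could be passed along as:
#   import whisper
#   model = whisper.load_model("large")
#   result = model.transcribe("interview.mp3", **tuned_options)
```

Building the options separately keeps the automatic and locked-language cases in one place, so you can compare runs on the same file by changing only the arguments.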

FAQ

What Whisper model size should I use for transcription?

Large gives the best accuracy and is the default. If your audio is clean and you want faster results, medium is a reasonable tradeoff. For quick drafts where missing a few words doesn't matter, small or base will do.

Can Whisper transcribe non-English audio?

Yes. Whisper supports dozens of languages and detects them automatically. For best results with non-English audio, set the language input to the specific language instead of "auto."

What's the difference between segment and word SRT output?

Segments group the transcription into sentence-length chunks with timestamps. Words timestamp each individual word. Use segments for standard subtitles. Use words when you need precise per-word timing for effects or syncing.

How do I use the SRT file in my video editor?

Download the SRT file from the output, then import it into your editor. Most editors (Premiere Pro, DaVinci Resolve, Final Cut, CapCut) have a subtitle import option that reads SRT files directly.

How do I run Whisper speech-to-text online?

You can run Whisper speech-to-text online through Floyo. No installation, no setup. Open the workflow in your browser, upload your audio, and hit Run. Free to try.
