Audio Separation for Video to Audio

Upload a video, extract its audio, and split it into four clean stems (Bass, Drums, Other, and Vocals), then save your chosen stem as an MP3. No model required.

Audio Separation

Video to Audio

Generates in about 42 secs

floyoofficial

Nodes & Models

ComfyUI-VideoHelperSuite

VHS_LoadVideo

ComfyUI_StarNodes

VHS_LoadVideo

ComfyUI-S3-IO

VHS_LoadVideo

ComfyUI Official

WorkflowGraphics

SaveAudioMP3

ComfyUI-AudioSuiteAdvanced

AudioSeparation

audio-separation-nodes-comfyui

AudioSeparation

Description:

Split the audio from any video into four clean stems: vocals, drums, bass, and everything else.

Upload a video file, and the AudioSeparation node breaks its soundtrack into four isolated tracks. Pick the stem you need and save it as a high-quality MP3. The default setup exports the vocal track at 320 kbps.

One input. One click. You get a clean isolated stem out the other end.

How do you separate audio from a video in ComfyUI?

Upload your video to the Load Video node. The workflow extracts the audio, runs it through stem separation, and splits it into four outputs: vocals, drums, bass, and other. Connect whichever stem you need to the Save Audio MP3 node and hit Run.
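The wiring described above can be sketched as a graph in the API-style JSON layout that ComfyUI uses, where each node has a `class_type` and its inputs point at `[node_id, output_index]`. This is a rough illustration only: the input names (`video`, `audio`, `window_type`, `segment_size`, `overlap`, `quality`), the output indices, and the `build_workflow` helper are assumptions made for the example, not values read from the actual workflow.

```python
# Hypothetical sketch: node input names and output indices are assumed,
# not taken from the published workflow.

# Assumed output order, following the order the stems are listed on this page.
STEM_OUTPUTS = {"Bass": 0, "Drums": 1, "Other": 2, "Vocals": 3}


def build_workflow(video_path, stem="Vocals"):
    """Return a three-node graph: load video -> separate -> save one stem."""
    if stem not in STEM_OUTPUTS:
        raise ValueError(f"unknown stem: {stem!r}")
    return {
        "1": {
            "class_type": "VHS_LoadVideo",
            "inputs": {"video": video_path},
        },
        "2": {
            "class_type": "AudioSeparation",
            "inputs": {
                "audio": ["1", 2],  # assumed index of the audio output
                "window_type": "half_sine",
                "segment_size": 20,
                "overlap": 0.2,
            },
        },
        "3": {
            "class_type": "SaveAudioMP3",
            "inputs": {
                "audio": ["2", STEM_OUTPUTS[stem]],
                "quality": "320k",
            },
        },
    }


workflow = build_workflow("clip.mp4", stem="Drums")
```

Changing the `stem` argument is the code equivalent of reconnecting a different output to the save node in the editor.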

Video Upload: Drop your video file into the Load Video input. MP4 works. The node pulls the audio track automatically, so you don't need to extract or convert the audio first.

Window Type: Controls how the separation algorithm blends audio segments where they meet. Default is "half_sine". Leave it unless you hear artifacts at segment boundaries; in that case, try the other window options to smooth the transitions.
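To give a feel for what a half-sine window does at a segment boundary, here is a minimal sketch. It assumes "half_sine" means a raised half-sine fade curve (the shape used by common audio fade utilities); the node's real implementation is not shown on this page, and the function names are made up for the example.

```python
import math


def half_sine_fade_in(n):
    """Weights rising smoothly from 0 to 1 along a raised half-sine curve."""
    if n < 2:
        raise ValueError("need at least two samples to fade")
    return [0.5 - 0.5 * math.cos(math.pi * i / (n - 1)) for i in range(n)]


def crossfade(tail, head):
    """Blend the end of one segment (tail) into the start of the next (head)."""
    if len(tail) != len(head):
        raise ValueError("overlapping regions must have the same length")
    weights = half_sine_fade_in(len(tail))
    # The fade-out weight is (1 - w), so the two weights always sum to 1.
    return [(1 - w) * a + w * b for w, a, b in zip(weights, tail, head)]
```

Because the fade-in and fade-out weights sum to 1 at every sample, material that is identical in both segments passes through the overlap unchanged, which is why a smooth window helps hide the seams.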

Segment Size: How many seconds of audio the algorithm processes at a time. Default is 20. Larger values use more memory but can improve separation quality on longer tracks. If you run into memory issues, lower this number.

Overlap: The fraction of each segment that it shares with its neighbor. Default is 0.2 (20%). Higher overlap means smoother transitions between segments, but slower processing. For most videos, 0.2 works fine.
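The trade-off between Segment Size and Overlap is easier to see with numbers. The sketch below assumes a simple scheme in which each segment starts `segment_s * (1 - overlap)` seconds after the previous one; the node may chunk audio differently, and `plan_segments` is a hypothetical helper written for this example.

```python
def plan_segments(duration_s, segment_s=20.0, overlap=0.2):
    """Return (start, end) times in seconds for overlapping segments."""
    if segment_s <= 0:
        raise ValueError("segment_s must be positive")
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    hop = segment_s * (1 - overlap)  # distance between segment starts
    segments = []
    start = 0.0
    while start < duration_s:
        end = min(start + segment_s, duration_s)
        segments.append((start, end))
        if end >= duration_s:
            break
        start += hop
    return segments


default_plan = plan_segments(60)             # 4 segments with the defaults
heavy_overlap = plan_segments(60, overlap=0.5)  # 5 segments
```

Under this assumption, a 60-second track needs four passes at the default settings and five at 50% overlap, which is where the extra processing time comes from. Lowering the segment size increases the number of passes but shrinks how much audio is held in memory per pass.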

Choosing Your Stem: The separation outputs four tracks: Bass, Drums, Other, and Vocals. By default, the Vocals output is wired to the save node. Want the drum track instead? Reconnect the Drums output to SaveAudioMP3 in the workflow editor.

Bitrate: Saves at 320 kbps by default, the highest standard MP3 bitrate. Good enough for production use.
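As a rough guide to the file size that bitrate implies, here is a small estimate. It assumes constant-bitrate encoding and ignores tags and container overhead; `mp3_size_bytes` is an illustrative helper, not part of the workflow.

```python
def mp3_size_bytes(duration_s, bitrate_kbps=320):
    """Approximate size of a constant-bitrate MP3, ignoring metadata."""
    bits = duration_s * bitrate_kbps * 1000  # 1 kbps = 1000 bits per second
    return bits / 8


three_minutes = mp3_size_bytes(180)  # 7,200,000 bytes, about 7.2 MB
```

So a three-minute stem at 320 kbps comes out at roughly 7.2 MB.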

What is audio separation from video good for?

This workflow is for anyone who needs a clean isolated audio track from a video file. It handles vocal extraction, instrumental isolation, and stem splitting without leaving ComfyUI or installing separate tools.

Shot a video and need the dialogue without the background music? Drop it in and grab the vocal stem. Working on a remix and need the drums from a reference clip? Same workflow, different output.

It's also useful for content creators pulling stems from screen recordings, interview footage, or any video where the audio layers need to come apart.

The separation model handles most material well, but dense mixes with lots of overlapping frequencies will always be harder to split cleanly. If you need surgical precision on a complex mix, a dedicated DAW with manual editing will get you further.

FAQ

How do I extract vocals from a video using ComfyUI?

Upload your video to the Audio Separation for Video to Audio workflow. The AudioSeparation node splits the soundtrack into four stems. The Vocals output gives you an isolated vocal track, saved as a 320 kbps MP3.

Can I save more than one stem at a time?

The default setup saves one stem. To export multiple stems, duplicate the SaveAudioMP3 node and connect each copy to a different output (Bass, Drums, Other, Vocals). Each saves as a separate MP3 file.
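Duplicating the save node can be pictured as adding one SaveAudioMP3 entry per stem to the graph. The sketch below uses ComfyUI's API-style layout (`class_type` plus `[node_id, output_index]` links), but the input names, the output order, the node ids, and the `save_nodes_for` helper are all assumptions made for illustration.

```python
# Hypothetical sketch: input names and output indices are assumed.

# Assumed output order, following the order the stems are listed on this page.
STEM_OUTPUTS = {"Bass": 0, "Drums": 1, "Other": 2, "Vocals": 3}


def save_nodes_for(separation_id, stems, first_id=100):
    """Build one SaveAudioMP3 node per requested stem."""
    nodes = {}
    for offset, stem in enumerate(stems):
        if stem not in STEM_OUTPUTS:
            raise ValueError(f"unknown stem: {stem!r}")
        nodes[str(first_id + offset)] = {
            "class_type": "SaveAudioMP3",
            "inputs": {
                "audio": [separation_id, STEM_OUTPUTS[stem]],
                # A distinct prefix per stem keeps the files from colliding.
                "filename_prefix": f"stems/{stem.lower()}",
                "quality": "320k",
            },
        }
    return nodes


extra_saves = save_nodes_for("2", ["Bass", "Drums", "Other", "Vocals"])
```

Each entry points at a different output of the same separation node, so the separation itself only needs to be described once in the graph however many stems you export.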

What video formats work with this workflow?

The Load Video node supports MP4 and other common video formats. If your file has an audio track, the node will extract it.

Does audio separation work on long videos?

Yes, but longer videos use more memory. If you hit limits, reduce the Segment Size value. Processing time scales with video length.

How do I run audio separation from video online?

You can run it online through Floyo. No installation, no setup. Open the workflow in your browser, upload your video, and hit Run. Free to try.
