Voice Changer using TTS Audio Suite (ChatterBox)

Convert any voice to match a target speaker using ChatterBox TTS. Upload source and narrator audio, run it, get back a converted MP3. No voice training needed.

Tags: audio · Audio2Audio · Chatterbox · tts · TTS Audio Suite · voice conversion

Generates in about 17 secs

floyoofficial

Nodes & Models

ComfyUI Official

ChatterBoxEngineNode

LoadAudio

UnifiedVoiceChangerNode

SaveAudioMP3

Anomalous_Model_Browser

FloyoStickyNote

ShowText|pysssss

dspy_nodes

ShowText|pysssss

ComfyUI-Custom-Scripts

ShowText|pysssss

ABOUT THE WORKFLOW

Turn one voice into another

Upload the recording you want to transform and a short sample of the voice you want it to sound like. The workflow keeps the words, timing, and delivery from your original audio and rebuilds them in the target voice. Only the speaker changes.

Model

ChatterBox by Resemble AI: an open-source voice model that clones a target voice from a few seconds of reference audio and is strong at natural, expressive delivery.

HOW IT WORKS

Step 1. Upload the audio to convert
The recording you want to transform. Its words, timing, and delivery stay the same. Only the voice changes.
Works great with: narration · dialogue · voiceover

Step 2. Upload the target voice
A short, clean sample of the voice you want the result to sound like. A few seconds of clear speech is enough.

Step 3. Hit run and download
ChatterBox rebuilds the source speech in the target voice and returns an MP3. Preview it in the workflow, then download.
Ready for: Premiere Pro · DaVinci Resolve · Audacity · any editor

First time? Leave every setting as-is. The defaults are the right starting point for almost everyone.

RECOMMENDED SETTINGS

Quick-start guide. Find the goal that matches yours and copy the settings.

Standard conversion (most people): exaggeration 0.5, temperature 0.8, cfg weight 0.5, 1 refinement pass. The right starting point for almost everyone.
Want a more expressive read: raise exaggeration above 0.5 for more emotion and drama in the delivery.
Want a flatter, calmer read: lower exaggeration below 0.5 toward a steady, even tone.
Want the output closer to the target voice: lower the cfg weight so the result sticks tighter to your reference sample.
Want more variation between runs: raise the temperature. Lower it for a steadier, more repeatable result.
The voice identity is not holding: add a refinement pass so the model cleans up the conversion on a second look.
Converting a non-English clip: set the language to match your source audio so pronunciation lands right.
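
If you keep notes on your settings outside the workflow, the guide above can be written down as a small preset table. The sketch below is illustrative only. The field names (exaggeration, temperature, cfg_weight, refinement_passes, language) mirror the labels in the guide and may not match the exact widget names on the nodes, and the STEP size of 0.2 is a made-up value, since the guide only says to "raise" or "lower" a setting.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VoiceChangerSettings:
    """Defaults copied from the 'Standard conversion' line of the guide."""
    exaggeration: float = 0.5
    temperature: float = 0.8
    cfg_weight: float = 0.5
    refinement_passes: int = 1
    language: str = "English"


# Hypothetical nudge size: the guide gives a direction, not an amount.
STEP = 0.2


def _nudge(value: float, delta: float) -> float:
    """Move a setting by delta, never below zero, rounded for readability."""
    return round(max(0.0, value + delta), 2)


def tune(goal: str, base: VoiceChangerSettings = VoiceChangerSettings()) -> VoiceChangerSettings:
    """Return a copy of base adjusted toward one goal from the guide."""
    if goal == "more_expressive":
        return replace(base, exaggeration=_nudge(base.exaggeration, STEP))
    if goal == "calmer":
        return replace(base, exaggeration=_nudge(base.exaggeration, -STEP))
    if goal == "closer_to_target":
        return replace(base, cfg_weight=_nudge(base.cfg_weight, -STEP))
    if goal == "more_variation":
        return replace(base, temperature=_nudge(base.temperature, STEP))
    if goal == "more_repeatable":
        return replace(base, temperature=_nudge(base.temperature, -STEP))
    if goal == "identity_not_holding":
        return replace(base, refinement_passes=base.refinement_passes + 1)
    raise ValueError(f"unknown goal: {goal}")
```

Calling tune("more_expressive") returns a new settings object with only exaggeration changed, so each experiment stays one adjustment away from the defaults. For a non-English clip, replace(VoiceChangerSettings(), language="French") would record the language change, assuming the language is stored as a plain name.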

Target voice: Use a clean, dry sample with one speaker and little background noise. A clear five to ten second clip clones better than a long, noisy one. "A single speaker in a quiet room" beats "a clip with music under the voice."
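
To check a target sample against the five to ten second guideline before uploading, a short local script is enough. This is a minimal sketch using only Python's standard library. It assumes the sample is an uncompressed WAV file (the wave module does not read MP3), and the function names and the "short" / "ok" / "long" labels are invented for illustration. It measures length only and cannot tell whether the clip is clean or has a single speaker.

```python
import wave

# Guideline from the text: a clear five to ten second clip.
MIN_SECONDS = 5.0
MAX_SECONDS = 10.0


def wav_duration_seconds(source) -> float:
    """Return the length of a WAV clip in seconds.

    source may be a file path or an open binary file object.
    """
    with wave.open(source, "rb") as clip:
        return clip.getnframes() / clip.getframerate()


def describe_reference_length(seconds: float) -> str:
    """Label a clip length relative to the five to ten second guideline."""
    if seconds < MIN_SECONDS:
        return "short"
    if seconds > MAX_SECONDS:
        return "long"
    return "ok"
```

A "short" label is a hint, not a failure, since a few seconds of clear speech can still be enough. A "long" label suggests trimming to the cleanest stretch of the recording.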

LEARN

📹 Videos

Intro to Floyo
ComfyUI 101 Free Course ft. Sebastian Kamph
Floyo 101 for Team Collaboration

USE CASES

🎬 Filmmakers & Video Editors
Re-voice a scratch track or a temp read in a consistent character voice without calling the actor back.

🎙️ Podcasters & Narrators
Keep your performance and timing but deliver it in a cloned voice, or fix a single line by re-voicing it.

🌍 Localization & Dubbing
Give dubbed dialogue the original speaker's voice so a character sounds like themselves across every take.

🎮 Game & Character Audio
Turn one recording into several distinct character voices from a set of reference samples.

WHAT WORKS BEST / WHAT TO AVOID

✅ Works great

Clean source audio with a single speaker
A clear, dry target voice sample with little background noise
Steady narration, dialogue, and voiceover
Speech that is well separated from music

⚠️ May produce softer results

Noisy or reverb-heavy source recordings
Target samples with music or crosstalk behind the voice
Overlapping speakers in one clip
Very long files run in a single pass
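
Since very long files run in a single pass are listed as a risk, one workaround is to cut the source into shorter pieces in your editor and convert each piece separately. The sketch below only plans the cut points. The 30 second default is an arbitrary assumption, not a documented limit, and in practice you would move each cut to the nearest pause so no word is split.

```python
def chunk_spans(total_seconds: float, chunk_seconds: float = 30.0) -> list[tuple[float, float]]:
    """Split a recording of total_seconds into consecutive (start, end) spans.

    Every span is chunk_seconds long except possibly the last one.
    """
    if total_seconds <= 0 or chunk_seconds <= 0:
        raise ValueError("durations must be positive")
    spans = []
    start = 0.0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        spans.append((start, end))
        start = end
    return spans
```

Use the same target voice sample and the same settings for every piece so the converted pieces match when you join them again.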

FAQ

What is ChatterBox?
ChatterBox is an open-source text-to-speech and voice cloning model from Resemble AI, released under the MIT license. It clones a voice from a few seconds of reference audio and is strong at natural, expressive delivery. This workflow runs it in voice changer mode, so it converts one recording into a different voice.

How does a voice changer work?
You give it two things: the audio you want to convert and a short sample of the target voice. The model keeps the words, timing, and performance from your source recording and rebuilds them in the target voice, so the delivery stays the same while the speaker identity changes.

Does ChatterBox need training or a long voice sample?
No. It clones a voice zero-shot from a few seconds of reference audio, with no training step. A short, clean sample of the target voice is enough to get a usable result.

What is the difference between a voice changer and text to speech?
A voice changer starts from existing audio and swaps the voice while keeping the original performance. Text to speech starts from written text and generates speech from scratch. Use this workflow when you already have a recording and want it in a different voice.

Can I use ChatterBox voice conversion for commercial projects?
Yes. ChatterBox ships under the MIT license, which allows commercial use. Two things to keep in mind: outputs carry Resemble AI's PerTh watermark, which marks them as AI generated, and you should have the rights or consent to clone any voice you use as a target.

Does ChatterBox support languages other than English?
Yes. The ChatterBox family includes a multilingual model covering 23 languages. This workflow is set to English by default, so switch the language setting to match your source audio when you convert a clip in another language.

How do I run ChatterBox online?
You can run ChatterBox online through Floyo, with no installation, no setup, and no API key to wire up. Open the workflow in your browser, upload your source clip and a target voice, and hit run. It is free to try.

WHY FLOYO?

Floyo is the only platform with team collaboration for ComfyUI in the browser. You run workflows with no install. You share run history, assets, and models across your team. You pay only when you generate. Floyo supports open-source and closed-source models.

A sound editor runs a conversion and likes the result. A teammate opens that exact run from shared history and keeps going. No file handoffs. No version confusion.

For studios and enterprise teams, Floyo adds private workspaces, pooled resources, and a team usage dashboard. Other ComfyUI cloud tools run for one person at a time. Floyo runs for the whole team, with transparent per-generation costs.

Ready to try it? Upload a clip, add a target voice, and run it. The settings are already dialled in.

Questions? Watch the free course or check the FAQ above.
