Voice Changer using TTS Audio Suite (ChatterBox)
Convert any voice to match a target speaker using ChatterBox TTS. Upload source and narrator audio, run it, get back a converted MP3. No voice training needed.
audio
Audio2Audio
Chatterbox
tts
TTS Audio Suite
voice conversion
Nodes & Models
WorkflowGraphics
LoadAudio
ChatterBoxEngineNode
UnifiedVoiceChangerNode
SaveAudioMP3
Description:
Turn one voice into another using ChatterBox and the TTS Audio Suite voice converter.
Upload two audio files: the voice you want to change, and a reference clip of the voice you want it to become. The workflow runs the source through ChatterBox's voice conversion pipeline and outputs a new MP3 where the words stay the same but the voice matches your target. No text input needed.
How do you change a voice with ChatterBox in ComfyUI?
Upload a source audio clip (the voice to convert) and a narrator target clip (the voice to copy). ChatterBox handles the conversion using its TTS engine. Adjust pitch, steps, and engine settings to fine-tune the result. The defaults work well for most conversions.
Source Audio
This is the audio you want to transform. The words and timing come from this file. Drop in any spoken audio clip.
Narrator Target
This is the voice you want to copy. ChatterBox uses this reference to reshape the source audio. A clean, clear clip of a few seconds works best. The more distinct the target voice, the stronger the conversion.
Pitch Shift (default: 1)
Want to keep the original pitch range? Leave it at 1. Converting between voices with different pitch ranges (male to female, for example)? Adjust up or down. A value above 1 raises pitch, and a value below 1 lowers it.
Number of Steps (default: 30)
More steps mean a more refined conversion, but the run takes longer. 30 is a good balance. Want faster previews? Drop to 15-20. Want the cleanest possible output? Try 40-50, but the improvement shrinks past 30.
Pitch Estimation Algorithm (default: smart)
Controls how the workflow detects pitch in your source audio. "Smart" picks the best method automatically. Leave this alone unless you're troubleshooting a specific pitch tracking issue.
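The three voice changer settings above can be written down as a small settings object, which is handy for keeping notes on what you tried. This is a minimal sketch in plain Python, not the TTS Audio Suite API: the class `VoiceChangerSettings`, its field names, and the `PREVIEW` and `CLEAN` presets are hypothetical, and only the defaults and step ranges come from the descriptions above. The semitone helper assumes the pitch value acts as a frequency multiplier, which the text suggests (1 leaves pitch unchanged) but does not state outright.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VoiceChangerSettings:
    """Hypothetical container for the settings described above."""
    pitch_shift: float = 1.0        # 1 keeps the original pitch range
    steps: int = 30                 # documented default
    pitch_algorithm: str = "smart"  # documented default

    def __post_init__(self):
        # Basic sanity checks; the real node may enforce different limits.
        if self.pitch_shift <= 0:
            raise ValueError("pitch_shift must be positive")
        if self.steps < 1:
            raise ValueError("steps must be at least 1")

    def pitch_in_semitones(self) -> float:
        # Assumption: pitch_shift is a frequency ratio, so 2.0 is one octave up.
        return 12 * math.log2(self.pitch_shift)


DEFAULTS = VoiceChangerSettings()
PREVIEW = replace(DEFAULTS, steps=15)  # faster preview (15-20 range)
CLEAN = replace(DEFAULTS, steps=40)    # cleaner output (40-50 range)
```

Under that multiplier assumption, `pitch_in_semitones()` gives a more familiar unit for judging how far a value such as 1.2 moves the voice.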
ChatterBox Engine Settings
Language (default: English)
Set this to match the language of your source audio.
Exaggeration (default: 0.5)
Controls how much the engine pushes vocal characteristics. Higher values make the converted voice more expressive but can introduce artifacts. Lower values stay closer to a neutral delivery. Start at 0.5 and adjust based on what you hear.
CFG Weight (default: 0.8)
Higher values follow the target voice more closely. Lower values give the engine more creative room. 0.8 is a good starting point for most voice changes.
Temperature (default: 0.5)
Adds variation to the output. Low temperature keeps things predictable. Higher temperature introduces more randomness, which can sound more natural or more chaotic depending on the content. Try a few values and listen.
Output Format
The workflow saves your converted audio as a high-quality 320 kbps MP3. A conversion info display shows details about the process.
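When you try a few engine values and listen, it helps to record which ones you moved away from the defaults. The sketch below is one way to do that in plain Python. The class `EngineSettings`, its field names, and the helper `changed_from_defaults` are hypothetical and are not the node's real input names; only the default values come from the settings listed above.

```python
from dataclasses import dataclass, asdict


@dataclass
class EngineSettings:
    """Hypothetical container; field names are illustrative only."""
    language: str = "English"
    exaggeration: float = 0.5
    cfg_weight: float = 0.8
    temperature: float = 0.5


def changed_from_defaults(settings: EngineSettings) -> dict:
    """Return only the fields that differ from the documented defaults."""
    defaults = asdict(EngineSettings())
    return {
        name: value
        for name, value in asdict(settings).items()
        if value != defaults[name]
    }
```

For example, `changed_from_defaults(EngineSettings(temperature=0.7))` reports only the temperature, which makes a compact note to keep next to each output file.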
What is ChatterBox voice changing good for?
ChatterBox voice conversion works well for dubbing, voiceover prototyping, character voice creation, and any situation where you need spoken audio in a different voice without re-recording. It preserves the original speech content while transforming the vocal identity.
If you're creating content that needs multiple character voices from a single narrator recording, this saves hours of studio time. Record once, convert to as many voices as you have reference clips for.
Podcasters and video creators can use it to prototype how a script sounds in different voices before committing to a final narrator. Upload a rough read-through and a few target voice clips to compare.
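The "record once, convert to many voices" idea amounts to pairing one source file with each reference clip. The sketch below only plans those runs and names the outputs; it does not call ChatterBox or ComfyUI. The function `plan_conversions`, the `converted` folder, and the output naming scheme are all hypothetical choices made for illustration.

```python
from pathlib import Path


def plan_conversions(source: str, targets: list[str], out_dir: str = "converted") -> list[dict]:
    """Pair one source recording with each target reference clip."""
    src = Path(source)
    jobs = []
    for target in targets:
        tgt = Path(target)
        jobs.append({
            "source": str(src),
            "target": str(tgt),
            # Output name combines both stems, e.g. readthrough_as_narrator_a.mp3
            "output": str(Path(out_dir) / f"{src.stem}_as_{tgt.stem}.mp3"),
        })
    return jobs
```

Each entry in the returned list corresponds to one run of the workflow with the same source audio and a different narrator target.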
The conversion quality depends on the clarity of both inputs. Noisy source audio or a mumbled target reference will give weaker results. Clean recordings in both slots make a noticeable difference.
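Clarity is hard to check automatically, but clip length is not. As a rough pre-upload check, the sketch below measures a WAV reference clip with Python's standard `wave` module and compares it with the five to ten seconds suggested in the FAQ. The function names and the strict upper bound are assumptions for illustration; longer clips are not rejected by the workflow, and MP3 files would need a different library.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Length of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def reference_length_ok(path: str, min_s: float = 5.0, max_s: float = 10.0) -> bool:
    """True if the clip falls inside the suggested five-to-ten-second range."""
    return min_s <= wav_duration_seconds(path) <= max_s
```

Raising `max_s` is reasonable for more complex voices, since longer references can help there.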
For high-fidelity voice cloning where every nuance needs to be perfect, dedicated voice cloning services may still have an edge. This workflow is best for creative projects, prototyping, and situations where speed matters more than studio-grade precision.
FAQ
What audio formats work with the ChatterBox voice changer?
The workflow accepts standard audio formats including MP3 and WAV. Your source and target clips should be clean spoken audio. Background music or heavy noise will reduce conversion quality. Keep reference clips short and clear for the best results.
How long should the target voice reference be for ChatterBox?
A few seconds of clear speech is enough for ChatterBox to capture the vocal characteristics. Longer clips can help with more complex voices, but you don't need minutes of audio. Five to ten seconds of clean, representative speech works for most cases.
Can I change the pitch when converting between male and female voices?
Yes. The pitch shift setting handles this. Set it above 1 to raise pitch (useful when converting male to female) or below 1 to lower it. Start with small adjustments and listen to the output before going further.
Does ChatterBox voice conversion work with non-English audio?
ChatterBox supports multiple languages. Set the language parameter in the engine settings to match your source audio. Quality may vary by language, so test with a short clip first.
How do I run ChatterBox voice changer online?
You can run ChatterBox voice changer online through Floyo. No installation, no setup. Open the workflow in your browser, upload your inputs, and hit run. Free to try.
