Floyo (beta), powered by ThinkDiffusion

Chatterbox Text to Speech

Text to speech workflow using Chatterbox

Chatterbox TTS is an open-source text-to-speech and voice-cloning system. It turns text into natural-sounding speech, clones a voice from a few seconds of reference audio, and gives fine control over emotion and intensity.

What it does

  • Converts text into high-quality speech with controls for pitch, speed, and emotion (from neutral to highly dramatic).

  • Performs zero-shot voice cloning: upload a short reference clip (around 5 seconds) and it can mimic that voice without separate training.

  • Supports multilingual output (around 22 languages) and can keep a cloned voice consistent across languages for dubbing/localization.
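
The zero-shot cloning step above can be sketched in Python. This is a minimal sketch, not the workflow's actual implementation: it assumes the open-source `chatterbox-tts` package and its `ChatterboxTTS.from_pretrained` / `generate(..., audio_prompt_path=...)` interface, plus `torchaudio` for saving. The helper names (`clip_duration_seconds`, `is_usable_reference`, `clone_and_speak`) and the 5-second floor are illustrative assumptions taken from the "around 5 seconds" guidance, not limits enforced by the model.

```python
import wave


def clip_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds (stdlib only)."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_usable_reference(path: str, min_seconds: float = 5.0) -> bool:
    """Check the clip against a suggested minimum length (an assumed floor)."""
    return clip_duration_seconds(path) >= min_seconds


def clone_and_speak(text: str, reference_path: str, out_path: str,
                    device: str = "cpu") -> None:
    """Speak `text` in the voice of the reference clip (hypothetical helper)."""
    # Imported lazily so the helpers above work without the TTS package.
    import torchaudio
    from chatterbox.tts import ChatterboxTTS

    if not is_usable_reference(reference_path):
        raise ValueError("reference clip is shorter than the suggested minimum")

    model = ChatterboxTTS.from_pretrained(device=device)
    # No training step: the reference clip is passed at generation time.
    wav = model.generate(text, audio_prompt_path=reference_path)
    torchaudio.save(out_path, wav, model.sr)
```

Checking the clip length before loading the model is a cheap way to catch unusable references early; the threshold can be relaxed if shorter clips work well for a given voice.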

Voice change and control

  • Works as a voice changer: it clones a target voice and then speaks any input text in that voice, with adjustable accent, pacing, and emotional intensity.

  • Provides explicit “exaggeration” or intensity parameters so you can dial emotion and expressiveness up or down programmatically.

  • Includes watermarking/provenance options (PerTh) in some deployments so synthetic audio can be detected and traced responsibly.
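
As a rough illustration of dialing expressiveness programmatically, the sketch below maps a single intensity value to generation settings. It assumes the `exaggeration` and `cfg_weight` keyword arguments of `generate` in the open-source `chatterbox-tts` package; the helper `expressiveness_kwargs` and its linear mapping are hypothetical choices for illustration, not values prescribed by the project or by this workflow.

```python
def expressiveness_kwargs(intensity: float) -> dict:
    """Map an intensity in [0, 1] to keyword arguments for generate().

    The ranges are illustrative assumptions: higher intensity raises
    `exaggeration` and lowers `cfg_weight` slightly to keep pacing in check.
    """
    level = min(1.0, max(0.0, intensity))  # clamp out-of-range input
    return {
        "exaggeration": round(0.25 + 0.75 * level, 2),  # 0.25 (flat) to 1.0 (dramatic)
        "cfg_weight": round(0.5 - 0.2 * level, 2),      # 0.5 down to 0.3
    }


# Hypothetical usage with an already loaded model:
#   wav = model.generate(text, **expressiveness_kwargs(0.8))
```

Keeping the mapping in one small function makes it easy to expose a single "intensity" slider in a UI while still tuning the underlying parameters separately.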

How it’s typically used

  • Via web UIs where you paste text, choose or clone a voice, adjust emotion/pacing, and download audio.

  • As a self‑hosted or API‑based engine for agents, NPCs, audiobooks, podcasts, accessibility tools, or localized dubbing.
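
For long-form uses such as audiobooks or podcasts, a self-hosted pipeline would commonly split the script into sentence-sized chunks and synthesize them one at a time. The sketch below shows one way to do that splitting; the function name `split_into_chunks` and the 300-character default are assumptions for illustration and are not part of Chatterbox itself.

```python
import re


def split_into_chunks(text: str, max_chars: int = 300) -> list[str]:
    """Group whole sentences into chunks of at most `max_chars` characters.

    A single sentence longer than `max_chars` is kept whole rather than cut
    mid-sentence.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so start a new chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed to the speech engine in order and the resulting audio segments concatenated; splitting on sentence boundaries avoids cutting speech mid-phrase.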
