floyo logo
Powered by
ThinkDiffusion

Chatterbox Text to Speech

Text to speech workflow using Chatterbox

Nodes & Models

WorkflowGraphics
LoadAudio
SaveAudio
PreviewAudio
ChatterboxTTS
ChatterboxVC

Chatterbox TTS is an open‑source text‑to‑speech and voice‑cloning system. It turns text into natural‑sounding speech, clones a voice from a few seconds of audio, and gives fine control over emotion and intensity.

What it does

  • Converts text into high‑quality speech with controls for pitch, speed, and emotion (from neutral to highly dramatic).

  • Performs zero‑shot voice cloning: upload a short reference clip (around 5 seconds) and it can mimic that voice without separate training.

  • Supports multilingual output (around 22 languages) and can keep a cloned voice consistent across languages for dubbing/localization.
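
The text‑to‑speech and zero‑shot cloning steps above could be driven from Python roughly as follows. This is a minimal sketch, not this workflow's implementation. It assumes the open‑source chatterbox-tts package, and the ChatterboxTTS.from_pretrained and generate(..., audio_prompt_path=...) calls follow that project's published usage. The helper names generate_kwargs and synthesize, and any file paths, are hypothetical.

```python
from typing import Optional


def generate_kwargs(reference_clip: Optional[str] = None) -> dict:
    """Build keyword arguments for generate(); a reference clip requests cloning."""
    kwargs = {}
    if reference_clip:
        # Assumption: the package takes a path to a short reference recording.
        kwargs["audio_prompt_path"] = reference_clip
    return kwargs


def synthesize(text: str, out_path: str,
               reference_clip: Optional[str] = None,
               device: str = "cpu") -> None:
    """Synthesize text to a WAV file, optionally in a cloned voice."""
    # Imported lazily so generate_kwargs() works without the heavy dependencies.
    import torchaudio
    from chatterbox.tts import ChatterboxTTS

    model = ChatterboxTTS.from_pretrained(device=device)
    wav = model.generate(text, **generate_kwargs(reference_clip))
    torchaudio.save(out_path, wav, model.sr)
```

Calling synthesize("Hello there.", "hello.wav", reference_clip="speaker.wav") would require the package to be installed and the model weights to be available. Without a reference clip, the same call would use the model's default voice.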

Voice change and control

  • Works as a voice changer: it clones a target voice and then speaks any input text in that style, with adjustable accent, pacing, and emotional intensity.

  • Provides explicit “exaggeration” or intensity parameters so you can dial emotion and expressiveness up or down programmatically.

  • Includes watermarking/provenance options (PerTh) in some deployments so synthetic audio can be detected and tracked responsibly.
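
Dialing expressiveness programmatically might look like the sketch below. It assumes the generate() call of the open‑source package accepts exaggeration and cfg_weight keyword arguments. The preset values are illustrative starting points, and the 0.0 to 2.0 clamp range and the expression_kwargs helper are hypothetical.

```python
from typing import Optional

# Illustrative presets (assumed values, not taken from this workflow).
PRESETS = {
    "neutral": {"exaggeration": 0.5, "cfg_weight": 0.5},
    "dramatic": {"exaggeration": 0.7, "cfg_weight": 0.3},
}


def expression_kwargs(preset: str = "neutral",
                      exaggeration: Optional[float] = None) -> dict:
    """Return generation settings for a preset, with an optional override."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset!r}")
    kwargs = dict(PRESETS[preset])  # copy so the preset table is not mutated
    if exaggeration is not None:
        # Clamp the override to a hypothetical safe range of 0.0 to 2.0.
        kwargs["exaggeration"] = min(max(exaggeration, 0.0), 2.0)
    return kwargs
```

The returned dictionary is meant to be unpacked into the synthesis call, for example model.generate(text, **expression_kwargs("dramatic")).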

How it’s typically used

  • Via web UIs where you paste text, choose or clone a voice, adjust emotion/pacing, and download audio.

  • As a self‑hosted or API‑based engine for agents, NPCs, audiobooks, podcasts, accessibility tools, or localized dubbing.
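
For long‑form uses such as audiobooks, a self‑hosted setup might split the text into sentence‑sized chunks and synthesize each one in turn. The sketch below shows one possible approach under stated assumptions: the synthesize callable stands in for whatever engine or API is actually deployed, and the 300‑character budget and both function names are hypothetical.

```python
import re
from typing import Callable, List


def split_into_chunks(text: str, max_chars: int = 300) -> List[str]:
    """Group whole sentences into chunks of at most max_chars where possible."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def narrate(text: str, synthesize: Callable[[str], bytes],
            max_chars: int = 300) -> List[bytes]:
    """Synthesize each chunk with the injected engine and return the clips."""
    return [synthesize(chunk) for chunk in split_into_chunks(text, max_chars)]
```

A single sentence longer than the budget is kept whole, not cut mid‑sentence. Injecting synthesize keeps the chunking logic independent of the engine, so it can be tried with a stub before any model is wired in.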
