ComfyUI_Fill-ChatterBox

Author filliptm

https://github.com/filliptm/ComfyUI_Fill-ChatterBox

136

Last updated

2025-06-25

This custom extension for ComfyUI adds text-to-speech (TTS) and voice conversion (VC) nodes powered by the Chatterbox library. Audio synthesis is limited to a maximum duration of 40 seconds, a cap intended to preserve quality and performance.

Supports multiple nodes for TTS, VC, and dialog synthesis, allowing for versatile audio generation.
Offers customizable parameters such as emotion intensity and randomness, enhancing control over audio output.
Facilitates the creation of multi-speaker dialogues, producing isolated audio tracks for each speaker to streamline editing.
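
As a rough illustration of the 40-second cap mentioned above, the following minimal sketch checks whether a clip of a given length stays within the limit. The helper name `fits_limit` and the 24000 Hz sample rate in the usage line are assumptions made for illustration; they are not taken from the extension's code.

```python
MAX_SECONDS = 40.0  # limit quoted in the description above


def fits_limit(num_samples, sample_rate, max_seconds=MAX_SECONDS):
    """Return True if a clip of num_samples at sample_rate is within the cap.

    Hypothetical helper: the extension's own handling of the limit may differ.
    """
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    duration = num_samples / sample_rate  # clip length in seconds
    return duration <= max_seconds


# Example: exactly 40 seconds at an assumed 24000 Hz sample rate.
print(fits_limit(24000 * 40, 24000))
```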

Context

This extension adds TTS and voice conversion to ComfyUI. It lets users generate speech from text inputs and perform voice cloning within the ComfyUI environment.

Key Features & Benefits

The tool features various nodes, each tailored for specific audio tasks. The TTS node allows users to convert text into speech with adjustable parameters, while the VC node enables the conversion of existing audio into different voices. The Dialog TTS node stands out by supporting conversations with up to four distinct speakers, making it ideal for creating dynamic dialogue scenes.
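
To make the four-speaker dialogue idea concrete, here is a minimal sketch of turning a script into ordered speaker turns. The `SPEAKER A: text` line format and the function name `parse_dialog` are assumptions for illustration; the input format the Dialog TTS node actually expects is not stated here.

```python
import re

# Hypothetical line format: "SPEAKER A: some text".
# The character class A-D encodes the four-speaker limit described above.
LINE_RE = re.compile(r"^\s*SPEAKER\s+([A-D])\s*:\s*(.+?)\s*$", re.IGNORECASE)


def parse_dialog(script):
    """Split a script into a list of (speaker, text) turns, in order."""
    turns = []
    for raw in script.splitlines():
        if not raw.strip():
            continue  # skip blank lines between turns
        match = LINE_RE.match(raw)
        if match is None:
            raise ValueError(f"unrecognized dialog line: {raw!r}")
        turns.append((match.group(1).upper(), match.group(2)))
    return turns


script = """
SPEAKER A: Did you hear that?
SPEAKER B: Hear what?
SPEAKER A: Never mind.
"""
for speaker, text in parse_dialog(script):
    print(speaker, "->", text)
```

Each turn could then be synthesized with the voice assigned to its speaker. A line naming a fifth speaker is rejected rather than silently dropped.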

Advanced Functionalities

Advanced capabilities include the ability to control emotion intensity and randomness in speech synthesis, which can significantly affect the expressiveness of the generated audio. Additionally, the Dialog TTS node can isolate audio tracks for each speaker, allowing for detailed audio editing and production workflows.
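
One way to picture isolated per-speaker tracks is sketched below: every track has the full dialogue length, and each speaker's track is silent wherever someone else is talking. This is an assumed layout for illustration only; the function name `isolate_tracks` is hypothetical, and the node's real output format may differ.

```python
def isolate_tracks(turns, speakers):
    """Build one time-aligned track per speaker.

    turns: list of (speaker, samples) in playback order, where samples is a
    list of floats. Assumption: turns are played back to back, no overlap.
    """
    total = sum(len(samples) for _, samples in turns)
    # Start every track as silence spanning the whole dialogue.
    tracks = {name: [0.0] * total for name in speakers}
    offset = 0
    for speaker, samples in turns:
        # Copy this turn into its speaker's track at the current position.
        tracks[speaker][offset:offset + len(samples)] = samples
        offset += len(samples)
    return tracks


turns = [("A", [0.1, 0.2]), ("B", [0.3]), ("A", [0.4])]
tracks = isolate_tracks(turns, ["A", "B"])
print(tracks["A"])
print(tracks["B"])
```

Because the tracks share one timeline, they can be dropped into an editor side by side and adjusted per speaker without re-aligning them.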

Practical Benefits

This tool keeps audio generation inside ComfyUI, so speech can be produced in the same workflow as other content. Multi-speaker dialogues and adjustable parameters such as emotion intensity and randomness give users finer control over the resulting audio.

Credits/Acknowledgments

The extension is developed by filliptm, with contributions from the open-source community. See the repository for its license terms.

