ComfyUI_CSM

Author billwuhao

https://github.com/billwuhao/ComfyUI_CSM

Last updated

2025-06-02

ComfyUI_CSM is a custom node for ComfyUI built around the Conversational Speech Model (CSM). It clones voices for multi-person dialogue and currently supports up to two participants per conversation. Cloned voices can be saved and reused, so the same speakers can appear in later dialogues.

Clones voices to produce lifelike conversations.
Saves speaker profiles so they can be reloaded and reused.
Accepts dialogue in a structured format so that each turn is spoken by the intended speaker.

Context

The Conversational Speech Model (CSM) node is a ComfyUI extension that generates conversational audio in the voices of multiple speakers. It is aimed at developers and creators working with voice synthesis who need realistic spoken dialogue.

Key Features & Benefits

The node clones voices and manages multi-person conversations. Speaker profiles can be saved and retrieved later, so a given speaker sounds consistent across dialogues without being cloned again.
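To make saved speaker profiles concrete, here is a minimal sketch of how profiles might be persisted and reloaded. It assumes a profile is simply a name plus a reference audio file and its transcript, stored together in one JSON file. The `SpeakerProfile` class, its field names, and the JSON layout are hypothetical and are not taken from ComfyUI_CSM.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass
from pathlib import Path


# Hypothetical profile record: an illustration, NOT ComfyUI_CSM's storage format.
@dataclass
class SpeakerProfile:
    name: str            # label used to select the speaker later
    ref_audio_path: str  # reference clip the voice was cloned from
    ref_text: str        # transcript of the reference clip


def save_profiles(profiles: list[SpeakerProfile], path: Path) -> None:
    """Write all profiles to a single JSON file, keyed by profile name."""
    data = {p.name: asdict(p) for p in profiles}
    path.write_text(json.dumps(data, indent=2), encoding="utf-8")


def load_profiles(path: Path) -> dict[str, SpeakerProfile]:
    """Read profiles back; returns an empty dict if the file does not exist yet."""
    if not path.exists():
        return {}
    data = json.loads(path.read_text(encoding="utf-8"))
    return {name: SpeakerProfile(**fields) for name, fields in data.items()}
```

Keying the file by name means saving a profile under an existing name overwrites it, which keeps one voice per label.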

Advanced Functionalities

The CSM node accepts dialogue in a structured format, in which each turn of the conversation is attributed to a speaker. Keeping turns ordered and attributed preserves the conversational context, which makes the generated audio sound more natural.
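The node defines its own dialogue syntax, which is not reproduced on this page. As an illustration only, the sketch below assumes a hypothetical format in which each line starts with a bracketed speaker index such as `[0]` or `[1]`. It shows how such input could be split into ordered turns while enforcing the two-speaker limit; `parse_dialogue` and the `[N] text` format are assumptions, not ComfyUI_CSM's documented interface.

```python
import re

# Hypothetical line format: "[<speaker index>] <utterance>".
# This is an illustrative assumption, not ComfyUI_CSM's documented syntax.
TURN_PATTERN = re.compile(r"^\[(\d+)\]\s*(.+)$")
MAX_SPEAKERS = 2  # the node currently supports up to two participants


def parse_dialogue(script: str) -> list[tuple[int, str]]:
    """Split a dialogue script into (speaker, text) turns, preserving order."""
    turns: list[tuple[int, str]] = []
    for line_no, raw in enumerate(script.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines between turns
        match = TURN_PATTERN.match(line)
        if match is None:
            raise ValueError(f"line {line_no}: expected '[N] text', got {line!r}")
        turns.append((int(match.group(1)), match.group(2)))

    # Reject scripts that use more distinct speakers than the node supports.
    speakers = {speaker for speaker, _ in turns}
    if len(speakers) > MAX_SPEAKERS:
        raise ValueError(
            f"found {len(speakers)} speakers; at most {MAX_SPEAKERS} are supported"
        )
    return turns
```

Failing loudly on an untagged line, rather than guessing a speaker, keeps each turn attributed to exactly the voice the author intended.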

Practical Benefits

Inside ComfyUI, the node reduces the time spent generating and managing voices. Cloning voices and saving speaker profiles gives users more control over the audio output and makes producing conversational content faster and more consistent.

Credits/Acknowledgments

The underlying CSM model was developed by the SesameAILabs team; this ComfyUI node is maintained by billwuhao. The repository is open source and welcomes community contributions.
