ComfyUI Custom Dia

Author nobrainX2

https://github.com/nobrainX2/comfyUI-customDia

Last updated

2025-05-29

ComfyUI Custom Dia integrates the Dia TTS model into ComfyUI. It converts text into speech and adds multi-channel audio support and voice cloning.

Accepts multi-channel audio inputs, such as stereo files or audio tensors from other ComfyUI nodes.
Clones a voice from an audio tensor for personalized speech synthesis.
Includes a speech prompt that supports speaker switching and nonverbal audio tags for more expressive output.

Context

This tool is a custom integration of the Dia TTS (text-to-speech) model for ComfyUI. Its primary purpose is to convert typed dialogue into audio, with voice cloning and multi-channel audio processing available as advanced options.

Key Features & Benefits

The tool accepts multi-channel audio inputs, so users can work with stereo files or with audio tensors passed directly from other ComfyUI nodes. The speech prompt lets users write dialogue with speaker tags and nonverbal cues, which makes the generated audio more expressive. Voice cloning from an audio tensor lets the generated speech match a chosen reference voice.
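
As an illustration of the speech prompt, here is a minimal Python sketch that assembles tagged dialogue into a single string. It assumes the conventions documented for the upstream Dia model: `[S1]` and `[S2]` speaker tags and parenthesised nonverbal cues such as `(laughs)`. The `Line` type and the `build_speech_prompt` helper are hypothetical names used only for this example and are not part of the node.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Line:
    """One line of dialogue (hypothetical helper type for this sketch)."""
    speaker: int               # 1 or 2, rendered as [S1] / [S2]
    text: str                  # what the speaker says
    cue: Optional[str] = None  # optional nonverbal tag, e.g. "laughs"


def build_speech_prompt(lines: Iterable[Line]) -> str:
    """Join dialogue lines into one prompt string with speaker tags."""
    parts = []
    for line in lines:
        # Assumption: only two speakers are available, as in upstream Dia.
        if line.speaker not in (1, 2):
            raise ValueError(f"unsupported speaker: {line.speaker}")
        segment = f"[S{line.speaker}] {line.text.strip()}"
        if line.cue:
            # Nonverbal cues are appended in parentheses after the text.
            segment += f" ({line.cue.strip()})"
        parts.append(segment)
    return " ".join(parts)


prompt = build_speech_prompt([
    Line(1, "Did you hear the render finished?"),
    Line(2, "Already? That was quick.", cue="laughs"),
])
print(prompt)
```

The resulting string can be pasted into the node's text input. Which cues are recognised depends on the Dia model itself, so unsupported tags may be read aloud or ignored.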

Advanced Functionalities

The voice cloning feature takes an audio tensor as a reference for the voice to synthesize. Users are encouraged to also provide a transcript of that input audio, which significantly improves the quality of the cloned voice. This is especially useful for creating distinct character voices.
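
The transcript advice can be sketched as follows. In the upstream Dia model, the transcript of the reference clip is placed before the text to be generated; whether this node expects one combined string or a separate transcript input is not stated here, so treat the concatenation below as an assumption. `compose_clone_text` is a hypothetical helper, and adding a leading `[S1]` tag to an untagged transcript is also an assumption.

```python
def compose_clone_text(reference_transcript: str, new_dialogue: str) -> str:
    """Prepend the reference clip's transcript to the dialogue to synthesize.

    Hypothetical helper: the real node may take the transcript separately.
    """
    # Collapse runs of whitespace so line breaks do not end up in the prompt.
    transcript = " ".join(reference_transcript.split())
    dialogue = " ".join(new_dialogue.split())

    if not transcript:
        # No transcript supplied: fall back to the dialogue alone.
        return dialogue

    # Assumption: an untagged transcript belongs to the first speaker.
    if not transcript.startswith("[S1]"):
        transcript = f"[S1] {transcript}"

    return f"{transcript} {dialogue}"


text = compose_clone_text(
    "Welcome back to the   workshop.",
    "[S1] Today we build a custom node. [S2] Sounds good.",
)
print(text)
```

The transcript should match the reference audio word for word; a mismatch between the two is a likely cause of poor cloning results.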

Practical Benefits

With this tool, speech generation happens inside the ComfyUI workflow rather than in a separate application. Multi-channel support and voice cloning give users more control over the audio output, and typed dialogue can be turned into speech without leaving the node graph.

Credits/Acknowledgments

Credit goes to the original authors at nari-labs, whose work on the Dia TTS model made this integration possible. The tool is licensed under the same terms as the original repository.
