ComfyUI_SparkTTS

Author: billwuhao

https://github.com/billwuhao/ComfyUI_SparkTTS

Last updated: 2025-05-23

ComfyUI_SparkTTS brings Spark-TTS into ComfyUI, letting users generate high-quality speech from text with a language model that supports voice cloning across languages. It extends ComfyUI's audio capabilities with efficient, customizable speech synthesis.

Cross-lingual voice cloning for varied, natural-sounding speech.
A recording node that captures audio in real time.
Tunable parameters that control the characteristics of the generated audio.

Context

This tool integrates the Spark-TTS text-to-speech model into the ComfyUI environment. Its primary function is high-fidelity conversion of text into speech, with support for multiple languages and voice styles.

Key Features & Benefits

The Spark-TTS ComfyUI node provides several practical features for text-to-speech generation. Cross-lingual voice cloning lets users produce authentic, varied audio, which is particularly valuable for applications that need multilingual support. A recording node lets users capture audio live, giving them a direct way to create and edit speech outputs.

Advanced Functionalities

Spark-TTS also exposes customizable parameters for fine-tuning the generated speech, such as pitch, speed, and tone, so users can tailor the audio output to specific needs and preferences.
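
As a rough illustration of how such parameters might be set when a workflow is queued programmatically, the sketch below builds a workflow in ComfyUI's API format (node IDs mapped to a `class_type` and its `inputs`). The node class name `SparkTTSRun`, the input names `text`, `pitch`, and `speed`, and the level names are hypothetical placeholders, not taken from the ComfyUI_SparkTTS repository. Check the node's actual inputs in the ComfyUI editor before relying on any of them.

```python
import json

# Hypothetical level names; the real node may expose different options.
ALLOWED_LEVELS = ("very_low", "low", "moderate", "high", "very_high")


def build_tts_workflow(text, pitch="moderate", speed="moderate"):
    """Return a ComfyUI API-format workflow dict containing one TTS node.

    "SparkTTSRun", "text", "pitch", and "speed" are assumed names used
    for illustration only.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    for name, value in (("pitch", pitch), ("speed", speed)):
        if value not in ALLOWED_LEVELS:
            raise ValueError(f"{name} must be one of {ALLOWED_LEVELS}, got {value!r}")
    return {
        "1": {
            "class_type": "SparkTTSRun",  # hypothetical node class name
            "inputs": {"text": text, "pitch": pitch, "speed": speed},
        }
    }


def to_prompt_payload(workflow):
    """Wrap the workflow in the JSON body that ComfyUI's /prompt endpoint accepts."""
    return json.dumps({"prompt": workflow})


if __name__ == "__main__":
    wf = build_tts_workflow("Hello from ComfyUI.", pitch="high", speed="low")
    print(to_prompt_payload(wf))
```

Queuing the job would mean sending this payload as an HTTP POST to the `/prompt` endpoint of a running ComfyUI server. That step is left out so the sketch runs without a server.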

Practical Benefits

Integrating Spark-TTS streamlines ComfyUI workflows by making efficient, high-quality speech synthesis available as a node. Users gain greater control over audio output, which can improve quality and shorten production time in applications ranging from content creation to interactive media.

Credits/Acknowledgments

This tool builds on the Spark-TTS model, with credit to its original authors and the open-source community. The repository is available under a license that encourages further collaboration and enhancement.
