Comfyui-Spark-TTS

Author 1038lab

https://github.com/1038lab/ComfyUI-SparkTTS

105

Last updated

2025-04-15

ComfyUI-SparkTTS is a node for ComfyUI that integrates SparkTTS, a text-to-speech (TTS) system that uses large language models to produce natural-sounding, accurate speech. It lets users create and clone voices with customizable attributes, extending the audio generation capabilities of ComfyUI.

Supports voice creation and cloning with adjustable parameters like gender, pitch, and speed.
Enables audio recording directly within the interface for immediate processing and voice cloning.
Offers internationalization features, making it accessible for users across different languages.

Context

ComfyUI-SparkTTS adds text-to-speech to the ComfyUI framework. It uses SparkTTS to generate high-quality speech from text, supporting applications in multimedia and accessibility.

Key Features & Benefits

The tool provides several practical features, including:

Voice Creation: Users can personalize voice characteristics, allowing for diverse applications in content creation and user interaction.
Voice Cloning: This feature enables the replication of existing voices from audio samples, which is particularly useful for projects requiring specific vocal identities.
Audio Recording: Directly recording audio simplifies the process of creating voice samples, making the workflow more efficient.
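
The adjustable attributes named in the feature list (gender, pitch, speed) can be pictured as a small settings record that is validated before synthesis. The following is a minimal Python sketch, not the node's actual interface: the names `VoiceSettings`, `GENDERS`, and `LEVELS`, and the five-step scale for pitch and speed, are assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical option sets for illustration; the node's real options may differ.
GENDERS = ("female", "male")
LEVELS = ("very_low", "low", "moderate", "high", "very_high")


@dataclass(frozen=True)
class VoiceSettings:
    """Illustrative container for the attributes named in the feature list."""

    gender: str = "female"
    pitch: str = "moderate"
    speed: str = "moderate"

    def __post_init__(self):
        # Reject values outside the assumed option sets.
        if self.gender not in GENDERS:
            raise ValueError(f"unknown gender: {self.gender!r}")
        if self.pitch not in LEVELS:
            raise ValueError(f"unknown pitch: {self.pitch!r}")
        if self.speed not in LEVELS:
            raise ValueError(f"unknown speed: {self.speed!r}")


# Example: a lower-pitched, faster male voice.
settings = VoiceSettings(gender="male", pitch="low", speed="high")
```

Validating the values up front means a typo in a setting fails immediately rather than partway through audio generation.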

Advanced Functionalities

Beyond basic cloning, ComfyUI-SparkTTS lets users adjust the pitch and speed of a cloned voice. This control can make synthesized speech more realistic and expressive, and lets it be tuned to the needs of a specific project.

Practical Benefits

With this tool integrated into ComfyUI, users can generate audio within their existing workflow, improve the quality of synthesized speech, and gain more control over vocal attributes. The result is less manual work and more engaging, personalized audio content.

Credits/Acknowledgments

The project is released under the GPL-3.0 License. It builds on the SparkTTS project, whose contributors are credited for the underlying development; SparkTTS is available on GitHub.
