ComfyUI_StepAudioTTS

Author billwuhao

https://github.com/billwuhao/ComfyUI_StepAudioTTS

130

Last updated: 2025-05-23

A text-to-speech node for ComfyUI built on Step-Audio-TTS. It can generate ordinary speech, rap, and singing, and it can clone voices, which extends ComfyUI to a range of audio generation tasks.

Supports multiple languages, including English, Chinese, Korean, and more.
Allows custom speaker definitions for tailored voice outputs directly within the ComfyUI environment.
Features adjustable parameters for recording and audio processing, enhancing flexibility and control over the output quality.

Context

This tool is a text-to-speech (TTS) node for ComfyUI that uses the Step-Audio-TTS framework. Its primary function is to convert text input into spoken audio. Output styles range from regular speech to singing, and the node also supports voice cloning.

Key Features & Benefits

The TTS node lets users define custom speakers, creating voice profiles tailored to their needs. It also exposes adjustable parameters for recording audio, which give more control over the quality of the result. Support for multiple languages, including English, Chinese, and Korean, lets users reach a diverse audience.

Advanced Functionalities

Advanced options include noise reduction sensitivity and time-frequency smoothing, which help improve the clarity and naturalness of the audio. A new recording node, MW Audio Recorder, provides real-time recording with visual progress tracking.
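As a rough illustration of what a sensitivity threshold and time-frequency smoothing do in noise reduction, here is a minimal pure-Python sketch of spectral gating on a small grid of magnitudes. It is not the node's implementation. The function names, parameter names, and defaults (gate_mask, smooth_mask, denoise, freq_radius, time_radius) are hypothetical, and the way the node's own settings map onto its processing is not documented here.

```python
from typing import List

# Rows are frequency bins, columns are time frames (a toy spectrogram).
Grid = List[List[float]]


def gate_mask(magnitudes: Grid, noise_floor: List[float], sensitivity: float) -> Grid:
    """Return 1.0 where a bin exceeds its scaled noise floor, else 0.0.

    A higher sensitivity raises the threshold, so more bins are treated as noise.
    """
    return [
        [1.0 if value > noise_floor[f] * sensitivity else 0.0 for value in row]
        for f, row in enumerate(magnitudes)
    ]


def smooth_mask(mask: Grid, freq_radius: int, time_radius: int) -> Grid:
    """Box-average the mask over neighbouring frequency bins and time frames."""
    n_freq, n_time = len(mask), len(mask[0])
    smoothed = []
    for f in range(n_freq):
        row = []
        for t in range(n_time):
            # Clip the averaging window at the edges of the grid.
            f_lo, f_hi = max(0, f - freq_radius), min(n_freq, f + freq_radius + 1)
            t_lo, t_hi = max(0, t - time_radius), min(n_time, t + time_radius + 1)
            window = [mask[i][j] for i in range(f_lo, f_hi) for j in range(t_lo, t_hi)]
            row.append(sum(window) / len(window))
        smoothed.append(row)
    return smoothed


def denoise(magnitudes: Grid, noise_floor: List[float], sensitivity: float = 1.5,
            freq_radius: int = 1, time_radius: int = 1) -> Grid:
    """Scale each magnitude by the smoothed gate (hypothetical defaults)."""
    mask = smooth_mask(gate_mask(magnitudes, noise_floor, sensitivity),
                       freq_radius, time_radius)
    return [[m * g for m, g in zip(m_row, g_row)]
            for m_row, g_row in zip(magnitudes, mask)]


# One loud bin surrounded by low-level noise.
spectrogram = [[0.1, 0.1, 0.1], [0.1, 5.0, 0.1], [0.1, 0.1, 0.1]]
floor = [0.1, 0.1, 0.1]
hard = gate_mask(spectrogram, floor, sensitivity=1.5)
soft = smooth_mask(hard, freq_radius=1, time_radius=1)
```

Without smoothing, the gate is a hard on/off decision per bin, which tends to produce audible artifacts. Averaging the mask across neighbouring bins and frames softens those transitions at the cost of letting a little noise through near the signal.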

Practical Benefits

Adding this TTS node to a ComfyUI workflow gives users closer control over audio output. Voice profiles can be customized and audio parameters adjusted directly within ComfyUI, which makes producing high-quality audio content more efficient.

Credits/Acknowledgments

This tool builds on several open-source projects, including Step-Audio, CosyVoice, transformers, and FunASR. Their foundational work made this TTS node possible.
