floyo logobeta logo
Powered by
ThinkDiffusion

ComfyUI_StepAudioTTS

130

Last updated
2025-05-23

A Text To Speech node for ComfyUI built on Step-Audio-TTS. It can generate speech, rap, and singing, and can clone voices, which extends ComfyUI to a range of audio generation tasks.

  • Supports multiple languages, including English, Chinese, Korean, and more.
  • Allows custom speaker definitions for tailored voice outputs directly within the ComfyUI environment.
  • Provides adjustable parameters for recording and audio processing, giving control over output quality.

Context

This tool is a Text To Speech (TTS) node for ComfyUI built on the Step-Audio-TTS framework. It converts text input into spoken audio in a range of vocal styles, from regular speech to singing and voice cloning.

Key Features & Benefits

The TTS node lets users define custom speakers, so they can create voice profiles tailored to their needs. It also exposes adjustable parameters for recording audio, which affect the quality and responsiveness of the result. Support for multiple languages lets users reach a diverse audience.

Advanced Functionalities

Advanced options include noise reduction sensitivity and time-frequency smoothing, which help improve the clarity and naturalness of the audio. A separate recording node, MW Audio Recorder, provides real-time recording with visual progress tracking.
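To make the terms "noise reduction sensitivity" and "time-frequency smoothing" concrete, here is a minimal sketch of spectral gating in NumPy. It is not the node's actual implementation: the function names, the parameters `sensitivity` and `smooth`, and the thresholding rule (noise mean plus `sensitivity` standard deviations per frequency bin) are all assumptions chosen for illustration.

```python
import numpy as np


def stft(signal, n_fft=512, hop=128):
    """Short-time Fourier transform with a Hann window; returns (frames, bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)


def istft(spec, n_fft=512, hop=128):
    """Inverse STFT by overlap-add, normalized by the summed squared window."""
    window = np.hanning(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * window
    length = n_fft + hop * (len(frames) - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame
        norm[i * hop:i * hop + n_fft] += window ** 2
    return out / np.maximum(norm, 1e-8)


def smooth_mask(mask, size):
    """Box-average the mask over `size` frames and `size` frequency bins."""
    if size <= 1:
        return mask
    kernel = np.ones(size) / size

    def blur(row):
        return np.convolve(row, kernel, mode="same")

    mask = np.apply_along_axis(blur, 0, mask)  # smooth along time
    mask = np.apply_along_axis(blur, 1, mask)  # smooth along frequency
    return mask


def reduce_noise(signal, noise_clip, sensitivity=1.5, smooth=3,
                 n_fft=512, hop=128):
    """Hypothetical spectral gate: keep bins that rise above the noise floor.

    sensitivity: assumed meaning, number of standard deviations above the
                 mean noise magnitude a bin must exceed to be kept.
    smooth:      assumed meaning, side length of the box filter applied to
                 the keep/drop mask in time and frequency.
    """
    noise_mag = np.abs(stft(noise_clip, n_fft, hop))
    threshold = noise_mag.mean(axis=0) + sensitivity * noise_mag.std(axis=0)

    # Pad so the first and last samples are covered by a full window weight.
    pad = n_fft // 2
    padded = np.pad(signal, pad, mode="reflect")
    spec = stft(padded, n_fft, hop)

    mask = (np.abs(spec) > threshold).astype(float)
    mask = smooth_mask(mask, smooth)

    cleaned = istft(spec * mask, n_fft, hop)
    return cleaned[pad:pad + len(signal)]
```

In this sketch, raising `sensitivity` gates more aggressively (more bins fall below the threshold), while a larger `smooth` softens the mask edges so isolated bins do not switch on and off abruptly, at the cost of slightly attenuating the edges of genuine signal components. A call such as `reduce_noise(recording, recording[:4000])` assumes the first 4000 samples contain only background noise.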

Practical Benefits

Integrating this TTS node into ComfyUI gives users more control over audio output within their existing workflow. Custom voice profiles and adjustable audio parameters make it more efficient to produce high-quality audio content.

Credits/Acknowledgments

This tool builds on several open-source projects, including Step-Audio, CosyVoice, transformers, and FunASR, whose foundational work made this TTS node possible.