floyo logo
Powered by
ThinkDiffusion
Last updated: 2026-04-04

The TTS Audio Suite is a ComfyUI extension that brings multiple Text-to-Speech (TTS) engines and voice conversion tools into one unified interface. It supports a wide range of engines and exposes their settings, so TTS workflows can be customized and controlled in detail.

  • Supports TTS engines such as ChatterBox and F5-TTS, plus RVC for voice conversion, enabling voice cloning and real-time conversion.
  • Provides subtitle processing features such as automatic SRT generation and timing estimation, which reduce manual subtitle work.
  • Uses a modular architecture, so new engines and features can be integrated later.
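
To illustrate what a modular, multi-engine design can look like, here is a minimal sketch of an engine registry. It is not the suite's actual code: the names `register_engine`, `synthesize`, and the `dummy` engine are hypothetical and exist only for this example.

```python
from typing import Callable, Dict

# Hypothetical engine signature: text in, raw audio bytes out.
EngineFn = Callable[[str], bytes]

# Registry mapping an engine name to its synthesis function.
_ENGINES: Dict[str, EngineFn] = {}


def register_engine(name: str) -> Callable[[EngineFn], EngineFn]:
    """Decorator that adds an engine function to the registry under `name`."""
    def decorator(fn: EngineFn) -> EngineFn:
        if name in _ENGINES:
            raise ValueError(f"Engine {name!r} is already registered")
        _ENGINES[name] = fn
        return fn
    return decorator


def synthesize(engine: str, text: str) -> bytes:
    """Dispatch a synthesis request to the named engine."""
    try:
        fn = _ENGINES[engine]
    except KeyError:
        available = sorted(_ENGINES)
        raise ValueError(f"Unknown engine {engine!r}; available: {available}") from None
    return fn(text)


@register_engine("dummy")
def dummy_engine(text: str) -> bytes:
    # Placeholder "engine": returns the UTF-8 bytes of the text instead of audio.
    return text.encode("utf-8")


if __name__ == "__main__":
    print(synthesize("dummy", "hello"))
```

With a layout like this, adding an engine means writing one function and registering it; the dispatching code does not change.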

Context

The TTS Audio Suite is a custom node pack for ComfyUI that provides multi-engine, multi-language Text-to-Speech (TTS) and voice conversion. Its main purpose is to streamline audio generation by offering one cohesive set of tools for generating speech, editing audio, and managing subtitles inside ComfyUI.

Key Features & Benefits

The main features are:

  • Multi-Engine Support: Users can choose among engines such as ChatterBox and F5-TTS for speech synthesis and RVC for voice conversion, each with its own strengths in voice quality and language support.
  • SRT Processing: The suite includes a sophisticated SRT builder that can generate subtitles from text, estimate timings, and maintain project control tags, which is crucial for media projects requiring precise synchronization.
  • Voice Conversion: With real-time voice conversion capabilities, users can modify voices to match specific styles or characters, improving the versatility of audio outputs.
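
The idea behind generating subtitles from text with estimated timings can be sketched as follows. This is an illustrative approximation, not the suite's SRT builder: the function names, the fixed reading speed (`chars_per_second`), the minimum duration, and the gap between entries are all assumptions chosen for the example.

```python
import re


def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def text_to_srt(
    text: str,
    chars_per_second: float = 15.0,  # assumed average speaking rate
    min_duration: float = 1.0,       # assumed floor so short lines stay readable
    gap: float = 0.1,                # assumed pause between subtitles
) -> str:
    """Split text into sentences and assign each an estimated time range."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    entries = []
    start = 0.0
    for index, sentence in enumerate(sentences, start=1):
        # Estimate duration from character count, with a lower bound.
        duration = max(min_duration, len(sentence) / chars_per_second)
        end = start + duration
        entries.append(
            f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{sentence}\n"
        )
        start = end + gap
    return "\n".join(entries)


if __name__ == "__main__":
    print(text_to_srt("Hello there. How are you today?"))
```

A character-count estimate is crude; a real pipeline would more likely adjust the timings against the duration of the audio actually generated for each line.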

Advanced Functionalities

The TTS Audio Suite also offers more advanced functions:

  • Iterative Voice Conversion: Users can refine voice conversion results through multiple passes, allowing for progressive enhancement of audio quality.
  • Integrated Model Training: The suite supports RVC model training directly within the interface, facilitating the creation of custom voice models tailored to specific needs.
  • Emotion Control: Advanced emotion control mechanisms enable users to adjust the emotional tone of the generated speech, adding depth and realism to audio outputs.
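
Iterative voice conversion can be pictured as a loop in which each pass takes the previous pass's output as its new input. The sketch below assumes exactly that behavior and uses a toy stand-in for the conversion step; `iterative_convert` and `toward_target` are hypothetical names, and no real voice model is involved.

```python
from typing import Callable, List

Audio = List[float]  # toy stand-in for an audio buffer


def iterative_convert(
    source: Audio,
    convert_once: Callable[[Audio], Audio],
    passes: int = 3,
) -> List[Audio]:
    """Run `convert_once` repeatedly, feeding each result into the next pass.

    Returns the output of every pass so intermediate results can be compared.
    """
    if passes < 1:
        raise ValueError("passes must be at least 1")
    results: List[Audio] = []
    current = list(source)
    for _ in range(passes):
        current = convert_once(current)
        results.append(current)
    return results


def toward_target(audio: Audio) -> Audio:
    # Toy "conversion": move every sample halfway toward a target value of 1.0.
    return [0.5 * x + 0.5 * 1.0 for x in audio]


if __name__ == "__main__":
    for i, out in enumerate(iterative_convert([0.0, 0.0], toward_target), start=1):
        print(i, out)  # 1 [0.5, 0.5] / 2 [0.75, 0.75] / 3 [0.875, 0.875]
```

Keeping every intermediate result lets a user listen to each pass and stop at the one that sounds best, since more passes are not guaranteed to sound better.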

Practical Benefits

Using the TTS Audio Suite within ComfyUI improves workflow efficiency by providing:

  • Seamless Integration: The unified interface allows for easy management of various TTS tasks without switching between different tools or platforms.
  • Increased Control: Users can customize generation parameters and audio processing techniques, leading to higher quality and more tailored audio outputs.
  • Time-Saving Features: The automatic SRT generation and intelligent text chunking reduce the time spent on manual adjustments, allowing users to focus on creative aspects.
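
As a rough illustration of text chunking, the sketch below groups whole sentences into chunks under a character limit and falls back to word boundaries for sentences that are too long. The function names and the `max_chars` default are assumptions for the example; the suite's own chunking rules may differ.

```python
import re
from typing import List


def _split_long(sentence: str, max_chars: int) -> List[str]:
    """Break a sentence that exceeds max_chars at word boundaries."""
    if len(sentence) <= max_chars:
        return [sentence]
    pieces: List[str] = []
    current = ""
    for word in sentence.split():
        candidate = f"{current} {word}" if current else word
        # A single word longer than max_chars is kept whole as its own piece.
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            pieces.append(current)
            current = word
    if current:
        pieces.append(current)
    return pieces


def chunk_text(text: str, max_chars: int = 200) -> List[str]:
    """Group sentences into chunks of at most max_chars where possible."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        for piece in _split_long(sentence, max_chars):
            candidate = f"{current} {piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                if current:
                    chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    print(chunk_text("One two. Three four. Five six.", max_chars=20))
    # ['One two. Three four.', 'Five six.']
```

Splitting at sentence boundaries first keeps each chunk a natural unit of speech, which usually matters more to a TTS engine than filling every chunk to the limit.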

Credits/Acknowledgments

The TTS Audio Suite was developed by Diogod, building upon the original ChatterBox Voice project by ShmuelRonen. The project is open-source and licensed under the MIT License, promoting collaborative development and contributions from the community.

Inner Nodes

ASRPunctuationTruecaseNode
AudioAnalyzerNode
AudioAnalyzerOptionsNode
CharacterVoicesNode
ChatterBoxAudioAnalyzer
ChatterBoxAudioAnalyzerOptions
ChatterBoxEngineNode
ChatterBoxF5TTSEditOptions
ChatterBoxF5TTSEditVoice
ChatterBoxOfficial23LangEngineNode
ChatterBoxVoiceCapture
CosyVoice Engine
CosyVoiceEngineNode
EchoTTSEngineNode
F5TTSEngineNode
GraniteASREngineNode
HiggsAudioEngineNode
IndexTTS Engine
IndexTTSEmotionOptionsNode
IndexTTSEngineNode
LoadRVCModelNode
MergeAudioNode
MouthMovementAnalyzer
PhonemeTextNormalizer
Qwen3TTSEngineNode
Qwen3TTSVoiceDesignerNode
QwenEmotionNode
RVCDatasetPrepNode
RVCEngineNode
RVCPitchOptionsNode
RVCTrainingConfigNode
RefreshVoiceCacheNode
SRTAdvancedOptionsNode
Step Audio EditX Engine
StepAudioEditXAudioEditorNode
StepAudioEditXEngineNode
StringMultilineTagEditor
TextToSRTBuilderNode
UnifiedASRTranscribeNode
UnifiedModelTrainingNode
UnifiedTTSSRTNode
UnifiedTTSTextNode
UnifiedVoiceChangerNode
VibeVoiceEngineNode
VisemeDetectionOptionsNode
VocalRemovalNode
VoiceFixerNode