TTS-Audio-Suite

Author diodiogod

https://github.com/diodiogod/TTS-Audio-Suite

1084

Last updated: 2026-06-24


A ComfyUI extension, TTS Audio Suite integrates multiple Text-to-Speech (TTS) engines with voice conversion, letting users generate high-quality audio from text in many languages. Beyond synthesis, it supports audio editing and SRT subtitle synchronization, along with advanced features such as real-time voice conversion and model training.

Supports a wide range of engines, including TTS engines such as ChatterBox and F5-TTS and the RVC voice-conversion engine.
Processes subtitles, so SRT files can be created and edited directly from TTS output.
Integrates voice conversion, so generated audio can be modified and refined in real time.
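SRT synchronization rests on simple timing arithmetic: each subtitle cue defines a time slot, and a generated clip generally has to fit into it. The sketch below parses SRT text with the Python standard library and computes the speed factor a clip would need to fit its cue. It is a generic illustration of the idea, not the suite's code; `Cue`, `parse_srt`, `to_seconds`, and `fit_ratio` are hypothetical names.

```python
import re
from dataclasses import dataclass

# Hypothetical helpers for illustration; not the TTS Audio Suite API.

# SRT timestamps look like 00:01:02,500 (hours:minutes:seconds,milliseconds)
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


@dataclass
class Cue:
    index: int
    start: float  # seconds
    end: float    # seconds
    text: str

    @property
    def duration(self) -> float:
        return self.end - self.start


def to_seconds(stamp: str) -> float:
    """Convert an SRT timestamp string to seconds."""
    match = TIMESTAMP.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"bad SRT timestamp: {stamp!r}")
    h, m, s, ms = (int(g) for g in match.groups())
    return h * 3600 + m * 60 + s + ms / 1000


def parse_srt(content: str) -> list[Cue]:
    """Parse SRT text into cues (index line, timing line, one or more text lines)."""
    cues = []
    # Cues are separated by blank lines
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 3:
            continue  # skip malformed blocks
        start, end = (part.strip() for part in lines[1].split("-->"))
        cues.append(Cue(int(lines[0]), to_seconds(start), to_seconds(end), "\n".join(lines[2:])))
    return cues


def fit_ratio(clip_seconds: float, cue: Cue) -> float:
    """Speed factor needed to fit a clip into its cue slot (>1 means it must play faster)."""
    return clip_seconds / cue.duration
```

For a cue running from `00:00:04,000` to `00:00:05,000`, a 3-second clip gives a `fit_ratio` of 3.0: it would have to play three times faster, or the timing would have to change, to stay in sync. Whether to stretch audio, pad with silence, or shift later cues is a separate design choice that this sketch does not model.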

Context

The TTS Audio Suite is a custom node pack for ComfyUI that provides local, multi-engine, multi-language Text-to-Speech (TTS) and voice conversion. Its goal is to offer one place to generate audio from text while keeping control over each stage of the generation process.

Key Features & Benefits

Key features include:

Multi-Engine Support: Users can choose from a variety of TTS engines, each with unique capabilities, allowing for tailored audio outputs.
SRT Timing and Subtitle Management: The suite includes tools for generating and editing SRT files, keeping audio and subtitles synchronized.
Voice Conversion: Built-in real-time voice conversion lets users change the voice of existing audio, widening the creative options.
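The node list pairs per-engine nodes (such as `F5TTSEngineNode` and `ChatterBoxEngineNode`) with unified generation nodes (such as `UnifiedTTSTextNode`), which suggests a pick-an-engine, then-generate design. A minimal sketch of that pattern, assuming a simple name-to-function registry, is below; `register_engine`, `synthesize`, and the `"silence"` placeholder engine are hypothetical and say nothing about how the suite is actually implemented.

```python
from typing import Callable, Dict, List

# Hypothetical registry pattern for illustration; not the TTS Audio Suite internals.

# A synthesis function takes text and returns mono audio samples (placeholder type).
SynthFn = Callable[[str], List[float]]

_ENGINES: Dict[str, SynthFn] = {}


def register_engine(name: str) -> Callable[[SynthFn], SynthFn]:
    """Decorator that adds a synthesis function to the registry under `name`."""
    def wrap(fn: SynthFn) -> SynthFn:
        if name in _ENGINES:
            raise ValueError(f"engine already registered: {name}")
        _ENGINES[name] = fn
        return fn
    return wrap


@register_engine("silence")
def silent_engine(text: str) -> List[float]:
    # Hypothetical stand-in: 0.1 s of silence per word at 16 kHz, no real TTS.
    return [0.0] * (1600 * len(text.split()))


def synthesize(text: str, engine: str) -> List[float]:
    """Route a request to the selected engine, failing clearly if it is unknown."""
    try:
        fn = _ENGINES[engine]
    except KeyError:
        raise ValueError(f"unknown engine {engine!r}; available: {sorted(_ENGINES)}") from None
    return fn(text)
```

Here `synthesize("hello world", "silence")` returns 3,200 zero samples (two words at 1,600 samples each), and an unknown engine name raises a `ValueError` listing the registered engines. Keeping every engine behind one function signature is what lets a single front-end node swap engines without changing the rest of a workflow.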

Advanced Functionalities

The TTS Audio Suite offers advanced functionalities such as:

Iterative Voice Conversion: Users can refine audio outputs through multiple conversion passes, improving the fidelity of the voice to match a desired target.
Integrated Model Training: The suite supports the training of voice models directly within the application, enabling users to create custom voice profiles based on their specific needs.
Emotion Control: Advanced emotion vector controls allow users to manipulate the emotional tone of the generated speech, enhancing expressiveness.
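Iterative conversion amounts to feeding each pass's output back in as the next pass's input. The sketch below shows that loop with an early stop once a similarity score stops improving by at least `min_gain`. The `convert` and `similarity` callables are hypothetical placeholders (in practice, a voice-conversion model such as RVC and some speaker-similarity measure), and `iterative_convert` is an illustrative name, not a function from the suite.

```python
from typing import Callable, List, Tuple

# Hypothetical sketch of multi-pass conversion; not the TTS Audio Suite API.
Vector = List[float]


def iterative_convert(
    audio: Vector,
    convert: Callable[[Vector], Vector],
    similarity: Callable[[Vector], float],
    max_passes: int = 3,
    min_gain: float = 0.01,
) -> Tuple[Vector, List[float]]:
    """Apply `convert` repeatedly, keeping a pass only if it raises similarity by min_gain.

    Returns the best result and the similarity score after each accepted pass
    (the first entry is the score of the unconverted input).
    """
    best, best_score = audio, similarity(audio)
    history = [best_score]
    for _ in range(max_passes):
        candidate = convert(best)  # feed the previous output back in
        score = similarity(candidate)
        if score - best_score < min_gain:
            break  # another pass no longer helps enough; keep the previous result
        best, best_score = candidate, score
        history.append(score)
    return best, history
```

For example, with a toy `convert` that moves a two-element vector halfway toward a target `[1.0, 0.0]` on each pass and a negative squared-distance `similarity`, three passes starting from `[0.0, 1.0]` return `[0.875, 0.125]`. The early stop is one simple way to avoid spending passes that no longer move the result closer to the target.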

Practical Benefits

Within ComfyUI, the suite improves workflow efficiency by:

Streamlining the process of generating high-quality audio from text, reducing the time and effort required for audio production.
Providing a unified interface for managing multiple TTS engines and voice conversion tasks, simplifying user interactions.
Enhancing output quality and flexibility through advanced editing and voice conversion features, allowing for more creative audio projects.

Credits/Acknowledgments

The TTS Audio Suite is developed by diodiogod and draws on contributions from other authors and the ComfyUI community. The project is released under the MIT License, which permits open use and collaborative development.

Inner Nodes

ASRPunctuationTruecaseNode

AudioAnalyzerNode

AudioAnalyzerOptionsNode

CharacterVoicesNode

ChatterBoxAudioAnalyzer

ChatterBoxAudioAnalyzerOptions

ChatterBoxEngineNode

ChatterBoxF5TTSEditOptions

ChatterBoxF5TTSEditVoice

ChatterBoxOfficial23LangEngineNode

ChatterBoxVoiceCapture

CosyVoice Engine

CosyVoiceEngineNode

DotsTTSEngineNode

EchoTTSEngineNode

F5TTSEngineNode

GraniteASREngineNode

HiggsAudioEngineNode

HiggsAudioV3EngineNode

IndexTTS Engine

IndexTTSEmotionOptionsNode

IndexTTSEngineNode

LoadRVCModelNode

MergeAudioNode

MossClipStagingNode

MossDatasetPrepNode

MossDatasetRowsNode

MossTTSEngineNode

MossTrainingConfigNode

MouthMovementAnalyzer

OmniVoiceEngineNode

OmniVoiceInstructionBuilderNode

PhonemeTextNormalizer

Qwen3TTSEngineNode

Qwen3TTSVoiceDesignerNode

QwenEmotionNode

RVCDatasetPrepNode

RVCEngineNode

RVCPitchOptionsNode

RVCTrainingConfigNode

RefreshVoiceCacheNode

SRTAdvancedOptionsNode

Step Audio EditX Engine

StepAudioEditXAudioEditorNode

StepAudioEditXEngineNode

StringMultilineTagEditor

TextToSRTBuilderNode

UnifiedASRTranscribeNode

UnifiedModelTrainingNode

UnifiedSoundEffectsNode

UnifiedTTSSRTNode

UnifiedTTSTextNode

UnifiedVoiceChangerNode

VibeVoiceEngineNode

VisemeDetectionOptionsNode

VocalRemovalNode

VoiceFixerNode
