VibeVoice-ComfyUI

Author Enemyx-net

https://github.com/Enemyx-net/VibeVoice-ComfyUI

1509

Last updated: 2026-02-18


A ComfyUI integration of Microsoft's VibeVoice text-to-speech model. It synthesizes high-quality speech for both single-speaker and multi-speaker scenarios, supports voice cloning from audio samples, and exposes options for customizing the generated voice, which makes it useful for anyone who needs realistic speech generated from text inside a ComfyUI workflow.

Supports single and multi-speaker voice synthesis, allowing for conversations with distinct voices.
Features voice cloning capabilities, enabling users to replicate specific voice characteristics from audio samples.
Offers customizable parameters such as voice speed control and text chunking for improved audio output quality.
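The speed control is described only as modifying the speed of the reference voice. As a rough illustration, and assuming the reference clip is available as a list of mono float samples, the hypothetical helper `change_speed` below rescales the clip's duration by linear interpolation. It is a sketch, not the node's actual code, and unlike a proper time-stretch it also shifts pitch.

```python
from typing import List


def change_speed(samples: List[float], speed_factor: float) -> List[float]:
    """Naively time-scale a mono waveform by linear interpolation (hypothetical helper).

    speed_factor > 1.0 shortens the clip (faster speech); < 1.0 lengthens it.
    Plain resampling like this also shifts pitch; a pitch-preserving
    time-stretch (e.g. a phase vocoder) would be needed to change tempo alone.
    """
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    n_in = len(samples)
    if n_in == 0:
        return []
    n_out = max(1, round(n_in / speed_factor))
    out = []
    for i in range(n_out):
        # Fractional position in the input that maps to output sample i.
        pos = i * speed_factor
        left = min(int(pos), n_in - 1)
        right = min(left + 1, n_in - 1)
        frac = pos - left
        # Blend the two neighbouring input samples.
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out
```

For example, `change_speed(reference, 0.95)` lengthens a clip by roughly 5%, which would slow down the voice being cloned.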

Context

This tool integrates the VibeVoice text-to-speech model into ComfyUI so that high-quality speech synthesis can run directly inside user workflows. Its purpose is to let ComfyUI users generate realistic speech from text, either for a single speaker or for a dialogue between several speakers.

Key Features & Benefits

The integration provides the following core features:

Single and Multi-Speaker TTS: Generate speech for one speaker or for up to four distinct voices, which suits dialogues and conversations.
Voice Cloning: Clone a voice from an audio sample to produce personalized or character-specific speech.
Text File Loading and Automatic Chunking: Load scripts directly from text files; long texts are automatically split into manageable chunks so they can be processed smoothly.
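The exact script format the nodes expect is not documented here. Assuming, hypothetically, that each turn sits on its own line as `[N]: text` with N from 1 to 4, the sketch below parses a script into speaker turns and then packs long text into sentence-aligned chunks. The regexes, the 250-word default and both function names are illustrative assumptions, not the node's actual code.

```python
import re
from typing import List, Tuple

# Hypothetical label format "[N]: text"; the node's real syntax may differ.
SPEAKER_LINE = re.compile(r"^\[(\d+)\]:\s*(.+)$")
MAX_SPEAKERS = 4  # the integration supports up to four distinct voices


def parse_script(script: str) -> List[Tuple[int, str]]:
    """Split a multi-speaker script into (speaker_id, text) turns."""
    turns = []
    for raw in script.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines between turns
        match = SPEAKER_LINE.match(line)
        if match is None:
            raise ValueError(f"unlabeled line: {line!r}")
        speaker = int(match.group(1))
        if not 1 <= speaker <= MAX_SPEAKERS:
            raise ValueError(f"speaker {speaker} outside 1..{MAX_SPEAKERS}")
        turns.append((speaker, match.group(2)))
    return turns


def chunk_text(text: str, max_words: int = 250) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_words words.

    A single sentence longer than max_words becomes its own oversized chunk.
    The 250-word default is an assumed value for illustration.
    """
    if not text.strip():
        return []
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current: List[str] = []
    count = 0
    for sentence in sentences:
        words = len(sentence.split())
        # Close the current chunk if adding this sentence would overflow it.
        if current and count + words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += words
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Splitting on sentence boundaries rather than at a fixed word count keeps each chunk's prosody intact, at the cost of slightly uneven chunk sizes.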

Advanced Functionalities

The tool includes advanced capabilities such as:

LoRA Support: Load custom LoRA adapters to give voices specialized characteristics without altering the base model.
Voice Speed Control: Adjust the rate of speech by modifying the speed of the reference voice, which helps keep dialogue pacing natural.
Custom Pause Tags: Insert pauses of a specified duration into the text for finer control over pacing and timing.
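The tag syntax itself is not given here. Assuming, hypothetically, tags of the form `[pause]` (a default length) or `[pause:MS]` (milliseconds), the sketch below splits text at each tag into (segment, pause) pairs and converts a pause into a count of silent samples that could be inserted between synthesized segments. The 1000 ms default and the 24 kHz sample rate are assumptions.

```python
import re
from typing import List, Tuple

# Assumed tag syntax: "[pause]" (default length) or "[pause:1500]" (milliseconds).
PAUSE_TAG = re.compile(r"\[pause(?::(\d+))?\]")
DEFAULT_PAUSE_MS = 1000  # assumed default pause length
SAMPLE_RATE = 24_000     # assumed output sample rate in Hz


def split_on_pauses(text: str) -> List[Tuple[str, int]]:
    """Return (text_segment, pause_ms_after_segment) pairs.

    A segment may be empty if a tag appears at the very start of the text.
    """
    segments = []
    cursor = 0
    for match in PAUSE_TAG.finditer(text):
        segment = text[cursor:match.start()].strip()
        pause_ms = int(match.group(1)) if match.group(1) else DEFAULT_PAUSE_MS
        segments.append((segment, pause_ms))
        cursor = match.end()
    tail = text[cursor:].strip()
    if tail:
        segments.append((tail, 0))  # no pause after the final segment
    return segments


def silence_samples(pause_ms: int, sample_rate: int = SAMPLE_RATE) -> int:
    """Number of zero-valued samples needed for a pause of pause_ms."""
    return round(pause_ms * sample_rate / 1000)
```

Each segment would be synthesized separately, with `silence_samples(pause_ms)` zeros appended after it before the pieces are concatenated.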

Practical Benefits

Together, these features give users finer control over voice characteristics and speech pacing within ComfyUI, which makes it simpler to produce realistic, engaging audio as part of a workflow.

Credits/Acknowledgments

The integration was developed by Fabio Sarracino with contributions from the community. The VibeVoice model itself is maintained by Microsoft Research and is subject to its own licensing terms. The integration is released under the MIT License.

Inner Nodes

VibeVoice Free Memory (VibeVoiceFreeMemoryNode)
VibeVoice LoRA (VibeVoiceLoRANode)
VibeVoice Load Text From File (LoadTextFromFileNode)
VibeVoice Multiple Speakers (VibeVoiceMultipleSpeakersNode)
VibeVoice Single Speaker (VibeVoiceSingleSpeakerNode)
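Each node is listed under both a display name and a class name. That matches the usual ComfyUI custom-node convention, in which a package exports `NODE_CLASS_MAPPINGS` (internal class keys) and `NODE_DISPLAY_NAME_MAPPINGS` (labels shown in the UI). The skeleton below is a hypothetical illustration of that convention only: the `file_path` input, the category and the method body are assumptions, and the name pairing is inferred from the listed names rather than taken from the repository.

```python
from pathlib import Path
from typing import Dict, Tuple


class LoadTextFromFileNode:
    """Hypothetical skeleton; the real node's inputs and behavior may differ."""

    @classmethod
    def INPUT_TYPES(cls) -> Dict[str, dict]:
        # "file_path" is an illustrative input name, not the real one.
        return {"required": {"file_path": ("STRING", {"default": ""})}}

    RETURN_TYPES = ("STRING",)
    FUNCTION = "load_text"  # name of the method ComfyUI calls to execute the node
    CATEGORY = "VibeVoice"  # hypothetical menu category

    def load_text(self, file_path: str) -> Tuple[str]:
        # Node methods return a tuple whose entries match RETURN_TYPES.
        return (Path(file_path).read_text(encoding="utf-8"),)


# Internal class keys mapped to the labels shown in the UI.
# The pairing is inferred from the listed node names.
NODE_DISPLAY_NAME_MAPPINGS = {
    "LoadTextFromFileNode": "VibeVoice Load Text From File",
    "VibeVoiceFreeMemoryNode": "VibeVoice Free Memory",
    "VibeVoiceLoRANode": "VibeVoice LoRA",
    "VibeVoiceMultipleSpeakersNode": "VibeVoice Multiple Speakers",
    "VibeVoiceSingleSpeakerNode": "VibeVoice Single Speaker",
}

# Only the skeleton above is registered here; the other classes are omitted.
NODE_CLASS_MAPPINGS = {"LoadTextFromFileNode": LoadTextFromFileNode}
```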

Run workflows with this node

Community workflows using VibeVoice-ComfyUI.

VibeVoice: Single-Speaker Text to Speech (floyoofficial, 995). Tags: text to speech, TTS, VibeVoice, voice cloning.

VibeVoice Text to Speech Multi Speaker (floyoofficial, 482). Tags: Multi Speaker, TTS, VibeVoice, Speech Multi Speaker.
