VibeVoice 1.5B

TTS, 1.5B, Microsoft

Generation takes about 14 seconds.

calmconqueror

Nodes & Models

VibeVoice-ComfyUI

LoadTextFromFileNode

VibeVoiceSingleSpeakerNode

ComfyUI Official

Note

LoadAudio

PreviewAudio

VibeVoice ComfyUI Nodes

A comprehensive ComfyUI integration for Microsoft's VibeVoice text-to-speech model, enabling high-quality single and multi-speaker voice synthesis directly within your ComfyUI workflows.

✨ Features

Core Functionality

🎤 Single Speaker TTS: Generate natural speech with optional voice cloning
👥 Multi-Speaker Conversations: Support for up to 4 distinct speakers
🎯 Voice Cloning: Clone voices from audio samples
🎨 LoRA Support: Fine-tune voices with custom LoRA adapters (v1.4.0+)
🎚️ Voice Speed Control: Adjust speech rate by modifying reference voice speed (v1.5.0+)
📝 Text File Loading: Load scripts from text files
📚 Automatic Text Chunking: Handles long texts seamlessly with configurable chunk size
⏸️ Custom Pause Tags: Insert silences with [pause] and [pause:ms] tags (wrapper feature)
🔄 Node Chaining: Connect multiple VibeVoice nodes for complex workflows
⏹️ Interruption Support: Cancel operations before or between generations
🔧 Flexible Configuration: Control temperature, sampling, and guidance scale
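
The pause tags and text chunking listed above can be illustrated with a small standalone sketch. This is not the wrapper's actual code: the function names, the 1000 ms default for a bare [pause], and the word-based chunk limit are all assumptions made for illustration.

```python
import re
from typing import List, Tuple, Union

# Assumption: a bare [pause] means 1000 ms. The feature list does not state the default.
DEFAULT_PAUSE_MS = 1000

# Matches [pause] and [pause:ms], e.g. [pause:500].
PAUSE_TAG = re.compile(r"\[pause(?::(\d+))?\]", re.IGNORECASE)

# A segment is ("text", str) or ("pause", milliseconds).
Segment = Tuple[str, Union[str, int]]


def split_on_pauses(script: str) -> List[Segment]:
    """Split a script into text segments and pause durations (hypothetical helper)."""
    segments: List[Segment] = []
    cursor = 0
    for match in PAUSE_TAG.finditer(script):
        text = script[cursor:match.start()].strip()
        if text:
            segments.append(("text", text))
        ms = int(match.group(1)) if match.group(1) else DEFAULT_PAUSE_MS
        segments.append(("pause", ms))
        cursor = match.end()
    tail = script[cursor:].strip()
    if tail:
        segments.append(("text", tail))
    return segments


def chunk_text(text: str, max_words: int = 250) -> List[str]:
    """Group whole sentences into chunks of at most max_words words.

    A single sentence longer than max_words becomes its own chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current: List[str] = []
    count = 0
    for sentence in sentences:
        if not sentence:
            continue
        n = len(sentence.split())
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

In a pipeline like this, each text segment would be chunked and synthesized separately, and each pause segment would be rendered as silence of the given length before the audio pieces are concatenated.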

Performance & Optimization

⚡ Attention Mechanisms: Choose between auto, eager, sdpa, flash_attention_2 or sage
🎛️ Diffusion Steps: Adjustable quality vs speed trade-off (default: 20)
💾 Memory Management: Toggle automatic VRAM cleanup after generation
🧹 Free Memory Node: Manual memory control for complex workflows
🍎 Apple Silicon Support: Native GPU acceleration on M1/M2/M3 Macs via MPS
🔢 8-Bit Quantization: Near-original audio quality with a large VRAM reduction
🔢 4-Bit Quantization: Maximum VRAM savings with minimal quality loss
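
To get a rough feel for the quantization savings, the weight memory can be estimated from parameter count and bits per weight. The sketch below is back-of-the-envelope arithmetic only: it assumes the nominal 1.5B parameter figure and a 16-bit baseline, and it ignores activations, caches, and any layers a real loader leaves unquantized, so actual VRAM use will be higher.

```python
def weight_memory_gib(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights alone, in GiB."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 2**30


# Assumption: nominal parameter count taken from the model name ("1.5B").
NOMINAL_PARAMS = 1.5e9

if __name__ == "__main__":
    for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
        print(f"{label}: ~{weight_memory_gib(NOMINAL_PARAMS, bits):.2f} GiB for weights")
```

Halving the bits per weight halves this estimate, which is why 4-bit gives the largest savings of the options listed.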

calmconqueror • 5 months ago

Credit: https://github.com/Enemyx-net/VibeVoice-ComfyUI