VibeVoice 1.5B
TTS, 1.5B, Microsoft
Nodes & Models
LoadTextFromFileNode
VibeVoiceSingleSpeakerNode
Note
LoadAudio
PreviewAudio
VibeVoice ComfyUI Nodes
A comprehensive ComfyUI integration for Microsoft's VibeVoice text-to-speech model, enabling high-quality single and multi-speaker voice synthesis directly within your ComfyUI workflows.
✨ Features
Core Functionality
🎤 Single Speaker TTS: Generate natural speech with optional voice cloning
👥 Multi-Speaker Conversations: Support for up to 4 distinct speakers
🎯 Voice Cloning: Clone voices from audio samples
🎨 LoRA Support: Fine-tune voices with custom LoRA adapters (v1.4.0+)
🎚️ Voice Speed Control: Adjust speech rate by modifying reference voice speed (v1.5.0+)
📝 Text File Loading: Load scripts from text files
📚 Automatic Text Chunking: Handles long texts seamlessly with configurable chunk size
⏸️ Custom Pause Tags: Insert silences with [pause] and [pause:ms] tags (wrapper feature)
🔄 Node Chaining: Connect multiple VibeVoice nodes for complex workflows
⏹️ Interruption Support: Cancel operations before or between generations
🔧 Flexible Configuration: Control temperature, sampling, and guidance scale
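To illustrate the pause tags, here is a minimal sketch of how a wrapper could split a script into spoken segments and silences before handing the text to the model. It is not the node pack's actual implementation: the function name `split_on_pause_tags`, the tuple output format, and the 1000 ms default for a bare [pause] are assumptions made for this example.

```python
import re
from typing import List, Tuple, Union

# Matches "[pause]" or "[pause:750]"; the optional group captures milliseconds.
PAUSE_TAG = re.compile(r"\[pause(?::(\d+))?\]")

# Assumption for this sketch: a bare [pause] means one second of silence.
DEFAULT_PAUSE_MS = 1000

Segment = Tuple[str, Union[str, int]]


def split_on_pause_tags(text: str, default_ms: int = DEFAULT_PAUSE_MS) -> List[Segment]:
    """Split a script into ("text", str) and ("pause", milliseconds) segments."""
    segments: List[Segment] = []
    cursor = 0
    for match in PAUSE_TAG.finditer(text):
        spoken = text[cursor:match.start()].strip()
        if spoken:
            # Text before the tag would be sent to the TTS model as-is.
            segments.append(("text", spoken))
        duration = int(match.group(1)) if match.group(1) else default_ms
        segments.append(("pause", duration))
        cursor = match.end()
    tail = text[cursor:].strip()
    if tail:
        segments.append(("text", tail))
    return segments


if __name__ == "__main__":
    script = "Hello.[pause]How are you?[pause:250] Fine."
    for kind, value in split_on_pause_tags(script):
        print(kind, value)
```

Each "text" segment would then be synthesized separately, with a block of silent samples of the requested length inserted between the resulting audio clips.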
Performance & Optimization
⚡ Attention Mechanisms: Choose between auto, eager, sdpa, flash_attention_2 or sage
🎛️ Diffusion Steps: Adjustable quality vs speed trade-off (default: 20)
💾 Memory Management: Toggle automatic VRAM cleanup after generation
🧹 Free Memory Node: Manual memory control for complex workflows
🍎 Apple Silicon Support: Native GPU acceleration on M1/M2/M3 Macs via MPS
🔢 8-Bit Quantization: Substantial VRAM reduction while preserving audio quality
🔢 4-Bit Quantization: Maximum VRAM savings with minimal quality loss
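As a rough guide to why the quantization options matter, the sketch below estimates weight storage at different precisions. It assumes about 1.5 billion weights, taken from the model name, and counts weights only. Real VRAM use also depends on which components are actually quantized, on activations, and on framework overhead, so treat the figures as a back-of-envelope estimate. The helper `weight_memory_gib` is hypothetical and not part of the node pack.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate storage for the weights alone, in GiB."""
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 2**30


if __name__ == "__main__":
    params = 1.5e9  # assumption: parameter count implied by the "1.5B" name
    for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
        print(f"{label}: {weight_memory_gib(params, bits):.2f} GiB")
    # Weights-only estimates: about 2.79, 1.40 and 0.70 GiB respectively.
```

Halving the bits per weight halves the weight footprint, which is why 4-bit gives the largest savings, at the cost of some quality.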


