VibeVoice 1.5B
TTS, 1.5B, Microsoft
Nodes & Models
LoadTextFromFileNode
VibeVoiceSingleSpeakerNode
Note
LoadAudio
PreviewAudio
VibeVoice ComfyUI Nodes
A comprehensive ComfyUI integration for Microsoft's VibeVoice text-to-speech model, enabling high-quality single- and multi-speaker voice synthesis directly within your ComfyUI workflows.
Features
Core Functionality
Single Speaker TTS: Generate natural speech with optional voice cloning
Multi-Speaker Conversations: Support for up to 4 distinct speakers
Voice Cloning: Clone voices from audio samples
LoRA Support: Fine-tune voices with custom LoRA adapters (v1.4.0+)
Voice Speed Control: Adjust speech rate by modifying reference voice speed (v1.5.0+)
Text File Loading: Load scripts from text files
Automatic Text Chunking: Handles long texts seamlessly with configurable chunk size
Custom Pause Tags: Insert silences with [pause] and [pause:ms] tags (wrapper feature)
Node Chaining: Connect multiple VibeVoice nodes for complex workflows
Interruption Support: Cancel operations before or between generations
Flexible Configuration: Control temperature, sampling, and guidance scale
Performance & Optimization
Attention Mechanisms: Choose between auto, eager, sdpa, flash_attention_2 or sage
Diffusion Steps: Adjustable quality vs speed trade-off (default: 20)
Memory Management: Toggle automatic VRAM cleanup after generation
Free Memory Node: Manual memory control for complex workflows
Apple Silicon Support: Native GPU acceleration on M1/M2/M3 Macs via MPS
8-Bit Quantization: Perfect audio quality with high VRAM reduction
4-Bit Quantization: Maximum VRAM savings with minimal quality loss
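To get a rough feel for why the quantization options reduce VRAM, the arithmetic below estimates the memory taken by the weights alone at different bit widths. It assumes a nominal 1.5 billion parameters (taken from the model name) and that every weight is quantized; the real footprint also includes activations, any modules left unquantized, and quantization metadata, so treat the numbers as a back-of-the-envelope bound rather than measured savings.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Memory needed to store num_params weights at the given bit width, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


# Assumption: nominal parameter count taken from the "1.5B" in the model name.
NOMINAL_PARAMS = 1.5e9

if __name__ == "__main__":
    for bits in (16, 8, 4):
        gib = weight_memory_gib(NOMINAL_PARAMS, bits)
        print(f"{bits:>2}-bit weights: ~{gib:.2f} GiB")
```

Halving the bit width halves the weight storage, which is why 4-bit gives the largest saving of the two options.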


