Sopro for Text to Speech
Turn your text into speech using SoproTTS
Audio2Audio
SoproTTS
Text to Speech
TTS
SoproTTS is a lightweight text‑to‑speech system with zero‑shot voice cloning that can turn text into speech at real‑time speed or faster, even on CPU‑only machines.
What it is
An open‑source TTS model (~135–169M parameters, depending on version) that generates English speech from text.
Designed to run efficiently on CPUs with real‑time or faster‑than‑real‑time performance, and exposed in ComfyUI via Sopro TTS custom nodes.
Key features
Zero‑shot voice cloning: give it a few seconds of reference audio and it mimics that speaker’s voice for new text.
CPU‑friendly speed: a real‑time factor (RTF, synthesis time divided by audio duration) of around 0.05–0.25, which is roughly 20× faster than real time on an M3 CPU and about 4× on typical CPUs.
Streaming and non‑streaming modes, so you can get low‑latency first audio or batch‑generate longer clips.
ComfyUI integration: Sopro TTS nodes accept text plus optional reference audio, and output waveform audio for the rest of your graph.
Adjustable speech speed and “temperature” for pacing and variation control.
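To make the speed figures above concrete, here is a minimal sketch of the real‑time factor arithmetic. It does not call SoproTTS; the function names are hypothetical helpers for illustration, and the only inputs are the RTF values quoted in the list.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent generating / duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def speedup(rtf: float) -> float:
    """How many times faster than real time a given RTF is (1 / RTF)."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return 1.0 / rtf


if __name__ == "__main__":
    clip_seconds = 10.0  # illustrative clip length, not a SoproTTS limit
    for rtf in (0.05, 0.25):  # the two ends of the range quoted above
        print(
            f"RTF {rtf:.2f} -> {speedup(rtf):.0f}x real time, "
            f"{rtf * clip_seconds:.1f} s to synthesize {clip_seconds:.0f} s of audio"
        )
```

An RTF below 1 means the audio is generated faster than it plays back. At the quoted range, a 10‑second clip would take roughly 0.5 s at RTF 0.05 and 2.5 s at RTF 0.25.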
Best‑fit use cases
Local, low‑resource voiceovers for videos or tutorials when you only have CPU and want open‑source TTS.
Voice cloning for characters or narrators in ComfyUI workflows, using short reference samples.
Interactive tools and prototypes where you need quick speech feedback without cloud TTS or big GPU models.
