floyo logo
Powered by
ThinkDiffusion

Sopro for Text to Speech

Turn your text into high-quality speech using SoproTTS

Nodes & Models

WorkflowGraphics
LoadAudio
SaveAudioMP3

SoproTTS is a lightweight text‑to‑speech system with zero‑shot voice cloning that can turn text into speech in almost real time, even on CPU‑only machines.

What it is

  • An open‑source TTS model (~135–169M parameters, depending on version) that generates English speech from text.

  • Designed to run efficiently on CPUs with real‑time or faster‑than‑real‑time performance, and exposed in ComfyUI via Sopro TTS custom nodes.

Key features

  • Zero‑shot voice cloning: give it a few seconds of reference audio and it mimics that speaker’s voice for new text.

  • CPU‑friendly speed: a real‑time factor (RTF, synthesis time divided by audio duration) of roughly 0.05–0.25, which works out to about 20× faster than real time on an M3 CPU and about 4× on typical CPUs.

  • Streaming and non‑streaming modes, so you can get low‑latency first audio or batch‑generate longer clips.

  • ComfyUI integration: Sopro TTS nodes accept text plus optional reference audio, and output waveform audio for the rest of your graph.

  • Adjustable speech speed and “temperature” for pacing and variation control.
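
The speed figures in the list above can be checked with a little arithmetic. The sketch below assumes the usual definition of real‑time factor (synthesis time divided by audio duration), so the speedup is simply its inverse. The function names are illustrative only and are not part of SoproTTS or its ComfyUI nodes.

```python
def speedup_from_rtf(rtf: float) -> float:
    """Convert a real-time factor into a 'times faster than real time' figure.

    Assumes RTF = synthesis_time / audio_duration, so speedup = 1 / RTF.
    """
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return 1.0 / rtf


def synthesis_seconds(audio_seconds: float, rtf: float) -> float:
    """Estimate how long it takes to synthesize a clip of the given length."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    return audio_seconds * rtf


if __name__ == "__main__":
    # The two ends of the RTF range quoted in the feature list.
    for rtf in (0.05, 0.25):
        print(
            f"RTF {rtf}: about {speedup_from_rtf(rtf):.0f}x real time, "
            f"a 60 s clip takes roughly {synthesis_seconds(60, rtf):.0f} s"
        )
```

Under that definition, an RTF of 0.05 corresponds to the ~20× figure and an RTF of 0.25 to the ~4× figure, so a one‑minute voiceover would take somewhere between a few seconds and about a quarter of a minute to generate, depending on the CPU.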

Best‑fit use cases

  • Local, low‑resource voiceovers for videos or tutorials when you only have CPU and want open‑source TTS.

  • Voice cloning for characters or narrators in ComfyUI workflows, using short reference samples.

  • Interactive tools and prototypes where you need quick speech feedback without cloud TTS or big GPU models.
