ComfyUI-KokoroTTS

Author benjiyaya

https://github.com/benjiyaya/ComfyUI-KokoroTTS

Last updated

2025-03-18

A custom node for ComfyUI that adds text-to-speech using the Kokoro TTS engine. It converts written text into spoken audio that can be used in other ComfyUI workflows.

High-quality speech synthesis with a variety of voice options.
Supports multiple languages, making it versatile for global applications.
Seamless integration into existing ComfyUI workflows for streamlined usage.

Context

The Kokoro TextToSpeech Node brings text-to-speech to ComfyUI. It takes text as input and produces spoken audio as output, using the Kokoro TTS engine for voice synthesis.

Key Features & Benefits

The Kokoro TextToSpeech Node produces high-fidelity audio and offers a selection of voice profiles, including American and British accents, so the voice can be matched to the application and audience. The node also accepts multilingual text input.
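As a rough illustration of how a voice selection like this is usually exposed in ComfyUI, the sketch below follows ComfyUI's general custom-node conventions (an `INPUT_TYPES` classmethod plus `RETURN_TYPES`, `FUNCTION`, `CATEGORY`, and a `NODE_CLASS_MAPPINGS` dictionary). It is not the ComfyUI-KokoroTTS source: the class name, the voice identifiers, the `speed` parameter, and the `AUDIO` return type are all assumptions made for the example, and synthesis is left unimplemented.

```python
# Hypothetical sketch, not the actual ComfyUI-KokoroTTS code.
# Voice identifiers below are assumed for illustration:
# first letter "a" = American, "b" = British; second letter f/m = female/male.
VOICES = ["af_bella", "am_adam", "bf_emma", "bm_george"]


class ExampleTextToSpeechNode:
    """Skeleton of a text-to-speech node using ComfyUI's node conventions."""

    @classmethod
    def INPUT_TYPES(cls):
        # A list as the first tuple element renders as a dropdown in the UI.
        return {
            "required": {
                "text": ("STRING", {"multiline": True, "default": "Hello"}),
                "voice": (VOICES,),
                # "speed" is an assumed parameter, shown only as an example.
                "speed": ("FLOAT", {"default": 1.0, "min": 0.5, "max": 2.0, "step": 0.1}),
            }
        }

    RETURN_TYPES = ("AUDIO",)   # assumed output type
    FUNCTION = "generate"       # name of the method ComfyUI calls
    CATEGORY = "audio"

    def generate(self, text, voice, speed):
        # A real node would call the TTS engine here and return the audio.
        raise NotImplementedError("synthesis is outside the scope of this sketch")


# ComfyUI discovers nodes through this mapping in the package's __init__.py.
NODE_CLASS_MAPPINGS = {"ExampleTextToSpeech": ExampleTextToSpeechNode}
```

Declaring the voices as a list keeps the choice constrained to known values, so an invalid voice cannot be typed in by hand.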

Advanced Functionalities

This node includes advanced features like LatentSync support, which enables lip-syncing for animation or video projects: users can connect the audio output to visual elements so that speech and picture line up. The node also has comprehensive error handling and reports detailed feedback for common problems such as missing files or invalid inputs.
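The kind of error feedback described above can be sketched as a validation step that runs before synthesis. This is a minimal illustration under stated assumptions, not the node's actual implementation: the function name `validate_tts_request`, its parameters, and the message wording are all hypothetical.

```python
from pathlib import Path


def validate_tts_request(text, voice, available_voices, model_path):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper: collects every problem instead of stopping at the
    first one, so the user can fix all inputs in a single pass.
    """
    problems = []
    if not isinstance(text, str) or not text.strip():
        problems.append("text is empty; provide some text to synthesize")
    if voice not in available_voices:
        problems.append(
            f"unknown voice {voice!r}; choose one of {sorted(available_voices)}"
        )
    if not Path(model_path).is_file():
        problems.append(f"model file not found: {model_path}")
    return problems


if __name__ == "__main__":
    # Example call with deliberately bad inputs; the path is made up.
    for problem in validate_tts_request("", "xx_nobody", {"af_bella"}, "missing/model.onnx"):
        print("error:", problem)
```

Returning a list of messages rather than raising on the first failure is a design choice for the example; a node could just as well raise a single exception that joins the messages.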

Practical Benefits

With the Kokoro TextToSpeech Node in a workflow, speech is generated from text inside ComfyUI itself, so voice elements can be produced and adjusted alongside the rest of the project instead of in a separate tool. The node connects to existing workflows like any other ComfyUI node.

Credits/Acknowledgments

The Kokoro TTS engine is credited to its original creators, and the project is licensed under the MIT and Apache 2.0 licenses. Additional acknowledgments go to the ComfyUI community and contributors, including those behind the ComfyUI-BS_Kokoro-onnx repository.
