ComfyUI-Gemini_TTS

Author ShmuelRonen

https://github.com/ShmuelRonen/ComfyUI-Gemini_TTS

Last updated 2025-05-23


Gemini TTS is a custom node for ComfyUI that integrates Google's Gemini Text-to-Speech into user workflows. It generates high-quality speech with a selection of over 30 voices and supports both free and paid tiers.

Offers 30+ distinct voices, including male and female options with different characteristics.
Supports a dual-tier system: a free tier for testing and a paid tier for production use.
Provides smart fallback that automatically switches models when usage limits are reached, so generation can continue.

Context

Gemini TTS is a ComfyUI node built on Google's text-to-speech capabilities. Its primary purpose is to generate natural-sounding speech for applications ranging from voiceovers to interactive media.

Key Features & Benefits

The node offers over 30 voices covering a range of tones and styles. The dual-tier system lets users start on the free tier and move to the paid tier as their needs grow. When a usage limit is reached, the smart fallback feature switches models automatically so that generation can continue.
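The fallback behavior can be pictured with a minimal sketch. This is not the node's actual code: `RateLimitError`, `synthesize_with_fallback`, `fake_synthesize`, and the model names are all hypothetical, chosen only to illustrate the idea of trying the next model when one hits its usage limit.

```python
from typing import Callable, Sequence


class RateLimitError(Exception):
    """Hypothetical error raised when a model's usage quota is exhausted."""


def synthesize_with_fallback(
    text: str,
    models: Sequence[str],
    synthesize: Callable[[str, str], bytes],
) -> tuple[str, bytes]:
    """Try each model in order; move to the next one only on a rate-limit error."""
    last_error = None
    for model in models:
        try:
            return model, synthesize(model, text)
        except RateLimitError as err:
            last_error = err  # quota hit: fall through to the next model
    raise RuntimeError("all models are rate limited") from last_error


# Stand-in backend for illustration only: the first model is always over quota.
def fake_synthesize(model: str, text: str) -> bytes:
    if model == "primary-tts-model":
        raise RateLimitError(model)
    return f"{model}:{text}".encode()


used_model, audio = synthesize_with_fallback(
    "Hello from ComfyUI",
    ["primary-tts-model", "secondary-tts-model"],
    fake_synthesize,
)
```

In this sketch only rate-limit errors trigger a switch. Any other failure propagates immediately, so a real problem (for example an invalid key) is not masked by silently trying another model.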

Advanced Functionalities

Gemini TTS also provides descriptions of each voice's characteristics and customizable node parameters. Users can adjust parameters such as temperature, which controls how much randomness goes into the generated output, and can enable automatic model switching to work around rate limits.
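For intuition about the temperature parameter, here is a small sketch of how temperature is commonly applied in sampling-based generative models: raw scores are divided by the temperature before being turned into probabilities. Whether this node or the underlying service applies temperature in exactly this way is an assumption, and the function name and scores below are made up for illustration.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores into probabilities, scaled by a sampling temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate outputs
low = softmax_with_temperature(logits, 0.5)   # sharper: favors the top score
high = softmax_with_temperature(logits, 2.0)  # flatter: more variety when sampling
```

A lower temperature concentrates probability on the highest-scoring option, giving more predictable output. A higher temperature spreads probability more evenly, giving more variation between runs.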

Practical Benefits

Adding Gemini TTS to a ComfyUI workflow gives users more control over speech generation through voice selection and node parameters. Built-in error handling and billing management streamline the workflow, making it easier to produce professional-grade audio content.

Credits/Acknowledgments

This project is developed by contributors to the ComfyUI community and utilizes Google's Gemini API, subject to Google's terms of service and pricing policies.
