ComfyUI-GeminiImageToPrompt

Author: santiagosamuel3455

https://github.com/santiagosamuel3455/ComfyUI-GeminiImageToPrompt

Last updated: 2025-05-04

ComfyUI-GeminiImageToPrompt is a ComfyUI extension for generating prompts used in audiovisual content creation. It adds three specialized nodes to multimodal workflows and uses Google's Gemini model and KlingAI technology to turn text and images into detailed, cinematic prompts.

Gemini Text to Cinematic Prompt Node: converts simple text into detailed cinematic prompts, enriching narrative and stylistic elements.
Gemini Image to Prompt Node: analyzes images and produces descriptive prompts tailored for video production, reducing manual input.
Deepseek R1 Node with KlingAI: generates prompts without consuming credits, which keeps costs down while supporting hybrid text and image workflows.

Context

This tool is a ComfyUI extension focused on generating the prompts needed to create high-quality audiovisual content. Because it accepts both text and images, users can move directly from textual or visual inputs to detailed prompts for video and image production.

Key Features & Benefits

The Gemini Text to Cinematic Prompt Node expands simple descriptions with technical details such as lighting and camera angles, giving the resulting prompts a more cinematic quality. The Gemini Image to Prompt Node automates prompt creation from images: it extracts the essential visual elements and translates them into actionable instructions for image-to-video conversion. The Deepseek R1 Node with KlingAI generates prompts without consuming credits, making it an accessible option for budget-conscious creators.
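To make the text-to-cinematic-prompt idea concrete, here is a minimal sketch of how a node of this kind could be structured under ComfyUI's standard custom-node conventions (`INPUT_TYPES`, `RETURN_TYPES`, `FUNCTION`, `CATEGORY`, `NODE_CLASS_MAPPINGS`). It is not the repository's code: the class name `CinematicPromptNodeSketch`, the helper `build_instruction`, the style options, the instruction wording, and the injected `generate` callable are all hypothetical. In a real node, `generate` would wrap the Gemini request; here it defaults to echoing the instruction so the sketch runs without an API key.

```python
# Hypothetical sketch of a ComfyUI text-to-cinematic-prompt node.
# All names and the instruction text are illustrative assumptions,
# not taken from ComfyUI-GeminiImageToPrompt.
from typing import Callable, Optional


def build_instruction(text: str, style: str) -> str:
    """Wrap a short idea in an instruction asking a model for cinematic detail."""
    return (
        "Rewrite the following idea as a detailed cinematic prompt. "
        "Describe lighting, camera angle, camera movement and mood. "
        f"Visual style: {style}.\n"
        f"Idea: {text.strip()}"
    )


class CinematicPromptNodeSketch:
    # ComfyUI reads these class-level definitions to build the node's inputs/outputs.
    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "text": ("STRING", {"multiline": True, "default": ""}),
                # A list inside the tuple renders as a dropdown (combo) input.
                "style": (["cinematic", "documentary", "anime"],),
            }
        }

    RETURN_TYPES = ("STRING",)
    RETURN_NAMES = ("prompt",)
    FUNCTION = "generate_prompt"
    CATEGORY = "prompt/sketch"

    def __init__(self, generate: Optional[Callable[[str], str]] = None):
        # `generate` stands in for the actual model call (assumed, e.g. a Gemini
        # request). The default simply returns the instruction unchanged.
        self.generate = generate or (lambda instruction: instruction)

    def generate_prompt(self, text: str, style: str):
        instruction = build_instruction(text, style)
        # ComfyUI expects a tuple whose length matches RETURN_TYPES.
        return (self.generate(instruction),)


# Registration mapping that ComfyUI looks up when loading a custom-node package.
NODE_CLASS_MAPPINGS = {"CinematicPromptNodeSketch": CinematicPromptNodeSketch}
```

Injecting the model call keeps the prompt-building logic separate from the network request, so the instruction template can be inspected on its own, for example with `CinematicPromptNodeSketch().generate_prompt("a cat on a rooftop at night", "cinematic")`.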

Advanced Functionalities

The system supports multimodal analysis: users can supply both text and images, and the nodes combine them into a single comprehensive prompt. Beyond enriching the narrative, the Gemini nodes adapt prompts to the target media format, so the output is ready for professional image or video production.

Practical Benefits

Automating prompt generation from text and images reduces the manual effort of writing prompts and speeds up the workflow. Because prompts are produced the same way each time, users can keep a consistent visual and narrative style across projects, improving audiovisual output while saving time and, with the credit-free Deepseek R1 node, operating costs.

Credits/Acknowledgments

The repository is developed by santiagosamuel3455 and integrates Google's Gemini model and KlingAI technology. No license is specified.