ComfyUI_MiniCPM-V-4_5

Author IuvenisSapiens

https://github.com/IuvenisSapiens/ComfyUI_MiniCPM-V-4_5

238

Last updated

2025-11-21

ComfyUI_MiniCPM-V-4_5 is a custom node implementation for ComfyUI that accepts four kinds of queries (text, video, single-image, and multi-image) and generates descriptive captions or responses. It lets ComfyUI users extract descriptions and answers from several media formats within one workflow.

Supports text, video, single-image, and multi-image queries.
Adds the keep_model_loaded and seed parameters, which target faster repeated runs and reproducible results.
Generates captions and detailed analysis for both images and videos.

Context

ComfyUI_MiniCPM-V-4_5 integrates the MiniCPM-V-4_5 model into ComfyUI so that user queries can be run against different media types. Its main purpose is to generate informative captions and responses from text, video, and image inputs.

Key Features & Benefits

The tool covers several input types. Users can submit text queries to receive information or descriptions, analyze videos frame by frame for detailed captions, and generate descriptions from one or more images. The keep_model_loaded parameter keeps the model in GPU memory between predictions, which avoids reloading it on repeated runs at the cost of holding that memory. The seed parameter fixes the random seed so that the same inputs can reproduce the same results.
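The trade-off behind keep_model_loaded and seed can be illustrated with a minimal sketch. This is not the node's actual code: ModelCache, fake_loader, and the run method are hypothetical names used only for illustration, and the "model" is a stand-in function rather than MiniCPM-V-4_5.

```python
import random


class ModelCache:
    """Hypothetical cache illustrating a keep_model_loaded-style flag."""

    def __init__(self, loader):
        self._loader = loader  # callable that builds the (expensive) model
        self._model = None
        self.load_count = 0    # how many times the model was actually loaded

    def run(self, query, seed, keep_model_loaded):
        # Load only if no model is currently held in memory.
        if self._model is None:
            self._model = self._loader()
            self.load_count += 1
        # A fresh generator seeded per call makes the output repeatable.
        rng = random.Random(seed)
        result = self._model(query, rng)
        # Release the model when the caller does not want it kept.
        if not keep_model_loaded:
            self._model = None
        return result


def fake_loader():
    """Stand-in for loading a real model (assumption for this sketch)."""
    def model(query, rng):
        return f"{query}:{rng.randint(0, 9999)}"
    return model


cache = ModelCache(fake_loader)
first = cache.run("describe the image", seed=42, keep_model_loaded=True)
second = cache.run("describe the image", seed=42, keep_model_loaded=True)
```

In this sketch the two calls return the same string because they share a seed, and the model is loaded only once because the flag keeps it resident. Passing keep_model_loaded=False would free the model after each call and force a reload on the next one.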

Advanced Functionalities

The tool can process video queries, producing detailed captions or summaries from the content of an uploaded video. The multi-image query mode takes a series of images and produces a single narrative or cohesive description, which is useful for storytelling and content creation.
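The page does not say how frames are selected from a video before captioning. One common approach, shown here purely as an assumption, is to sample a bounded number of evenly spaced frames. The function name sample_frame_indices and the max_frames parameter are hypothetical and are not taken from the node.

```python
from typing import List


def sample_frame_indices(total_frames: int, max_frames: int) -> List[int]:
    """Pick up to max_frames evenly spaced frame indices (illustrative only)."""
    if total_frames <= 0 or max_frames <= 0:
        return []
    # Short videos: use every frame.
    if total_frames <= max_frames:
        return list(range(total_frames))
    # Split the video into max_frames equal segments and take each midpoint.
    step = total_frames / max_frames
    return [int(i * step + step / 2) for i in range(max_frames)]


indices = sample_frame_indices(total_frames=10, max_frames=5)
```

Taking segment midpoints rather than segment starts avoids always including the first frame, which is often a title card or a black frame.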

Practical Benefits

ComfyUI_MiniCPM-V-4_5 lets users work with text, image, and video inputs through a single interface, so captioning and question answering do not require separate tools. This is most useful when the same workflow has to handle several media formats quickly.

Credits/Acknowledgments

This implementation is based on the original work by the MiniCPM-V team and is maintained by contributors from the ComfyUI community. The project is open-source, allowing for collaborative improvements and updates.

Inner Nodes

DisplayText
