API

Pricing

Workflows

API

Pricing

ComfyUI_Qwen3-VL-Instruct

Author IuvenisSapiens

https://github.com/IuvenisSapiens/ComfyUI_Qwen3-VL-Instruct

551

Last updated

2026-05-29

Run hundreds of ComfyUI nodes and workflows in your browser.

The ComfyUI Qwen3-VL-Instruct integration enhances the ComfyUI platform by enabling users to perform a variety of queries, including text, video, single-image, and multi-image queries, to generate captions or responses. This tool aims to streamline the process of obtaining contextual information or descriptions from various media formats.

Supports diverse query types, allowing for flexible interaction with the system.
Generates detailed captions or summaries from both images and videos, enhancing content understanding.
Integrates seamlessly with ComfyUI, requiring minimal setup for optimal functionality.

Context

The ComfyUI Qwen3-VL-Instruct is an extension designed to facilitate advanced query processing within the ComfyUI framework. Its primary purpose is to allow users to input different types of media—text, video, and images—to receive informative captions or responses, thereby enriching the user experience and expanding the utility of the ComfyUI platform.

Key Features & Benefits

This tool offers practical functionalities that significantly enhance user interaction. Users can input text queries to receive descriptive responses, upload videos for frame-by-frame analysis, and submit images for individual or collective descriptions. This versatility caters to a wide range of use cases, making it easier for users to extract meaningful information from various media formats.

Advanced Functionalities

The Qwen3-VL-Instruct extension includes sophisticated capabilities, such as generating narratives from multiple images, which can be particularly useful for storytelling or thematic presentations. This feature allows users to create a cohesive context or storyline from disparate images, enhancing the overall narrative quality and engagement.

Practical Benefits

By integrating Qwen3-VL-Instruct into ComfyUI, users benefit from improved workflow efficiency and enhanced control over content generation. The ability to process diverse media types within a single interface reduces the need for multiple tools, streamlining tasks and improving the overall quality of outputs.

Credits/Acknowledgments

This tool is an implementation of the Qwen3-VL-Instruct model developed by the original authors at QwenLM, with contributions from the ComfyUI community. It is available under an open-source license, allowing for collaborative development and continuous improvement.

Inner Nodes

ImageLoader

Discover most popular workflows

Hand-picked based on what hundreds of other artists looked at.

Z-Image Turbo: Fast Image Generation in Seconds

floyoofficial

21.9k

Marketing

Photography

Production

Text2Image

Z-Image Turbo

Fast Image Generation in Seconds

Z-Image Turbo: Fast Image Generation in Seconds

Fast Image Generation in Seconds

Nano Banana 2: Fast Image Generation & Editing

floyoofficial

4.6k

API

gemini flash image

Image2Image

Text2Image

typography

The top-ranked image model on Artificial Analysis and LM Arena. 4K output, text rendering, and subject consistency across 5 characters.

Nano Banana 2: Fast Image Generation & Editing

The top-ranked image model on Artificial Analysis and LM Arena. 4K output, text rendering, and subject consistency across 5 characters.

floyoofficial

25.2k

AiVideo

API

image to video

video generation

wan 2.5

Wan 2.5: Image to Video with Audio

goshnii

10.6k

Face swap

Flux

flux 2 klein

Flux 2 Klein face swap

Flux face swap

head swap

image 2 image

image editing

Instead of using outdated or unstable techniques, this workflow was designed to take full advantage of FLUX 2 KLEIN's editing capabilities—using a face image and a reference character image to produce clean, highly consistent results.

Flux 2 Klein 9b - Perfect Face swap

floyoofficial

4.7k

API

Image to Video

LTX2.3

LTX 2.3

LTX 2.3 Pro Image to Video

LTX 2.3

Author

IuvenisSapiens