Last updated
2026-04-04

The ComfyUI-QwenVL custom node integrates vision-language models from the Qwen-VL series, including Qwen2.5-VL and Qwen3-VL, into ComfyUI. It supports multimodal tasks such as text generation, image understanding, and video analysis inside ComfyUI workflows.

  • Supports multiple models, including Qwen3-VL and Qwen2.5-VL, with optional GGUF backends.
  • Provides both a standard node and an advanced node, so users can choose how much control they need.
  • Uses smart caching and prompt persistence to avoid redundant work and manage resources.

Context

The ComfyUI-QwenVL custom node brings the vision-language models developed by Alibaba Cloud into ComfyUI. Its purpose is to support multimodal applications: users can generate text, analyze images, and process videos without leaving their workflows.

Key Features & Benefits

The main features are:

  • Standard & Advanced Nodes: Users can choose between a simple QwenVL node for quick tasks and an advanced version that exposes generation parameters such as temperature and sampling method.
  • Prompt Enhancers: Dedicated nodes enhance text prompts and work with both Qwen3 and GGUF models.
  • Smart Prompt Caching: Avoids regenerating a prompt when it is not needed, which speeds up workflow execution and carries over across sessions.
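To make the temperature parameter mentioned for the advanced node more concrete, here is a minimal, generic sketch of temperature-scaled sampling. It is not taken from the ComfyUI-QwenVL code: the function names and the example scores are hypothetical, and the real node passes such settings to the model backend rather than sampling by hand.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical scores for three candidate tokens (illustration only).
logits = [2.0, 1.0, 0.1]
cold = softmax_with_temperature(logits, 0.2)  # mass concentrates on the top token
hot = softmax_with_temperature(logits, 2.0)   # distribution moves toward uniform
token = sample_token(logits, 0.7, random.Random(0))
```

A low temperature makes the output more repeatable, while a higher one produces more varied text, which is why an advanced node would expose it.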

Advanced Functionalities

The custom node includes specialized capabilities such as:

  • Bypass Mode: Keeps previously generated prompts active while inputs change, which avoids rerunning the model and speeds up the workflow.
  • Fixed Seed Mode: Setting a fixed seed keeps outputs consistent, so changes to the input media do not alter the generated result.
  • WAN 2.2 Integration: Supports video generation workflows, including image-to-video (I2V) and text-to-video (T2V), with professional cinematography specifications.
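One way to picture how Bypass Mode and Fixed Seed Mode could decide between reusing a prompt and calling the model again is sketched below. This is an interpretation of the descriptions above, not the node's actual implementation: the class `PromptHolder`, the `seed_mode` and `bypass` arguments, and `fake_model` are all hypothetical names introduced for illustration.

```python
import hashlib


class PromptHolder:
    """Decides whether to reuse the last prompt or call the model again."""

    def __init__(self, generate_fn):
        self.generate_fn = generate_fn  # stand-in for the real model call
        self.last_key = None
        self.last_prompt = None

    def _key(self, text, media, seed, seed_mode):
        # Assumption: in fixed-seed mode the media is left out of the key,
        # so swapping the input image or video keeps the stored prompt.
        parts = [text.encode(), str(seed).encode()]
        if seed_mode != "fixed":
            parts.append(media)
        return hashlib.sha256(b"\x00".join(parts)).hexdigest()

    def run(self, text, media, seed, seed_mode="random", bypass=False):
        # Bypass: return the stored prompt without looking at the inputs.
        if bypass and self.last_prompt is not None:
            return self.last_prompt
        key = self._key(text, media, seed, seed_mode)
        if key != self.last_key:
            self.last_prompt = self.generate_fn(text, media, seed)
            self.last_key = key
        return self.last_prompt


calls = []


def fake_model(text, media, seed):
    # Hypothetical replacement for a vision-language model call.
    calls.append(media)
    return f"prompt #{len(calls)}"


holder = PromptHolder(fake_model)
first = holder.run("describe", b"image-a", seed=1, seed_mode="fixed")
second = holder.run("describe", b"image-b", seed=1, seed_mode="fixed")  # media changed, prompt kept
third = holder.run("describe", b"image-c", seed=2, bypass=True)         # bypass keeps it too
fourth = holder.run("describe", b"image-c", seed=2)                     # normal run regenerates
```

In this sketch the model is called only twice across the four runs, which is the resource saving the two modes aim for.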

Practical Benefits

Integrating the Qwen-VL series into ComfyUI improves workflow efficiency, control, and output quality. Because the node manages model loading and prompt persistence, it reduces computational overhead and shortens processing time. Handling both images and videos in one framework lets creators produce content without switching tools.
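Prompt persistence of the kind described above can be sketched as a small on-disk cache keyed by a fingerprint of the inputs. This is an assumption about the general technique, not the node's real storage format: `PromptCache`, the JSON file name, `demo-model`, and `fake_caption_model` are hypothetical.

```python
import hashlib
import json
import os
import tempfile


class PromptCache:
    """A tiny on-disk cache mapping input fingerprints to generated prompts."""

    def __init__(self, path):
        self.path = path
        self.entries = {}
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as fh:
                self.entries = json.load(fh)

    @staticmethod
    def fingerprint(model_name, text, media, seed):
        digest = hashlib.sha256()
        for part in (model_name.encode(), text.encode(), media, str(seed).encode()):
            # A length prefix keeps different splits of the same bytes distinct.
            digest.update(len(part).to_bytes(8, "big"))
            digest.update(part)
        return digest.hexdigest()

    def get_or_generate(self, model_name, text, media, seed, generate_fn):
        key = self.fingerprint(model_name, text, media, seed)
        if key not in self.entries:
            self.entries[key] = generate_fn(text, media, seed)
            with open(self.path, "w", encoding="utf-8") as fh:
                json.dump(self.entries, fh)
        return self.entries[key]


model_calls = []


def fake_caption_model(text, media, seed):
    # Hypothetical replacement for an expensive model call.
    model_calls.append(text)
    return f"caption for {text}"


cache_path = os.path.join(tempfile.mkdtemp(), "prompt_cache.json")
first = PromptCache(cache_path).get_or_generate(
    "demo-model", "a cat", b"img", 7, fake_caption_model
)
# A new instance simulates a later session reading the same file.
second = PromptCache(cache_path).get_or_generate(
    "demo-model", "a cat", b"img", 7, fake_caption_model
)
```

The second lookup is served from the file written by the first, so the stand-in model runs only once even though a fresh cache object is used.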

Credits/Acknowledgments

This repository builds on the vision-language models of the Qwen Team at Alibaba Cloud and on the platform provided by the ComfyUI community. Special thanks go to the contributors who helped optimize memory management and extend the tool's functionality. The code is released under the GPL-3.0 License.

Inner Nodes

VRAMCleanup