API

Pricing

Workflows

API

Pricing

ComfyUI-Qwen-VL-API

Author ZHO-ZHO-ZHO

https://github.com/ZHO-ZHO-ZHO/ComfyUI-Qwen-VL-API

210

Last updated

2024-05-22

Run hundreds of ComfyUI nodes and workflows in your browser.

QWen-VL is an integration of the QWen-VL visual language models (Plus and Max) into ComfyUI, enabling advanced visual processing capabilities through API calls. This tool enhances the user experience by allowing for high-resolution image analysis and context-aware dialogue interactions.

Supports two advanced models: QWen-VL-Plus and QWen-VL-Max, each offering unique enhancements in visual and textual recognition.
Facilitates both single and multi-turn conversations, allowing for more complex interactions with the AI.
Capable of processing local images, with an automatic cleanup feature for temporary storage.

Context

QWen-VL serves as an extension within ComfyUI that incorporates two powerful models from Alibaba's QWen-VL project. Its primary purpose is to provide users with state-of-the-art visual language processing capabilities, making it one of the leading open-source models available.

Key Features & Benefits

The QWen-VL extension includes two models: QWen-VL-Plus, which excels in detail recognition and supports high-resolution images, and QWen-VL-Max, which further enhances visual reasoning and instruction-following capabilities. This dual-model approach allows users to select the best-suited model for their specific tasks, improving the accuracy and effectiveness of visual processing.

Advanced Functionalities

QWen-VL offers implicit API key management, enabling seamless integration and operation within ComfyUI. The tool supports context-aware dialogues, allowing users to engage in multi-turn conversations, which is particularly beneficial for applications requiring more nuanced interactions. Furthermore, it can accept local images for processing, ensuring flexibility in input types.

Practical Benefits

By incorporating QWen-VL into their workflows, users can significantly enhance their control over visual tasks, improve the quality of outputs, and increase overall efficiency. The ability to handle both single and multi-turn dialogues allows for richer user interactions, and the high-resolution capabilities ensure that even complex visual inputs are processed with precision.

Credits/Acknowledgments

This tool is based on the QWen-VL models developed by Alibaba, and the integration work has been carried out by the contributors of the ComfyUI-Qwen-VL-API repository. The extension is available under open-source licensing, allowing for community contributions and enhancements.

Discover most popular workflows

Hand-picked based on what hundreds of other artists looked at.

Z-Image Turbo: Fast Image Generation in Seconds

floyoofficial

21.9k

Marketing

Photography

Production

Text2Image

Z-Image Turbo

Fast Image Generation in Seconds

Z-Image Turbo: Fast Image Generation in Seconds

Fast Image Generation in Seconds

Nano Banana 2: Fast Image Generation & Editing

floyoofficial

4.6k

API

gemini flash image

Image2Image

Text2Image

typography

The top-ranked image model on Artificial Analysis and LM Arena. 4K output, text rendering, and subject consistency across 5 characters.

Nano Banana 2: Fast Image Generation & Editing

The top-ranked image model on Artificial Analysis and LM Arena. 4K output, text rendering, and subject consistency across 5 characters.

floyoofficial

25.2k

AiVideo

API

image to video

video generation

wan 2.5

Wan 2.5: Image to Video with Audio

goshnii

10.7k

Face swap

Flux

flux 2 klein

Flux 2 Klein face swap

Flux face swap

head swap

image 2 image

image editing

Instead of using outdated or unstable techniques, this workflow was designed to take full advantage of FLUX 2 KLEIN's editing capabilities—using a face image and a reference character image to produce clean, highly consistent results.

Flux 2 Klein 9b - Perfect Face swap

floyoofficial

4.7k

API

Image to Video

LTX2.3

LTX 2.3

LTX 2.3 Pro Image to Video

LTX 2.3

Author

ZHO-ZHO-ZHO