
ComfyUI Qwen2.5-VL Object Detection Node

Last updated: 2025-06-24

This repository provides a custom node for ComfyUI that performs object detection with the Qwen 2.5 VL model. It downloads the model, runs detection prompts, and outputs bounding boxes that can be passed to segmentation nodes.

  • Lets users set model parameters such as precision and device to match the available hardware.
  • Outputs bounding boxes in a structured JSON format, with options for selecting and merging boxes.
  • Prepares bounding boxes in a format compatible with segmentation tools such as SAM2.
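
As a rough illustration of the structured output described above, the sketch below parses a JSON list of detections and converts it to plain [x1, y1, x2, y2] boxes, the kind of input a box-prompted segmenter such as SAM2 typically expects. The key names bbox_2d and label, the sample data, and the helper to_xyxy are assumptions made for illustration; the node's actual schema may differ.

```python
import json

# Hypothetical detection output. The keys "bbox_2d" and "label" are assumed,
# not taken from the node's documentation.
raw = (
    '[{"bbox_2d": [10, 20, 110, 220], "label": "cat"},'
    ' {"bbox_2d": [300, 40, 250, 90], "label": "dog"}]'
)


def to_xyxy(raw_json: str) -> list[list[int]]:
    """Convert a JSON list of detections to [x1, y1, x2, y2] integer boxes."""
    boxes = []
    for det in json.loads(raw_json):
        x1, y1, x2, y2 = det["bbox_2d"]
        # Reorder the corners so that x1 <= x2 and y1 <= y2.
        left, right = sorted((x1, x2))
        top, bottom = sorted((y1, y2))
        boxes.append([int(left), int(top), int(right), int(bottom)])
    return boxes


boxes = to_xyxy(raw)
```

Reordering the corners is a defensive step: a downstream segmentation node usually assumes the first corner is the top-left one.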

Context

This tool is a ComfyUI extension for object detection with the Qwen 2.5 VL model. It gives users a straightforward way to detect objects in images and manage the resulting bounding boxes for further processing.

Key Features & Benefits

The repository's custom nodes cover model management and detection. The DownloadAndLoadQwenModel node handles model management, including device selection and precision settings, so the model can be matched to the available hardware. The QwenVLDetection node runs object detection and returns bounding boxes sorted by confidence, with the output shaped by user-defined parameters.
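
To make the device and precision trade-off concrete, here is a minimal sketch of how such settings might be resolved before loading a model. The option values ("auto", "cuda", "cpu", "fp16", "bf16", "fp32"), the function resolve_settings, and the fallback policy are all hypothetical; they are not the node's actual options or logic.

```python
def resolve_settings(device: str, precision: str, cuda_available: bool) -> tuple[str, str]:
    """Return a (device, precision) pair under an illustrative fallback policy."""
    if device == "auto":
        # Prefer the GPU when one is reported as available.
        device = "cuda" if cuda_available else "cpu"
    if device == "cuda" and not cuda_available:
        raise ValueError("CUDA was requested but is not available")
    # Assumed policy: use full precision on CPU, where reduced precision
    # often brings little speed benefit.
    if device == "cpu" and precision in ("fp16", "bf16"):
        precision = "fp32"
    return device, precision
```

For example, resolve_settings("auto", "fp16", cuda_available=False) falls back to ("cpu", "fp32") under this policy, while the same call on a machine with a GPU keeps fp16.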

Advanced Functionalities

The tool can merge bounding boxes and filter them by confidence score. The bbox_selection parameter specifies which boxes to return, giving tighter control over the output for applications that need precise object identification.
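
The filtering, selection, and merging steps can be sketched as follows. This is not the node's implementation: the function select_boxes, the "bbox" and "score" keys, and the assumption that the selection string is either "all" or a comma-separated list of indices are illustrative only.

```python
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def select_boxes(
    detections: list[dict],
    selection: str = "all",
    min_score: float = 0.0,
    merge: bool = False,
) -> list[Box]:
    """Filter by score, sort by confidence, pick indices, optionally merge."""
    # Keep detections at or above the threshold, highest score first.
    kept = sorted(
        (d for d in detections if d["score"] >= min_score),
        key=lambda d: d["score"],
        reverse=True,
    )
    boxes = [tuple(d["bbox"]) for d in kept]

    if selection.strip().lower() != "all":
        # Assumed format: comma-separated indices into the sorted list.
        indices = [int(tok) for tok in selection.split(",") if tok.strip()]
        boxes = [boxes[i] for i in indices if 0 <= i < len(boxes)]

    if merge and boxes:
        # Merge into the smallest box that encloses every selected box.
        xs1, ys1, xs2, ys2 = zip(*boxes)
        boxes = [(min(xs1), min(ys1), max(xs2), max(ys2))]
    return boxes
```

Because selection is applied after sorting, index 0 always refers to the most confident detection in this sketch; out-of-range indices are silently skipped rather than raising an error.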

Practical Benefits

Integrating this tool into a workflow gives users more control over image processing tasks. They can manage model parameters, receive structured bounding-box output, and pass that output to segmentation nodes, which makes ComfyUI detection and segmentation pipelines more efficient to build.

Credits/Acknowledgments

The repository is maintained by contributors from the ComfyUI community and uses the Qwen 2.5 VL model developed by the QwenLM team. The tool is open source, so others can improve and adapt it.