ComfyUI-Qwen3-ASR

Author kaushiknishchay

https://github.com/kaushiknishchay/ComfyUI-Qwen3-ASR

Last updated

2026-03-05

ComfyUI-Qwen3-ASR is a ComfyUI extension that integrates the Qwen3-ASR models for speech-to-text transcription and language identification. It covers 52 languages and dialects, including several English accents and multiple Chinese dialects.

Supports the Qwen3-ASR models in two sizes, 0.6B and 1.7B.
Offers word-level timestamps and automatic language detection.
Supports FlashAttention 2 to reduce VRAM usage and speed up inference.

Context

The extension brings the Qwen3-ASR model family into ComfyUI for automatic speech recognition (ASR) and language identification. Its main purpose is to transcribe audio inputs accurately and, together with the forced aligner, to report timing information for each word.

Key Features & Benefits

The extension offers three main features:

High Accuracy: The Qwen3-ASR models transcribe speech across a wide range of languages and dialects.
Word-Level Timestamps: When used together with the Qwen3-ForcedAligner, the extension returns start and end times for each word, which matters for applications that must stay synchronized with the audio, such as subtitles.
Automatic Language Detection: The spoken language is identified automatically, so the user does not have to select it manually before transcribing.
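To illustrate why word-level timing is useful, here is a minimal Python sketch that turns aligned words into SRT subtitle cues. The `WordTiming` shape, the function names, and the `max_words` grouping rule are assumptions made for this example; the actual output format of the extension's nodes may differ.

```python
from dataclasses import dataclass


@dataclass
class WordTiming:
    """One aligned word. Hypothetical shape, not the node's actual output."""
    text: str
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio


def format_srt_time(seconds: float) -> str:
    """Render a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: list[WordTiming], max_words: int = 7) -> str:
    """Group consecutive words into cues of at most max_words words."""
    cues = []
    for index, i in enumerate(range(0, len(words), max_words), start=1):
        group = words[i:i + max_words]
        start = format_srt_time(group[0].start)
        end = format_srt_time(group[-1].end)
        text = " ".join(w.text for w in group)
        cues.append(f"{index}\n{start} --> {end}\n{text}\n")
    # A blank line separates consecutive cues.
    return "\n".join(cues)
```

Each cue starts at the first word's start time and ends at the last word's end time, so subtitle boundaries follow the aligner's timing rather than fixed intervals.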

Advanced Functionalities

The integration supports three precision settings (bf16, fp16, fp32), so users can trade memory and speed against numerical precision according to their hardware. FlashAttention 2 support reduces VRAM usage and speeds up inference, which helps on GPUs with limited memory.
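As a rough guide to what the precision setting means for memory, the sketch below estimates the space needed for model weights alone. It assumes the nominal parameter counts implied by the model names (0.6 billion and 1.7 billion); real checkpoints may differ, and activations, caches, and framework overhead are not included. The `pick_precision` heuristic is a hypothetical illustration, not the extension's own logic.

```python
# Bytes used to store one parameter at each precision setting.
BYTES_PER_PARAM = {"bf16": 2, "fp16": 2, "fp32": 4}


def weight_memory_gib(num_params: float, precision: str) -> float:
    """Approximate memory for model weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[precision] / 1024**3


def pick_precision(has_gpu: bool, supports_bf16: bool) -> str:
    """Hypothetical heuristic: fp32 on CPU, bf16 on GPUs that support it, else fp16."""
    if not has_gpu:
        return "fp32"
    return "bf16" if supports_bf16 else "fp16"


if __name__ == "__main__":
    # Nominal sizes taken from the model names; treat them as assumptions.
    for name, params in (("0.6B", 0.6e9), ("1.7B", 1.7e9)):
        for precision in ("bf16", "fp16", "fp32"):
            gib = weight_memory_gib(params, precision)
            print(f"{name} {precision}: about {gib:.2f} GiB of weights")
```

Under these assumptions, fp32 needs twice the weight memory of bf16 or fp16, which is why the half-precision settings are the usual choice on memory-constrained GPUs.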

Practical Benefits

The extension handles long audio segments and resamples audio automatically, so inputs with different sample rates and lengths can be transcribed inside a ComfyUI workflow.
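The general idea behind resampling and long-audio handling can be sketched in plain Python. This is not the extension's implementation: the linear-interpolation method, the 30-second chunk length, and both function names are assumptions chosen for illustration. Production code would normally use a dedicated resampler with low-pass filtering, because plain interpolation can alias when downsampling.

```python
def resample_linear(samples: list[float], src_rate: int, dst_rate: int) -> list[float]:
    """Resample mono audio by linear interpolation (illustrative only)."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if not samples or src_rate == dst_rate:
        return list(samples)
    out_len = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # source samples advanced per output sample
    last = len(samples) - 1
    out = []
    for n in range(out_len):
        pos = n * step
        i = int(pos)
        if i >= last:
            # Past the final source sample: hold the last value.
            out.append(samples[last])
            continue
        frac = pos - i
        out.append(samples[i] * (1 - frac) + samples[i + 1] * frac)
    return out


def split_into_chunks(samples: list[float], rate: int, chunk_seconds: float = 30.0) -> list[list[float]]:
    """Split audio into consecutive fixed-length chunks; the last may be shorter."""
    size = int(rate * chunk_seconds)
    if size <= 0:
        raise ValueError("chunk size must be at least one sample")
    return [samples[i:i + size] for i in range(0, len(samples), size)]
```

A typical pipeline of this kind first resamples to the rate the model expects and then transcribes chunk by chunk, which keeps peak memory bounded regardless of the total audio length.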

Credits/Acknowledgments

This project is released under the MIT License, with contributions from the original authors and the community. The Qwen3 models themselves are governed separately by the Qwen License Agreement.

Inner Nodes

Qwen3ASRTranscriber

Qwen3ForcedAlignerConfig
