ComfyUI_parakeet-tdt

Author billwuhao

https://github.com/billwuhao/ComfyUI_parakeet-tdt

Last updated

2025-06-15

parakeet-tdt-0.6b-v2 is an automatic speech recognition (ASR) model for English transcription. It processes audio quickly and produces punctuation, capitalization, and timestamp predictions. This node makes the model available inside ComfyUI workflows.

Transcribes English speech with automatic punctuation and capitalization.
Predicts timestamps so that text can be synchronized with audio.
Runs as a ComfyUI node, so transcription and captioning fit into an existing workflow.

Context

ComfyUI_parakeet-tdt is a ComfyUI node that runs the parakeet-tdt-0.6b-v2 automatic speech recognition model. Its purpose is to transcribe English speech into written text quickly and accurately.

Key Features & Benefits

The transcriptions include punctuation and capitalization, which makes them readable without manual cleanup. The predicted timestamps let users align text with audio, which simplifies creating captions or subtitles for video and other media.
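To illustrate how timestamped segments can become subtitles, here is a minimal sketch that turns a list of segments into SRT text. The segment shape used here (a dict with `start` and `end` in seconds plus a `text` field) and the helper names `format_srt_time` and `segments_to_srt` are assumptions made for illustration, not the node's documented output format.

```python
def format_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from segments.

    Each segment is assumed (hypothetically) to be a dict with
    'start' and 'end' in seconds and a 'text' string.
    """
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_srt_time(seg["start"])
        end = format_srt_time(seg["end"])
        # One SRT cue: counter, time range, then the caption text.
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # Cues are separated by a blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "end": 1.25, "text": "Hello there."},
        {"start": 1.5, "end": 3.0, "text": "This is a caption."},
    ]
    print(segments_to_srt(demo))
```

If the node emits a different structure, only the field lookups inside `segments_to_srt` would need to change.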

Advanced Functionalities

parakeet-tdt-0.6b-v2 is a neural ASR model; the 0.6b in its name refers to its size of roughly 0.6 billion parameters. Because it runs as a ComfyUI node, users can adjust inputs and re-run the transcription within the same workflow, which keeps iteration fast.

Practical Benefits

Using this ASR model in a workflow reduces manual work. Punctuation and capitalization are produced automatically, which cuts editing time, and accurate timestamps improve the quality of the final captions, making the output suitable for professional use.

Credits/Acknowledgments

This tool is based on NVIDIA's NeMo framework, which provides the underlying architecture for the ASR model. Credit goes to the original model authors and contributors, whose work makes this integration possible.
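For readers who want to try the model outside ComfyUI, the following is a rough sketch of calling it through NeMo directly. It follows the usage pattern published for the model, but the model identifier `nvidia/parakeet-tdt-0.6b-v2`, the `timestamps=True` argument, and the `timestamp["segment"]` result layout are assumptions not confirmed by this page, and whether the node calls NeMo the same way is also an assumption. The helper names `transcribe_with_timestamps` and `format_segments` are hypothetical.

```python
def format_segments(segments):
    """Render segment dicts as 'start - end : text' lines.

    Assumes each dict has 'start', 'end' (seconds) and 'segment' (text).
    """
    return [
        f"{s['start']:.2f}s - {s['end']:.2f}s : {s['segment']}"
        for s in segments
    ]


def transcribe_with_timestamps(audio_path, model_name="nvidia/parakeet-tdt-0.6b-v2"):
    """Transcribe one audio file and return (text, segment timestamps).

    Requires the NeMo toolkit with ASR support; the model name is an
    assumed identifier and is downloaded on first use.
    """
    # Imported lazily so this file can be loaded without NeMo installed.
    import nemo.collections.asr as nemo_asr

    asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name=model_name)
    output = asr_model.transcribe([audio_path], timestamps=True)
    return output[0].text, output[0].timestamp["segment"]


if __name__ == "__main__":
    # 'speech.wav' is a placeholder path for illustration.
    text, segments = transcribe_with_timestamps("speech.wav")
    print(text)
    print("\n".join(format_segments(segments)))
```

Inside ComfyUI none of this is needed, since the node handles model loading and inference itself.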
