ComfyUI_MaskGCT

Author 807502278

https://github.com/807502278/ComfyUI_MaskGCT

Last updated

2025-04-18


Amphion-MaskGCT is a ComfyUI node that combines zero-shot voice synthesis with speech-to-text transcription based on the OpenAI Whisper model. It lets users generate speech from text and transcribe audio into text.

Supports zero-shot voice synthesis, allowing for speech generation without prior audio samples.
Integrates OpenAI's Whisper model for high-quality speech recognition and transcription.
Offers a variety of audio editing features, including resampling, trimming, and language detection.

Context

Amphion-MaskGCT is an extension for ComfyUI that adds audio processing features. Its primary goal is to support both text-to-speech synthesis and speech-to-text transcription inside ComfyUI workflows.

Key Features & Benefits

The tool can generate speech from text without needing sample audio, which is useful for applications that require varied voice outputs. The Whisper integration transcribes spoken language into text across multiple languages and dialects. Its audio editing features (resampling, trimming, and language detection) let users adjust audio data within the same workflow.

Advanced Functionalities

Amphion-MaskGCT includes multilingual slicing, which divides text into manageable segments based on punctuation, and automatic language recognition to optimize speech generation. It also exposes audio generation parameters, such as speech length and pause durations, that users can tune to produce more natural-sounding output.
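The punctuation-based slicing described above can be illustrated with a short Python sketch. This is not the node's actual code: the function name `slice_text` and the chosen punctuation set are assumptions made for illustration, and the real node may handle more scripts and edge cases.

```python
import re

# Hypothetical pattern (an assumption, not taken from the node's source):
# - after Latin punctuation, split only when whitespace follows, so that
#   decimals such as "3.14" stay intact;
# - after CJK punctuation, split even without whitespace.
_SPLIT_PATTERN = re.compile(r"(?<=[.!?;])\s+|(?<=[。！？；])\s*")


def slice_text(text: str) -> list[str]:
    """Split text into segments that each end at a punctuation mark."""
    pieces = _SPLIT_PATTERN.split(text)
    # Drop empty pieces, e.g. the one produced after trailing punctuation.
    return [piece.strip() for piece in pieces if piece.strip()]


if __name__ == "__main__":
    for segment in slice_text("Hello world. How are you? 你好。再见！"):
        print(segment)
```

Each resulting segment could then be passed to speech generation separately, which keeps individual generations short.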

Practical Benefits

Adding Amphion-MaskGCT to a workflow gives users more control over audio processing tasks in ComfyUI. Generating and transcribing audio in the same environment streamlines projects that involve voiceovers, automated transcription, and multilingual content creation.

Credits/Acknowledgments

Amphion-MaskGCT is credited to multiple authors and contributors, and the core paper describing the technology is available on arXiv. The project is open source, so users can build on it and contribute to its development.

Inner Nodes

audio_resample
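As a rough illustration of what a resampling step does, the sketch below changes the sample rate of a mono signal by linear interpolation. It is an assumption-laden toy, not the `audio_resample` node's implementation: the function name `resample_linear` is hypothetical, and the node itself may use a library resampler with proper anti-alias filtering, which this sketch omits.

```python
def resample_linear(samples: list[float], src_rate: int, dst_rate: int) -> list[float]:
    """Resample a mono signal from src_rate to dst_rate by linear interpolation.

    Illustrative only: no low-pass filtering is applied, so downsampling
    real audio this way can introduce aliasing.
    """
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if not samples:
        return []

    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    out = []
    for i in range(n_out):
        # Position of output sample i, measured in input-sample units.
        pos = i * src_rate / dst_rate
        left = int(pos)
        if left >= len(samples) - 1:
            # Past the last input sample: hold the final value.
            out.append(samples[-1])
        else:
            frac = pos - left
            out.append(samples[left] * (1 - frac) + samples[left + 1] * frac)
    return out


if __name__ == "__main__":
    print(resample_linear([0.0, 1.0, 2.0, 3.0], src_rate=2, dst_rate=4))
```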
