ComfyUI-FunAudioLLM

Author SpenserCai

https://github.com/SpenserCai/ComfyUI-FunAudioLLM

Last updated 2024-11-27


ComfyUI-FunAudioLLM is a custom node that integrates the FunAudioLLM framework into ComfyUI, using CosyVoice for voice synthesis and SenseVoice for speech recognition. It lets users generate speech with several voice models and configurable options from within a ComfyUI workflow.

Supports multiple voice synthesis methods, including zero-shot and cross-lingual capabilities.
Allows users to save and load speaker models, improving efficiency in voice generation tasks.
Provides optional punctuation segmentation in SenseVoice, which splits its output at punctuation marks.

Context

The repository provides a custom node for ComfyUI built on the FunAudioLLM framework and its CosyVoice and SenseVoice models. Its main purpose is audio generation: users can produce voice output in different voices and languages to suit a given project.

Key Features & Benefits

The node supports several synthesis methods, including zero-shot and cross-lingual generation, so a voice can be reproduced from a short reference sample or carried over into another language. Speaker models can be saved and loaded, which lets users reuse a voice configuration instead of rebuilding it each time.

Advanced Functionalities

CosyVoice supports an SFT (Supervised Fine-Tuning) mode and a 25Hz model variant. SenseVoice, the speech recognition model, includes punctuation segmentation, which splits its output at punctuation marks when enabled.
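As a rough illustration of what punctuation-based segmentation means, the sketch below splits a piece of text into segments at punctuation marks. It is not the node's implementation: the function name `split_on_punctuation` and the chosen punctuation set are assumptions made for this example only.

```python
import re

# Assumed punctuation set: common ASCII marks plus their full-width CJK forms.
# One segment = a run of non-punctuation characters plus any punctuation
# that immediately follows it.
_SEGMENT = re.compile(r"[^.!?;,。！？；，]+[.!?;,。！？；，]*")


def split_on_punctuation(text: str) -> list[str]:
    """Split text into segments, keeping each segment's trailing punctuation."""
    segments = (match.strip() for match in _SEGMENT.findall(text))
    # Drop segments that were only whitespace.
    return [segment for segment in segments if segment]


if __name__ == "__main__":
    for segment in split_on_punctuation("Hello, world. How are you?"):
        print(segment)
```

This is a naive split: it also breaks on the period in a number such as 3.14, so a real segmenter would need extra rules for such cases.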

Practical Benefits

Together, these features give users finer control over audio generation inside ComfyUI. Artists and developers can choose a synthesis method, reuse saved speaker models, and produce tailored audio without leaving the workflow.

Credits/Acknowledgments

The tool is developed by SpenserCai, with contributions from the FunAudioLLM community. The repository is open source, so the community can collaborate on it and extend it.
