LongCat AudioDiT for Voice Clone

Clone any voice from a short audio sample with LongCat AudioDiT 3.5B. Upload a reference clip, type what you want it to say, and get speech in that voice.

Generates in about 43 secs

floyoofficial

Nodes & Models

ComfyUI Official

LoadAudio

NormalizeAudioLoudness

LongCatVoiceCloneTTS

SaveAudioMP3

Voice cloning text-to-speech with LongCat AudioDiT 3.5B.

Upload a short reference clip of someone speaking. Paste a transcript of what they say in that clip. Write the new text you want spoken. The model generates audio of your new text in the reference voice.

No training. No fine-tuning. Works with any clean voice sample.

How do you clone a voice with LongCat AudioDiT?

Upload a clean reference audio clip of the voice you want to clone. Paste the exact transcript of what's said in that clip into the prompt text field. Write your new line in the text field. Hit run. The model generates speech of your new text in the reference voice.

Reference audio: A short, clean recording of someone speaking. Want the cleanest output? Use clear speech with no music, no background noise, and consistent volume. Around 5 to 15 seconds is the sweet spot.

Reference transcript: Write down exactly what's said in your reference audio, word for word. The model uses this to map sound to text. Mismatched transcripts hurt quality more than people expect.

New text: What you want the cloned voice to say. Keep punctuation natural since the model uses it for pacing. Want a pause? Use a comma or period.

Steps: Default is 25. Want faster results? Drop to 15 to 20. Want cleaner audio with fewer artifacts? Push to 30 to 40. Past 40, gains flatten out.

Guidance strength: Default is 4. Want output that sticks closer to the reference voice? Push to 5 or 6. Voice sounding stiff or over-styled? Drop to 2 or 3.

Guidance method: Default is "apg". Stable across most voices. Switch methods if your output sounds flat or develops weird artifacts.

Seed: Randomized by default. Got an output you like? Lock the seed to compare other settings against the same baseline.
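
The settings described above can be tracked in a small helper when you compare runs. This is a minimal Python sketch, not the Floyo or ComfyUI API: the class `VoiceCloneSettings`, its field names, and the functions `advice` and `sweep_steps` are hypothetical names made up for illustration. Only the defaults and ranges come from the guidance above.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class VoiceCloneSettings:
    """Hypothetical container for the tunable settings (names are assumptions)."""
    steps: int = 25                  # default stated on this page
    guidance_strength: float = 4.0   # default stated on this page
    guidance_method: str = "apg"     # default stated on this page
    seed: Optional[int] = None       # None stands for "randomized"


def advice(s: VoiceCloneSettings) -> List[str]:
    """Return notes when a value falls outside the ranges suggested above."""
    notes = []
    if s.steps < 15:
        notes.append("steps below 15: under the suggested fast range (15 to 20)")
    elif s.steps > 40:
        notes.append("steps above 40: gains flatten out")
    if s.guidance_strength > 6:
        notes.append("guidance above 6: beyond the suggested 5 to 6 range")
    elif s.guidance_strength < 2:
        notes.append("guidance below 2: under the suggested 2 to 3 range")
    if s.seed is None:
        notes.append("seed is randomized: lock it to compare settings fairly")
    return notes


def sweep_steps(base: VoiceCloneSettings, values: List[int]) -> List[VoiceCloneSettings]:
    """Vary only the step count, keeping the seed and other settings fixed."""
    if base.seed is None:
        raise ValueError("lock the seed before sweeping")
    return [replace(base, steps=v) for v in values]
```

For example, `sweep_steps(VoiceCloneSettings(seed=42), [15, 25, 40])` gives three variants that differ only in step count, so any change you hear comes from that one setting.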

What is LongCat AudioDiT good for?

Voice cloning workflows where you need TTS in a specific voice without training a model. Good for voiceover drafts, audiobook narration, character voices for animation or games, dubbing in someone's voice, and long-form content where vocal consistency across hundreds of lines matters more than emotional range.

Best on clean source material. Reference audio quality sets the ceiling. Studio-quality input gives you studio-quality output. Phone recordings give you phone recordings back.

Use it to test voiceover scripts before booking talent. Use it for character voices that need to sound identical across a hundred lines. Use it for accessibility work where you're generating spoken content at scale.

Doing one-off TTS where the voice doesn't matter? A standard TTS workflow is faster. Need real emotional range and acting? Human voice talent still wins.

FAQ

How long should my reference audio be for LongCat AudioDiT?

Around 5 to 15 seconds works well. Too short and the model has nothing to learn from. Too long and quality plateaus while runtime grows. One or two clean sentences spoken naturally beats a minute of varied content.
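
If you want to check a clip before uploading it, the length can be read from the file itself. The sketch below uses Python's standard `wave` module, so it assumes an uncompressed PCM WAV file. The function names are hypothetical, and the 5 and 15 second bounds are simply the range suggested above.

```python
import wave


def clip_duration_seconds(path: str) -> float:
    """Duration of a PCM WAV file, computed from its header."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def check_reference_length(seconds: float, low: float = 5.0, high: float = 15.0) -> str:
    """Classify a clip length against the suggested 5 to 15 second range."""
    if seconds < low:
        return "too short"
    if seconds > high:
        return "too long"
    return "ok"
```

A call such as `check_reference_length(clip_duration_seconds("reference.wav"))` returns one of the three labels. MP3 or other compressed formats would need a different reader.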

Why does my LongCat AudioDiT output sound robotic or distorted?

Three usual causes. Your reference audio has noise or background music. Your transcript doesn't match what's said in the audio. Or your guidance strength is too high. Clean the input, fix the transcript, drop guidance to 3 or 4.

Can LongCat AudioDiT clone any voice?

It handles most voices that speak clearly. Heavy accents, whispered speech, singing, or voices with strong vocal effects can confuse it. Stick to natural conversational speech for the best results.

Does the transcript need to be perfect?

Yes. Word-for-word accuracy matters. The model aligns sound to text, so a sloppy transcript means sloppy cloning. Spend the extra minute getting it right.
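
One way to catch transcript slips is to compare your transcript with a second one, for example the output of a speech-to-text tool run on the same clip. The sketch below is an illustration built on Python's standard `difflib`, not part of the workflow. The function names are hypothetical, and the tokenizer is English-oriented.

```python
import difflib
import re
from typing import List, Tuple


def words(text: str) -> List[str]:
    """Lowercase and drop punctuation so only the wording is compared."""
    return re.findall(r"[a-z0-9']+", text.lower())


def transcript_mismatches(mine: str, other: str) -> List[Tuple[str, str]]:
    """Return (my words, other words) pairs wherever the two transcripts differ."""
    a, b = words(mine), words(other)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return diffs
```

An empty result means the wording matches. Any pair returned points at a spot worth re-listening to before you run the workflow.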

How do you run LongCat AudioDiT online?

You can run LongCat AudioDiT online through Floyo. No installation, no setup. Open the workflow in your browser, upload your inputs, and hit run. Free to try.
