LongCat AudioDiT for Voice Cloning
Clone a voice from a short audio sample with LongCat AudioDiT 3.5B. Upload a reference clip, type what you want it to say, and get speech in that voice.
audio generation
film production
longcat
text to speech
voice cloning
voiceover
Nodes & Models
LoadAudio
NormalizeAudioLoudness
LongCatVoiceCloneTTS
SaveAudioMP3
Voice cloning text-to-speech with LongCat AudioDiT 3.5B.
Upload a short reference clip of someone speaking. Paste a transcript of what they say in that clip. Write the new text you want spoken. The model generates audio of your new text in the reference voice.
No training. No fine-tuning. Works with any clean voice sample.
How do you clone a voice with LongCat AudioDiT?
Upload a clean reference audio clip of the voice you want to clone. Paste the exact transcript of what's said in that clip into the prompt text field. Write your new line in the text field. Hit run. The model generates speech of your new text in the reference voice.
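Under the hood, those steps map onto the four nodes listed above. Here's a rough sketch of the chain as data. It assumes the nodes run in the order listed and uses a layout modeled on ComfyUI's API-format prompt JSON. The input names (`audio`, `reference_audio`, `prompt_text`, `text`, `guidance_strength`, and so on) and the file name `reference.wav` are hypothetical, not the nodes' documented inputs.

```python
import json

# Hypothetical sketch of the four-node chain, laid out like an API-format
# prompt: {node_id: {"class_type": ..., "inputs": {...}}}. A linked input is
# written as [source_node_id, output_index]. All input names are assumptions.
WORKFLOW = {
    "1": {"class_type": "LoadAudio",
          "inputs": {"audio": "reference.wav"}},           # hypothetical file
    "2": {"class_type": "NormalizeAudioLoudness",
          "inputs": {"audio": ["1", 0]}},                   # link to LoadAudio
    "3": {"class_type": "LongCatVoiceCloneTTS",
          "inputs": {
              "reference_audio": ["2", 0],                  # normalized clip
              "prompt_text": "Exact words spoken in the reference clip.",
              "text": "The new line you want in that voice.",
              "steps": 25,
              "guidance_strength": 4.0,
              "guidance_method": "apg",
              "seed": 1234,
          }},
    "4": {"class_type": "SaveAudioMP3",
          "inputs": {"audio": ["3", 0], "filename_prefix": "voice_clone"}},
}


def execution_order(workflow: dict) -> list[str]:
    """Return node class types in dependency order (depth-first topological sort)."""
    order, done = [], set()

    def visit(node_id: str) -> None:
        if node_id in done:
            return
        for value in workflow[node_id]["inputs"].values():
            # A two-item list whose first item is a known node id is a link.
            if isinstance(value, list) and len(value) == 2 and value[0] in workflow:
                visit(value[0])
        done.add(node_id)
        order.append(workflow[node_id]["class_type"])

    for node_id in workflow:
        visit(node_id)
    return order


if __name__ == "__main__":
    print(execution_order(WORKFLOW))
    print(json.dumps(WORKFLOW["3"]["inputs"], indent=2))
```

Running it prints the nodes in dependency order: load, normalize, clone, save.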
Reference audio: A short, clean recording of someone speaking. Want the cleanest output? Use clear speech with no music, no background noise, and consistent volume. Around 5 to 15 seconds is the sweet spot.
Reference transcript: Write down exactly what's said in your reference audio, word for word. The model uses this to map sound to text. Mismatched transcripts hurt quality more than people expect.
New text: What you want the cloned voice to say. Keep punctuation natural, since the model uses it for pacing. Want a pause? Use a comma or period.
Steps: Default is 25. Want faster results? Drop to 15 to 20. Want cleaner audio with fewer artifacts? Push to 30 to 40. Past 40, gains flatten out.
Guidance strength: Default is 4. Want output that sticks closer to the reference voice? Push to 5 or 6. Voice sounding stiff or over-styled? Drop to 2 or 3.
Guidance method: Default is "apg". Stable across most voices. Switch methods if your output sounds flat or develops weird artifacts.
Seed: Randomized by default. Got an output you like? Lock the seed to compare other settings against the same baseline.
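Want to keep experiments tidy? Freeze a baseline with a locked seed and change one setting at a time. A minimal sketch: `CloneSettings`, `PRESETS`, and `advice` are hypothetical helpers whose numbers come from the ranges above, not from the node itself.

```python
import random
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class CloneSettings:
    # Hypothetical field names; defaults mirror the documented ones.
    steps: int = 25
    guidance_strength: float = 4.0
    guidance_method: str = "apg"
    # Randomized by default, like the workflow's seed.
    seed: int = field(default_factory=lambda: random.randrange(2**32))


# Starting points picked from the ranges described above.
PRESETS = {
    "fast": {"steps": 18},                               # 15 to 20
    "clean": {"steps": 35},                              # 30 to 40
    "closer_to_reference": {"guidance_strength": 5.5},   # 5 to 6
    "less_stiff": {"guidance_strength": 2.5},            # 2 to 3
}


def advice(s: CloneSettings) -> list[str]:
    """Return notes for settings outside the ranges this page describes."""
    notes = []
    if s.steps > 40:
        notes.append("Past 40 steps, gains flatten out.")
    if s.steps < 15:
        notes.append("Below the 15 to 20 step range suggested for fast runs.")
    if s.guidance_strength > 6:
        notes.append("Guidance above 6: watch for robotic or distorted output.")
    if s.guidance_strength < 2:
        notes.append("Guidance below 2: output may stick less closely to the reference voice.")
    return notes


if __name__ == "__main__":
    base = CloneSettings(seed=1234)               # lock the seed once you like a result
    variant = replace(base, **PRESETS["clean"])   # same seed, only steps change
    print(variant, advice(variant))
```

Because `replace` copies every field you don't override, the seed stays fixed and any difference you hear comes from the one setting you changed.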
What is LongCat AudioDiT good for?
Voice cloning workflows where you need TTS in a specific voice without training a model. Good for voiceover drafts, audiobook narration, character voices for animation or games, dubbing in someone's voice, and long-form content where vocal consistency across hundreds of lines matters more than emotional range.
Best on clean source material. Reference audio quality sets the ceiling. Studio-quality input gives you the best shot at studio-quality output. Phone recordings give you phone recordings back.
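The workflow runs your reference through NormalizeAudioLoudness before cloning. We don't know that node's internals, so here's the general idea with a plain RMS level in dBFS. That's a simplified stand-in for perceptual loudness measures like LUFS (ITU-R BS.1770), and the -20 dBFS target is an arbitrary assumption. Samples are assumed to be 16-bit PCM integers.

```python
import math

FULL_SCALE = 32768  # 16-bit PCM full scale


def rms_dbfs(samples: list[int]) -> float:
    """RMS level of 16-bit PCM samples in dB relative to full scale."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    # 20*log10(rms/FS) written as 10*log10(mean_square/FS^2)
    return 10 * math.log10(mean_square / FULL_SCALE**2)


def gain_to_target(samples: list[int], target_dbfs: float = -20.0) -> float:
    """Linear gain that moves the clip's RMS level to target_dbfs."""
    level = rms_dbfs(samples)
    if level == float("-inf"):
        return 1.0  # nothing to normalize in silence
    return 10 ** ((target_dbfs - level) / 20)


def apply_gain(samples: list[int], gain: float) -> list[int]:
    """Scale samples and hard-clip them to the 16-bit range."""
    return [max(-FULL_SCALE, min(FULL_SCALE - 1, round(s * gain))) for s in samples]


if __name__ == "__main__":
    loud = [16384, -16384] * 100          # toy square wave at about -6 dBFS
    g = gain_to_target(loud)
    print(round(rms_dbfs(loud), 2), round(g, 3), round(rms_dbfs(apply_gain(loud, g)), 2))
```

Normalizing fixes level, not noise. A noisy clip turned up to -20 dBFS is still a noisy clip.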
Use it to test voiceover scripts before booking talent. Use it for character voices that need to sound identical across a hundred lines. Use it for accessibility work where you're generating spoken content at scale.
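Generating a hundred lines for one character? One way to keep the setup identical is to build every request from the same reference clip, transcript, and settings, so only the text changes between lines. A minimal sketch: `CloneJob` and `build_jobs` are hypothetical, and how much a fixed seed helps consistency across different lines is an assumption, not something this page documents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CloneJob:
    """One TTS request. Field names are hypothetical, not the node's inputs."""
    reference_audio: str
    reference_transcript: str
    text: str
    seed: int
    steps: int = 25
    guidance_strength: float = 4.0


def build_jobs(reference_audio: str, reference_transcript: str,
               lines: list[str], seed: int) -> list[CloneJob]:
    """Pair every script line with the same reference, transcript, and seed."""
    return [
        CloneJob(reference_audio, reference_transcript, line.strip(), seed)
        for line in lines
        if line.strip()  # skip blank script lines
    ]


if __name__ == "__main__":
    script = ["Welcome back, traveler.", "", "The gate opens at dawn."]
    for job in build_jobs("hero_ref.wav", "Exact words in the clip.", script, seed=7):
        print(job.text, job.seed)
```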
Doing one-off TTS where the voice doesn't matter? A standard TTS workflow is faster. Need real emotional range and acting? Human voice talent still wins.
FAQ
How long should my reference audio be for LongCat AudioDiT? Around 5 to 15 seconds works well. Too short and the model has too little of the voice to work from. Too long and quality plateaus while runtime grows. One or two clean sentences spoken naturally beat a minute of varied content.
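Quick way to check a clip before you upload it: read its length. This sketch uses Python's standard `wave` module, so it only handles uncompressed PCM WAV files. The 5 to 15 second bounds come from the answer above, and `silent_wav` exists only to make the example runnable without a real recording.

```python
import io
import wave

MIN_SECONDS, MAX_SECONDS = 5.0, 15.0  # recommended range from the FAQ


def wav_duration_seconds(source) -> float:
    """Duration of a PCM WAV file, given a path or a binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def assess_length(seconds: float) -> str:
    """Compare a clip length against the recommended range."""
    if seconds < MIN_SECONDS:
        return "too short"
    if seconds > MAX_SECONDS:
        return "longer than needed"
    return "ok"


def silent_wav(seconds: float, rate: int = 16000) -> io.BytesIO:
    """Build an in-memory mono 16-bit silent WAV (demo input only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buf.seek(0)  # rewind so the clip can be read back
    return buf


if __name__ == "__main__":
    seconds = wav_duration_seconds(silent_wav(10.0))  # or a path like "reference.wav"
    print(seconds, assess_length(seconds))
```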
Why does my LongCat AudioDiT output sound robotic or distorted? Three usual causes. Your reference audio has noise or background music. Your transcript doesn't match what's said in the audio. Or your guidance strength is too high. Clean the input, fix the transcript, drop guidance to 3 or 4.
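Why does too much guidance distort the output? In diffusion-style models, guidance strength usually scales how far each denoising step gets pushed from an unconditional prediction toward the conditional one (classifier-free guidance). The sketch below assumes LongCat follows that standard formulation, which this page doesn't confirm. The default "apg" method most likely refers to adaptive projected guidance, which damps the part of that push running parallel to the conditional prediction. This version drops APG's momentum and norm clamping, and the toy vectors are made up.

```python
import math


def cfg(cond: list[float], uncond: list[float], w: float) -> list[float]:
    """Classifier-free guidance: uncond + w * (cond - uncond)."""
    return [u + w * (c - u) for c, u in zip(cond, uncond)]


def apg(cond: list[float], uncond: list[float], w: float, eta: float = 0.0) -> list[float]:
    """Simplified adaptive projected guidance (no momentum, no norm clamp).

    Splits the guidance direction (cond - uncond) into parts parallel and
    orthogonal to cond, scales the parallel part by eta, then applies
    cond + (w - 1) * update. With eta = 1 this reduces to plain CFG.
    """
    diff = [c - u for c, u in zip(cond, uncond)]
    norm_sq = sum(c * c for c in cond)
    coef = sum(d * c for d, c in zip(diff, cond)) / norm_sq if norm_sq else 0.0
    parallel = [coef * c for c in cond]
    orthogonal = [d - p for d, p in zip(diff, parallel)]
    update = [o + eta * p for o, p in zip(orthogonal, parallel)]
    return [c + (w - 1) * g for c, g in zip(cond, update)]


def vnorm(v: list[float]) -> float:
    """Euclidean length, used as a toy measure of overshoot."""
    return math.sqrt(sum(x * x for x in v))


if __name__ == "__main__":
    cond, uncond = [1.0, 0.5], [0.6, 0.2]
    for w in (2, 4, 8):
        print(w, round(vnorm(cfg(cond, uncond, w)), 3), round(vnorm(apg(cond, uncond, w)), 3))
```

In this toy, the CFG result keeps growing as `w` rises while the APG result grows far more slowly. That's the intuition behind dropping guidance when output starts sounding harsh.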
Can LongCat AudioDiT clone any voice? It handles most voices that speak clearly. Heavy accents, whispered speech, singing, or voices with strong vocal effects can confuse it. Stick to natural conversational speech for the best results.
Does the transcript need to be perfect? Yes. Word-for-word accuracy matters. The model aligns sound to text, so a sloppy transcript means sloppy cloning. Spend the extra minute getting it right.
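Checking a transcript word for word takes a human ear (or a speech recognizer). A quick arithmetic check still catches the worst case: the wrong transcript pasted entirely. Conversational speech often runs around 2 to 3 words per second, so a 10-second clip with a 4-word transcript is a red flag. The `low` and `high` bounds below are loose assumptions, not model requirements, and this won't catch a single wrong word.

```python
def words_per_second(transcript: str, clip_seconds: float) -> float:
    """Whitespace-separated word count divided by clip length."""
    return len(transcript.split()) / clip_seconds


def transcript_plausible(transcript: str, clip_seconds: float,
                         low: float = 1.5, high: float = 4.0) -> bool:
    """Flag transcripts whose word count is implausible for the clip length."""
    if clip_seconds <= 0:
        raise ValueError("clip_seconds must be positive")
    return low <= words_per_second(transcript, clip_seconds) <= high


if __name__ == "__main__":
    print(transcript_plausible("Hi there friend", 10.0))
    print(transcript_plausible(" ".join(["word"] * 25), 10.0))
```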
How do I run LongCat AudioDiT online? You can run LongCat AudioDiT online through Floyo. No installation, no setup. Open the workflow in your browser, upload your inputs, and hit run. Free to try.