LTX 2.3 - Audio to Video

Generate video from an audio track and a reference image with LTX 2.3

animation

image to video

lipsync

ltx 2

video generation

Generates in about 2 mins 43 secs

nikhil07

Nodes & Models

Floyo Partner Nodes

LTX23AudioToVideo_floyo

Ver Private

Comm Use

VideoToFrames

ComfyUI Official

LoadAudio

LoadImage

WorkflowGraphics

ComfyUI-VideoHelperSuite

VHS_VideoCombine

ComfyUI-S3-IO

VHS_VideoCombine

LTX 2.3 Audio to Video takes an audio file and a reference image and generates a video where the visual content moves in response to the sound.

Upload an audio track and a still image. LTX 2.3 animates the image in sync with the audio, using the rhythm, pace, and energy of the sound to drive the motion. Add a short prompt to describe the subject or scene. Adjust guidance scale to control how closely the output follows your prompt versus the audio signal.

Audio in. Animated video out.

How do you use LTX 2.3 audio to video?

Upload an audio file and a reference image. Write a short prompt describing the subject. LTX 2.3 generates a video that animates the image in sync with the audio track. Use guidance scale to balance prompt influence against audio-driven motion.
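As a rough sketch, the four inputs can be bundled and sanity-checked before a run. Everything named here is hypothetical and for illustration only: the `AudioToVideoInputs` class, its field names, the placeholder file names, and the 20-word prompt cap are assumptions, not part of the workflow or any Floyo API. Only the default guidance scale of 5 comes from this page.

```python
from dataclasses import dataclass, asdict

# Arbitrary illustrative cap; the page only asks for a "short" prompt.
MAX_PROMPT_WORDS = 20


@dataclass
class AudioToVideoInputs:
    """Hypothetical container for the workflow inputs (not a Floyo API type)."""
    audio_path: str
    image_path: str
    prompt: str
    guidance_scale: float = 5.0  # default value stated on this page

    def problems(self) -> list[str]:
        """Return human-readable issues; an empty list means nothing was flagged."""
        issues = []
        if not self.audio_path.strip():
            issues.append("audio file is missing")
        if not self.image_path.strip():
            issues.append("reference image is missing")
        words = self.prompt.split()
        if not words:
            issues.append("prompt is empty")
        elif len(words) > MAX_PROMPT_WORDS:
            issues.append(f"prompt has {len(words)} words; keep it short and visual")
        if self.guidance_scale <= 0:
            # Assumption: a non-positive guidance scale is treated as invalid here.
            issues.append("guidance scale must be positive")
        return issues


inputs = AudioToVideoInputs(
    audio_path="voiceover.wav",   # placeholder file name
    image_path="portrait.png",    # placeholder file name
    prompt="Elderly man teaching mathematics",
)
print(inputs.problems())
print(asdict(inputs))
```

A check like this only catches missing or overlong inputs before upload; it says nothing about whether the audio itself gives the model usable motion cues.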

Audio: The sound that drives the animation. LTX 2.3 reads the rhythm, energy, and pacing of the track to generate motion that responds to it. Music with clear beats, speech with natural cadence, and ambient audio with distinct texture all give the model usable motion cues. The cleaner and more structured the audio, the more coherent the motion sync.

Reference image: The visual starting point. LTX 2.3 animates this image in response to the audio. A portrait, a person, a scene, or an object all work. The image sets the appearance and character of the video. The audio sets how it moves.

Prompt: Describe the subject and what they are doing. Keep it short and visual: "Elderly man teaching mathematics", "Singer performing on stage", "Dancer moving to music". The prompt anchors the model to a subject and activity. It does not need to describe the motion in detail. That comes from the audio.

Guidance scale: Controls how closely the output follows your prompt versus the audio. Higher values push the output toward the prompt description. Lower values give the audio more influence over what happens visually. The default is 5. Try 3 to 4 for more audio-reactive motion, or 6 to 7 if the subject needs to stay more recognizably on-prompt.
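The guidance scale suggestions can be written down as a small lookup, which may help when running the same audio and image at several settings. This is a minimal sketch: the preset names and both helper functions are hypothetical, and the 3.5 and 6.5 values are simply midpoints of the suggested ranges, not values prescribed by the workflow.

```python
# Starting points based on the ranges given on this page; preset names are hypothetical.
GUIDANCE_PRESETS = {
    "audio_reactive": 3.5,  # midpoint of the suggested 3 to 4 range
    "balanced": 5.0,        # the stated default
    "on_prompt": 6.5,       # midpoint of the suggested 6 to 7 range
}


def pick_guidance_scale(style: str = "balanced") -> float:
    """Return a starting guidance scale for the given (hypothetical) preset name."""
    try:
        return GUIDANCE_PRESETS[style]
    except KeyError:
        options = ", ".join(sorted(GUIDANCE_PRESETS))
        raise ValueError(f"unknown style {style!r}; choose one of: {options}") from None


def leaning(guidance_scale: float) -> str:
    """Say which signal a value favors relative to the default of 5."""
    if guidance_scale < 5:
        return "audio"
    if guidance_scale > 5:
        return "prompt"
    return "balanced"


for name in GUIDANCE_PRESETS:
    value = pick_guidance_scale(name)
    print(f"{name}: guidance_scale={value} (leans toward {leaning(value)})")
```

Treat these as starting points to compare side by side rather than fixed settings; the right value depends on the audio and the reference image.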

What is LTX 2.3 audio to video good for?

Animating portraits and scenes to music or speech, generating audio-reactive visual content, and creating talking head style video from a still image and a voiceover track.

The most direct use is animating a still image of a person against a speech or music track. Upload a portrait, add the audio, describe the subject, and get a short clip where the person appears to move in time with the sound.

Music visualizers, talking head animations, and scene animations tied to voiceover or ambient audio are all within range. The model handles natural motion cues from the audio better than highly synthetic or heavily processed sound.

For precise lip-sync where phoneme accuracy matters, a dedicated lip-sync workflow will give you more control. LTX 2.3 Audio to Video generates audio-reactive motion at a broader level: rhythm, energy, and expressiveness, rather than frame-accurate mouth matching.
