Sync Lipsync - Audio Driven Lip Sync for Video
Sync any audio track to a video face with Sync Lipsync.
Tags: animation, image to video, ltx 2, vfx, video generation
Nodes & Models
SyncLipsync_floyo
VideoToFrames
WorkflowGraphics
LoadAudio
LoadVideo
VHS_VideoCombine
VHS_VideoCombine
Sync Lipsync maps an audio track onto the mouth and face of a person in a video, generating frame-accurate lip sync that matches the speech in the audio.
Upload a video with a visible face and an audio track. Sync Lipsync processes both and outputs a new video where the subject's mouth movements match the audio. Choose the model tier, sync mode, emotion preset, and temperature to dial in the performance. Optional settings handle multi-speaker detection and occlusion.
Your video and audio in. A lip-synced clip out.
How do you use Sync Lipsync?
Upload a video with a visible face and an audio track. Pick the model (lipsync-2-pro for best quality), set the sync mode and emotion, and run. Sync Lipsync generates a new video with mouth movements that match the audio frame by frame.
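The three steps above can be sketched as a tiny script. Note that `run_lipsync` is a hypothetical placeholder for illustration only, not an actual Sync Lipsync function; it stands in for "submit video + audio, get a new clip back".

```python
# Hypothetical quick-start sketch of the steps above. run_lipsync() is an
# illustrative stand-in, not a real Sync Lipsync API.

def run_lipsync(video_path: str, audio_path: str, model: str) -> str:
    """Pretend to submit a lipsync job; returns the output clip's path."""
    # A real run would process the video and audio through the model and
    # write a new clip. Here we only derive the output filename.
    return video_path.rsplit(".", 1)[0] + "_synced.mp4"

# lipsync-2-pro is the tier recommended above for final-quality output.
output = run_lipsync("talk.mp4", "voiceover.wav", model="lipsync-2-pro")
# output -> "talk_synced.mp4"
```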
Video: The clip that contains the face to animate. A clear, front-facing shot with the subject's mouth visible gives the cleanest results. Heavy occlusion (hands, objects, or other faces covering the mouth) reduces accuracy. The video sets the head, lighting, and appearance; the audio drives the mouth.
Audio: The speech track to sync against. Clean, single-speaker audio transfers most accurately. Background noise or overlapping voices are harder for the model to process.
Model: Two options are available. lipsync-2-pro is the higher-quality tier, with sharper phoneme accuracy and better face preservation; use it for any final-output work. A standard tier is available if you need faster turnaround on drafts or tests.
Sync mode: Controls how the model handles the timing relationship between audio and video. bounce is the default and works well for most speech-driven content. Adjust it if your output shows timing drift or frame misalignment.
Emotion: An optional preset that shapes the expressiveness of the generated lip sync. none is the default and keeps the output neutral. Options like happy, serious, or surprise push the facial performance toward a specific register. Use these when the source video has a flat expression and you want the output to carry more energy.
Temperature: Controls variation in the generated output. The default is 0.5. Higher values introduce more expressiveness and slight variation between runs; lower values produce more consistent, restrained output. Start at 0.5 and adjust based on whether your output reads as too stiff or too exaggerated.
Active speaker auto detect: Toggle this on when your video contains multiple people. The model identifies who is speaking in the audio and applies lip sync to that person only.
Occlusion detection: Toggle this on if objects, hands, or other elements partially cover the subject's mouth. It helps the model handle frames where the mouth is not fully visible.
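The settings above can be collected into one place. The sketch below assumes snake_case key names (`sync_mode`, `active_speaker_detect`, and so on); these are illustrative assumptions, not a documented node or API schema.

```python
# Hypothetical sketch of the Sync Lipsync settings described above.
# All key and parameter names are assumptions for illustration.

VALID_EMOTIONS = {"none", "happy", "serious", "surprise"}

def build_lipsync_settings(
    model: str = "lipsync-2-pro",         # higher-quality tier; use for final output
    sync_mode: str = "bounce",            # default timing mode
    emotion: str = "none",                # optional expressiveness preset
    temperature: float = 0.5,             # 0.5 default; higher = more variation
    active_speaker_detect: bool = False,  # enable for multi-person videos
    occlusion_detection: bool = False,    # enable if the mouth is partly covered
) -> dict:
    """Collect the Sync Lipsync options into a single settings dict."""
    if emotion not in VALID_EMOTIONS:
        raise ValueError(f"unknown emotion preset: {emotion!r}")
    return {
        "model": model,
        "sync_mode": sync_mode,
        "emotion": emotion,
        "temperature": temperature,
        "active_speaker_detect": active_speaker_detect,
        "occlusion_detection": occlusion_detection,
    }

# Defaults match the recommendations above: pro model, bounce mode, neutral emotion.
settings = build_lipsync_settings()
```

Keeping the defaults and overriding one knob at a time (for example, only `temperature`) makes it easy to tell which setting caused a change between runs.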
What is Sync Lipsync good for?
Dubbing, voiceover replacement, multilingual content, talking head video from an existing clip, and any workflow where the audio and video mouth movements need to match.
The most common use is dubbing: take a video in one language, feed in a translated voiceover, and get a version where the subject appears to speak the new audio. No manual animation or keyframing needed.
Talking head content where you want to swap or replace the audio is another strong use. Upload the original video, provide the new audio track, and the model generates replacement lip sync while keeping the rest of the face and background unchanged.
Results are strongest on front-facing, well-lit footage with a single speaker and clean audio. Profile shots, heavy motion blur, or fast head turns reduce accuracy. For multi-person scenes, use active speaker detection to keep the sync on the right subject.
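For the multi-person dubbing case just described, the two optional toggles would both be enabled. A sketch, again with assumed key names rather than a documented schema:

```python
# Hypothetical settings for dubbing a two-person interview clip.
# Key names are illustrative assumptions, not a documented schema.
multi_speaker_settings = {
    "model": "lipsync-2-pro",        # final-output quality
    "sync_mode": "bounce",           # default timing behavior
    "emotion": "none",               # keep the performance neutral
    "temperature": 0.5,              # default variation
    "active_speaker_detect": True,   # sync only the person who is speaking
    "occlusion_detection": True,     # tolerate objects crossing the mouth
}
```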