LatentSync - Lip Sync Video from Audio

Make anyone in a video match your audio with LatentSync

nikhil07

Nodes & Models

Floyo Partner Nodes

LatentSync_floyo

VideoToFrames

ComfyUI Official

LoadVideo

WorkflowGraphics

LoadAudio

ComfyUI-VideoHelperSuite

VHS_VideoCombine

ComfyUI-S3-IO

VHS_VideoCombine

LatentSync syncs the mouth movements in a video to a new audio track.

Upload a video with a clearly visible face and an audio file containing speech or singing. LatentSync rewrites the mouth in the video to match what is being said in the audio. Choose how the video loops if it is shorter than the audio, adjust the guidance scale if needed, and run.

Two inputs. One lip-synced video out.

How do you use LatentSync?

Upload your video and audio. Pick a loop mode for when the video is shorter than the audio. Hit run. LatentSync rewrites the mouth movements to match your audio and outputs the finished clip.

Here is the full setup, step by step:

Step 1: Upload your video. Pick a video where a face is clearly visible and facing the camera. The cleaner the face shot, the better the lip sync. Side angles, heavy shadows, or hands covering the mouth will reduce accuracy.

Step 2: Upload your audio. Add the audio you want the person to appear to say. This can be a speech recording, a voiceover, or a song. You can also set a start time and a duration to use only a specific section of a longer audio file.
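
To picture what the start time and duration settings do, here is a minimal Python sketch that cuts a section out of a list of mono audio samples. The function name `trim_audio`, its parameters, and the plain-list sample format are illustrative assumptions, not the workflow's actual code; the real node may round or clamp differently.

```python
def trim_audio(samples, sample_rate, start_time=0.0, duration=None):
    """Return the samples covering [start_time, start_time + duration) seconds.

    Hypothetical helper: `samples` is a flat list of mono samples. If
    `duration` is None, everything from `start_time` to the end is kept.
    """
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    if start_time < 0 or (duration is not None and duration < 0):
        raise ValueError("start_time and duration must be non-negative")
    # Convert seconds to sample positions.
    start = int(round(start_time * sample_rate))
    if duration is None:
        end = len(samples)
    else:
        end = start + int(round(duration * sample_rate))
    # Clamp to the available audio so an overlong duration stops at the end.
    start = min(start, len(samples))
    end = min(end, len(samples))
    return samples[start:end]
```

For example, with 10 seconds of audio at 16,000 samples per second, `trim_audio(samples, 16000, start_time=2.0, duration=3.0)` keeps 48,000 samples starting at sample 32,000. A duration that runs past the end of the file simply stops at the last sample.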

Step 3: Choose a loop mode. If your video is shorter than your audio, LatentSync needs to know what to do when it runs out of footage. Two options:

pingpong plays the video forward then backward, bouncing back and forth for the duration of the audio. Good for natural-looking loops.

loop repeats the video from the beginning each time it ends.

Pick pingpong for most cases. It looks more natural than loop because there is no hard cut from the last frame back to the first.
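
The two loop modes can be sketched as sequences of frame indices. This Python example is a hypothetical illustration of the behavior described above, not the LatentSync_floyo node's implementation: `frames_needed`, `loop_indices`, and `pingpong_indices` are made-up names, and the assumption that pingpong does not repeat the first and last frames at each turnaround may not match the actual node.

```python
import math


def frames_needed(audio_seconds, fps):
    """Number of video frames needed to cover the audio at the given frame rate."""
    # Round first so float noise (e.g. 0.1 * 30 == 3.0000000000000004)
    # does not add a spurious extra frame.
    return math.ceil(round(audio_seconds * fps, 6))


def loop_indices(num_frames, target_len):
    """'loop' mode: restart from frame 0 each time the clip ends."""
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    return [i % num_frames for i in range(target_len)]


def pingpong_indices(num_frames, target_len):
    """'pingpong' mode: play forward, then backward, and repeat.

    Assumes the end frames are not doubled at a turnaround, so a
    4-frame clip plays 0 1 2 3 2 1 0 1 2 3 ...
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    if num_frames == 1:
        return [0] * target_len
    period = 2 * (num_frames - 1)  # length of one forward-and-back cycle
    indices = []
    for i in range(target_len):
        pos = i % period
        # First part of the cycle counts up, the rest counts back down.
        indices.append(pos if pos < num_frames else period - pos)
    return indices
```

For a 4-frame clip stretched to 10 output frames, `loop_indices(4, 10)` gives `[0, 1, 2, 3, 0, 1, 2, 3, 0, 1]`, with a jump from frame 3 back to frame 0, while `pingpong_indices(4, 10)` gives `[0, 1, 2, 3, 2, 1, 0, 1, 2, 3]`, where every step moves to a neighboring frame. That absence of a jump is why pingpong usually looks smoother. At an assumed 25 fps, 10 seconds of audio would need `frames_needed(10.0, 25)`, or 250 frames.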

Step 4: Set the guidance scale. This controls how closely the model follows the audio when generating the mouth movements. The default is 1. If the lip sync looks loose, raise it slightly (try 1.5 to 2). Lower values give softer, subtler movement.
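
In diffusion models, a guidance scale usually refers to classifier-free guidance. Assuming LatentSync's guidance scale works this way, with the audio as the condition, each denoising step blends two predictions: one made without the audio and one made with it. The sketch below shows that blend on plain Python lists; `guided_prediction` is a hypothetical name, and the real model operates on latent tensors rather than short lists.

```python
def guided_prediction(pred_uncond, pred_cond, guidance_scale):
    """Classifier-free guidance blend (assumed behavior, for illustration).

    guided = uncond + guidance_scale * (cond - uncond)
    A scale of 1 returns the audio-conditioned prediction unchanged;
    larger scales push further in the direction the audio suggests.
    """
    if len(pred_uncond) != len(pred_cond):
        raise ValueError("predictions must have the same length")
    return [u + guidance_scale * (c - u) for u, c in zip(pred_uncond, pred_cond)]
```

With `guidance_scale=1`, the result equals the audio-conditioned prediction alone, which fits a default of 1. A scale of 2 doubles the step from the no-audio prediction toward the audio-conditioned one: `guided_prediction([0.0, 1.0], [1.0, 1.0], 2.0)` returns `[2.0, 1.0]`. That stronger push is consistent with the advice to raise the value when the sync looks loose.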

Step 5: Run. Hit run, and LatentSync processes the video and audio together and outputs a new clip with the mouth movements rewritten to match your audio.
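
Processing the video and audio together means each output frame has to be paired with the stretch of audio that plays while it is on screen. Here is a minimal Python sketch of that pairing, assuming integer frame and sample rates. `audio_window_for_frame` is a hypothetical helper, and LatentSync itself presumably conditions on audio features computed around each frame rather than on raw sample slices like these.

```python
def audio_window_for_frame(frame_index, fps, sample_rate):
    """Return the (start, end) range of audio samples that plays during one frame.

    Integer arithmetic keeps consecutive windows contiguous, with no gaps
    or overlaps, even when sample_rate is not a multiple of fps.
    """
    if frame_index < 0 or fps <= 0 or sample_rate <= 0:
        raise ValueError("frame_index must be >= 0; fps and sample_rate must be > 0")
    start = frame_index * sample_rate // fps
    end = (frame_index + 1) * sample_rate // fps
    return start, end
```

At an assumed 25 fps and 16,000 Hz, each frame covers 640 samples, so frame 10 maps to samples 6,400 through 7,039 (the returned range `(6400, 7040)` excludes its end).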

What is LatentSync good for?

LatentSync is good for dubbing footage, replacing dialogue in a clip, making a person appear to sing or speak new words, and creating lip-synced content from existing video without reshooting.

It works best on videos where the face is visible and facing forward. The most common use is replacing what someone is saying: upload the original footage, swap in a new audio track, and the output looks like the person is saying the new words.

Singing and vocal performance syncing also works well. Upload a music track and a video of a performer, and LatentSync rewrites the mouth to match the song.

Keep the face still and well-lit in the source video. Fast head movements, cuts between angles, or dark footage will reduce the accuracy of the sync.