floyo logo
Powered by
ThinkDiffusion
⚡️Nano Banana 2 ⚡️ just landed. Start creating now.

Wan2.1 FusionX and MultiTalk - Image to Video

Turn any portrait - artwork, photos, or digital characters - into speaking, expressive videos that sync perfectly with audio input. MultiTalk handles lip movements, facial expressions, and body motion automatically.

Generates in about 1 min 19 secs

Nodes & Models

WanVideoBlockSwap
WanVideoTorchCompileSettings
LoadWanVideoT5TextEncoder
umt5-xxl-enc-bf16.safetensors
WanVideoVAELoader
Wan2_1_VAE_bf16.safetensors
WanVideoLoraSelect
detailz-wan.safetensors
Wan21_T2V_14B_lightx2v_cfg_step_distill_lora_rank32.safetensors
WanVideoModelLoader
Wan2.1_14B_FusionX.safetensors
WanVideoTeaCache
WanVideoEnhanceAVideo
DownloadAndLoadWav2VecModel
MultiTalkModelLoader
Wan2_1-InfiniTetalk-Single_fp16.safetensors
WanVideoTextEncodeSingle
WanVideoApplyNAG
WanVideoClipVisionEncode
MultiTalkWav2VecEmbeds
WanVideoImageToVideoMultiTalk
WanVideoSampler
WanVideoDecode
Label (rgthree)
CLIPVisionLoader
clip_vision_h.safetensors
LoadAudio
LoadImage
AudioCrop
AudioSeparation
ImageResizeKJv2
VHS_VideoCombine

MultiTalk is an open-source AI framework that converts static images into realistic talking videos using audio input. Built by MeiGen AI, it accurately syncs lip movements and facial expressions to speech or singing, supporting both single and multi-person scenes.

MultiTalk also accepts text prompts to control emotion and behavior, and it works with both real and stylized characters. Integrated into ComfyUI and optimized for fast performance, it suits digital artists, content creators, educators, and developers who want to bring portraits, avatars, or original characters to life in about a minute.

Key Inputs

Load Image: Upload an image of a single person or multiple people

Load Audio: Upload an audio clip of either speech or singing

Prompt: Describe the motion and speech
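
For anyone driving this workflow from a script rather than the UI, the three key inputs can be filled in programmatically. The following is a minimal sketch, assuming the workflow has been exported in ComfyUI's API format (a dict of node IDs, each with a `class_type` and `inputs`). The helper `set_key_inputs` is hypothetical, and the field names `image`, `audio`, and `prompt` are assumptions about these nodes' inputs; check them against your own export.

```python
import copy

# Assumed mapping from node class_type to the input field that holds each
# key input. These field names are assumptions; verify against your export.
KEY_INPUT_FIELDS = {
    "LoadImage": "image",
    "LoadAudio": "audio",
    "WanVideoTextEncodeSingle": "prompt",
}


def set_key_inputs(workflow, image, audio, prompt):
    """Return a copy of an API-format workflow with the key inputs filled in.

    Hypothetical helper, not part of ComfyUI or Floyo.
    """
    values = {
        "LoadImage": image,
        "LoadAudio": audio,
        "WanVideoTextEncodeSingle": prompt,
    }
    updated = copy.deepcopy(workflow)  # leave the caller's dict untouched
    found = set()
    for node in updated.values():
        class_type = node.get("class_type")
        if class_type in values:
            field = KEY_INPUT_FIELDS[class_type]
            node.setdefault("inputs", {})[field] = values[class_type]
            found.add(class_type)
    missing = set(values) - found
    if missing:
        raise KeyError(f"workflow has no node of type: {sorted(missing)}")
    return updated


# Toy three-node workflow standing in for the real export.
example_workflow = {
    "1": {"class_type": "LoadImage", "inputs": {"image": ""}},
    "2": {"class_type": "LoadAudio", "inputs": {"audio": ""}},
    "3": {"class_type": "WanVideoTextEncodeSingle", "inputs": {"prompt": ""}},
}

filled = set_key_inputs(
    example_workflow,
    image="portrait.png",
    audio="speech.wav",
    prompt="A woman speaks calmly to the camera",
)
```

The helper copies the workflow before editing it, so one exported template can be reused for many image, audio, and prompt combinations.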
