Besides resolution and duration, these are the key inputs for controlling the output:
Reference Video: Upload a 5-second video of your subject. This is what the model learns from - capture multiple angles, expressions, and movement for best results.
Prompt: Describe the new scene you want your cloned subject placed into. Be specific about action, setting, and mood.
Duration: Choose between 5s or 10s output length.
Audio Sync: Native audio is generated automatically with the video - lip-sync matches if your subject speaks in the output.
Multi-Subject: Toggle on to clone multiple characters from separate reference videos into the same scene.
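To make those inputs concrete, here's a rough sketch of what a request could look like. Everything below is an assumption for illustration - the endpoint URL, field names, and the FLOYO_API_KEY variable are made up rather than pulled from Floyo's docs, so check the actual API reference before wiring anything up.

```python
import os
import requests

# Hypothetical endpoint and auth - not Floyo's published schema.
API_URL = "https://api.example.com/v1/wan-r2v/generations"
API_KEY = os.environ["FLOYO_API_KEY"]  # assumed environment variable name


def submit_r2v_job(reference_path: str, prompt: str, duration: int = 5,
                   multi_subject: bool = False) -> str:
    """Upload a 5-second reference clip and request a new generation (illustrative field names)."""
    with open(reference_path, "rb") as clip:
        resp = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"reference_video": clip},        # the clip the model learns from
            data={
                "prompt": prompt,                   # the new scene: action, setting, mood
                "duration": duration,               # 5 or 10 seconds
                "multi_subject": multi_subject,     # extra reference clips would be attached as files too
            },
            timeout=120,
        )
    resp.raise_for_status()
    return resp.json()["job_id"]                    # assumed async job handle


if __name__ == "__main__":
    job_id = submit_r2v_job(
        "mascot_turnaround.mp4",
        "The mascot walks down a rainy neon street at night, waving at the camera",
        duration=10,
    )
    print("submitted:", job_id)
```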
Pairs well with
Wan 2.6 I2V - Generated an image you want to animate? Hand it off to I2V as your starting frame.
Wan 2.6 T2V - When you need video output but want to establish the visual style first, generate reference images, then use them to guide the T2V aesthetic.
Clone any person, animal, character, or object from a 5-second reference video - then use that subject in new video generations with consistent appearance, voice, and motion dynamics. Think of it as video-to-video character transfer with audio sync baked in.
The key difference from image-based reference tools is that video gives the model way more to work with. A few photos can only show so much. Five seconds of video captures how someone actually moves, their expressions shifting, maybe a full turn that shows every angle. That 360° information makes the cloning significantly more accurate.
Specifications
Input - 5-second reference video
Output Resolution - 1080p @ 24fps
Max Duration - 5s / 10s clips
Capabilities - 360° character cloning, voice replication, expression/motion learning
Audio - Native sync (music, SFX, human speech)
Multi-subject - Yes, supports multiple cloned characters in one generation
Access - API only (Run in Browser on Floyo) - no open weights yet
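Since access is API-only, generation presumably runs as an async job: submit, poll, download. Here's a minimal sketch under that assumption - the status endpoint, response fields, and video_url key are all hypothetical:

```python
import time
import requests

STATUS_URL = "https://api.example.com/v1/wan-r2v/generations/{job_id}"  # hypothetical endpoint


def wait_for_video(job_id: str, api_key: str, out_path: str = "output.mp4") -> str:
    """Poll an assumed job endpoint until the clip is ready, then save the 1080p/24fps file."""
    headers = {"Authorization": f"Bearer {api_key}"}
    while True:
        job = requests.get(STATUS_URL.format(job_id=job_id), headers=headers, timeout=30).json()
        if job["status"] == "succeeded":
            video = requests.get(job["video_url"], timeout=120)  # assumed download URL in the response
            with open(out_path, "wb") as f:
                f.write(video.content)
            return out_path
        if job["status"] == "failed":
            raise RuntimeError(job.get("error", "generation failed"))
        time.sleep(10)  # 5-10s clips take a while to render
```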
Use cases
Character consistency across multiple shots/scenes
Cloning a specific person or mascot for branded content
Dialogue scenes where you need lip-sync without post-production
Storyboarding with a consistent "actor" across your project
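For the storyboarding case, the whole trick is reusing one reference clip across every shot and varying only the prompt. A quick usage sketch, leaning on the hypothetical submit_r2v_job helper from earlier:

```python
# Reuse a single reference clip across a shot list so the "actor" stays consistent.
# Assumes submit_r2v_job from the request sketch above (an illustrative helper, not a real SDK call).
SHOTS = [
    "Shot 1: the character unlocks a door and steps into a dim hallway",
    "Shot 2: close-up, the character reads a note and frowns",
    "Shot 3: the character runs down a staircase, camera following behind",
]

job_ids = [submit_r2v_job("actor_reference.mp4", prompt, duration=5) for prompt in SHOTS]
print(job_ids)
```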
What's working: Multi-shot text adherence is solid. The R2V character consistency is genuinely the standout - better than multi-image reference approaches because video captures full 360° information plus motion/expression data.
R2V as a concept is genuinely useful and the video-reference approach makes sense technically. If character consistency is your main problem, this addresses it better than image-based alternatives. The audio sync is a nice bonus that saves post-production hassle.
If they drop weights, the calculus changes - community fine-tunes and local deployment would open up a lot. Until then, "watch this space".