floyo logo
Powered by
ThinkDiffusion

Kling O3 Video to Video — Standard Reference


Nodes & Models

KlingO3StandardVideoToVideoReference_floyo
VideoToFrames
WorkflowGraphics
LoadVideo
VHS_VideoCombine
VHS_VideoCombine

Upload a source video and a reference image, describe the scene, and Kling O3 generates a new clip where the subject matches your reference. The source video provides scene and motion context. The reference image controls who or what appears in the output.

This is the reference mode. Use it when the subject's appearance needs to match something specific.

How do you use Kling O3 video to video with a reference image?

Upload a source clip and a reference image, write a prompt describing the action and scene, and Kling O3 generates a video where the subject matches your reference. Duration, aspect ratio, and shot type are all configurable.

Prompt: Describe the action and setting. The example in the workflow: "man walking in the streets of NYC." The reference image handles the subject's appearance, so your prompt doesn't need to describe them. Focus on what they're doing and where.

Reference image (image_1 / element_1_frontal_image): The core input. Upload a clear image of the character or subject you want in the video. The frontal image slot is for face or full-body reference. The closer the reference matches what you're describing, the more consistent the output.

Duration: 5 seconds by default. Shorter clips give the model less to manage and tend to hold subject consistency better. Increase if the action needs more time to play out.

Shot type: "Customize" by default, letting your prompt steer the framing. Switch to a specific option to lock in a camera angle regardless of the prompt.

Aspect ratio: "Auto" by default, which matches the source video. Override it if you need a specific output format (16:9, 9:16, 1:1).

Keep audio: On by default. The original audio track carries into the output. Turn it off if the audio doesn't match the new scene.
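The inputs and defaults described above can be summarized as a small settings object. This is only an illustrative sketch, not the workflow's actual interface: the class name `ReferenceV2VSettings`, the `source_video` field, the file paths, and the set of allowed aspect ratios are assumptions made for the example. Only the defaults (5 seconds, "Customize", "Auto", audio on) and the `element_1_frontal_image` slot name come from the text.

```python
from dataclasses import dataclass, asdict

# Aspect ratios named in the text; treating them as the full allowed set is an assumption.
ALLOWED_ASPECT_RATIOS = {"auto", "16:9", "9:16", "1:1"}


@dataclass
class ReferenceV2VSettings:
    """Hypothetical container for the workflow inputs described above."""
    prompt: str                    # action and setting, not the subject's appearance
    source_video: str              # clip that supplies scene and motion context
    element_1_frontal_image: str   # reference image that controls the subject
    duration: int = 5              # seconds; documented default
    shot_type: str = "customize"   # documented default; the prompt steers framing
    aspect_ratio: str = "auto"     # documented default; matches the source video
    keep_audio: bool = True        # documented default; source audio carries over

    def validate(self) -> None:
        """Raise ValueError if a setting contradicts the description above."""
        if not self.prompt.strip():
            raise ValueError("prompt must describe the action and setting")
        if self.duration <= 0:
            raise ValueError("duration must be a positive number of seconds")
        if self.aspect_ratio not in ALLOWED_ASPECT_RATIOS:
            raise ValueError(f"unsupported aspect ratio: {self.aspect_ratio}")


# Placeholder file names; only the prompt is taken from the workflow example.
settings = ReferenceV2VSettings(
    prompt="man walking in the streets of NYC",
    source_video="source.mp4",
    element_1_frontal_image="reference.png",
)
settings.validate()
print(asdict(settings))
```

Leaving every optional field at its default mirrors the workflow as shipped. Overriding `aspect_ratio` or `keep_audio` corresponds to the two cases the text calls out: a required output format, or audio that no longer fits the new scene.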

What is Kling O3 reference-controlled video editing good for?

Reference mode is for when you need a specific person or character to appear in generated video. Edit mode gives you action control; reference mode adds appearance control on top of that, so the subject in the output matches whoever you upload.

Good scenarios: you have a character reference (portrait, product shot, concept art) and need them in a scene with natural motion. You're producing character-driven content across multiple clips and need face consistency. You want to match a real person's likeness to AI-generated footage.

Not the right tool for pure style edits or action changes where subject appearance doesn't matter. For those, the standard edit workflow is faster and has one fewer input to manage.

FAQ

What's the difference between Kling O3 edit mode and reference mode?

Edit mode rewrites the action in your clip from a text prompt alone. Reference mode does the same but also uses an image you provide to anchor what the subject looks like in the output. Same base model, one extra input, much more control over the subject.

What makes a good reference image for Kling O3?

A well-lit, front-facing image with the subject clearly visible against a clean background. Avoid partial crops or heavily stylized images. The cleaner the reference, the more reliably the model carries the appearance into the video.

How long can Kling O3 reference videos be?

Default is 5 seconds. Shorter clips hold subject consistency better. Increase duration if the action genuinely needs it, but expect more variance over longer clips.

Does Kling O3 reference mode preserve the original audio?

Yes. Keep audio is on by default and the source audio passes into the output MP4. Turn it off if you're scoring the video separately.
