
VOID Video Inpainting + SAM3 Text Masking

Remove objects from video with VOID's two-pass model. Type what to erase, SAM3 builds the mask, then VOID fills the holes coherently across every frame.

Generates in about 2 mins 54 secs

Nodes & Models

MarkdownNote
INTConstant
VHS_LoadVideo
VHS_VideoCombine

Video object removal that erases things across time, not frame by frame.

Upload a clip, type what you want gone ("person in blue jacket"), and SAM3 builds the mask automatically. Then VOID's two-pass video model fills the holes with coherent motion, lighting, and even the shadows the object was casting.

No mask drawing. Defaults already work. Pick your video, name the thing, write what should be there instead.

How do you remove objects from a video with VOID?

Load your video, write a SAM3 prompt naming what to erase ("person in blue jacket"), then write a positive prompt describing what the empty space should look like ("empty sidewalk, daylight"). VOID runs two passes: Pass 1 fills the hole, Pass 2 stabilizes it across frames. No manual masking required.
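The inputs can be summarized as a small settings record. This is an illustrative Python sketch only: the class `VoidRemovalSettings`, its field names, and the `problems` check are hypothetical and are not part of the workflow or of any Floyo or ComfyUI API. The default values are the ones listed on this page.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VoidRemovalSettings:
    """Hypothetical container for the workflow inputs; not a real API."""
    sam3_prompt: str            # what to erase, e.g. "person in blue jacket"
    positive_prompt: str        # what the scene looks like afterwards
    negative_prompt: str = ""   # blank unless defects repeat
    skip_pass_2: bool = False   # off by default, so both passes run
    width: int = 672
    height: int = 384
    steps: int = 30
    cfg: float = 6.0

    def problems(self) -> List[str]:
        """Return warnings based on the prompting guidance on this page."""
        issues = []
        if not self.sam3_prompt.strip():
            issues.append("SAM3 prompt is empty: name the object to erase.")
        if self.positive_prompt.strip().lower().startswith("remove"):
            issues.append("Positive prompt should describe the result, not 'remove X'.")
        return issues


settings = VoidRemovalSettings(
    sam3_prompt="person in blue jacket",
    positive_prompt="empty sidewalk, daylight",
)
print(settings.problems())  # no warnings for this pair of prompts
```

Keeping the two prompts as separate fields mirrors the workflow: one names the object, the other describes the scene after it is gone.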

SAM3 object prompt: This is the "what to remove" field. Use a short referring phrase like "red cup on table" or "person in blue jacket". Concrete and specific wins. Vague prompts produce vague masks, and the rest of the pipeline cannot recover from a bad mask.

Positive prompt (inpaint fill): Describe the result, not "remove X". Write what the scene looks like after the object is gone. "Empty kitchen counter, daylight, tiles visible" beats "remove the cup". The model is filling a hole, so tell it what should be there.

Negative prompt: Leave it blank unless you see repeating defects. If outputs come back with watermarks, blur, or extra limbs, add those terms here. Most clips need nothing in this field.

Skip Pass 2 toggle: Off by default, so both passes run. Pass 1 fills the masked region. Pass 2 cleans up temporal jitter so the fill stops shimmering between frames. Turn the toggle on to skip Pass 2 for faster previews on short, simple clips. Leave it off for longer cuts or textured backgrounds where flickering shows.

Resolution (672 x 384 default): Tuned for the VOID model. Push higher if your source has fine detail, but generation time climbs fast on video. Keep the aspect ratio close to that of your input or the frame will be cropped.
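To keep the aspect ratio close to the input, you can derive a width and height from the source dimensions. The helper below is a minimal sketch: `fit_resolution` is a hypothetical name, holding the pixel count near the 672 x 384 default is just one reasonable choice, and snapping to multiples of 16 is an assumption, not a documented VOID requirement.

```python
def fit_resolution(src_w, src_h, target_pixels=672 * 384, multiple=16):
    """Pick a width and height near target_pixels that keep the source aspect ratio.

    Snapping to `multiple` is an assumption made for this sketch.
    """
    aspect = src_w / src_h
    height = (target_pixels / aspect) ** 0.5
    width = height * aspect

    def snap(value):
        # Round to the nearest multiple, never below one multiple.
        return max(multiple, round(value / multiple) * multiple)

    return snap(width), snap(height)


print(fit_resolution(1920, 1080))  # (672, 384)
print(fit_resolution(1080, 1920))  # (384, 672)
```

A 16:9 source lands on the default, and a vertical clip gets the same pixel budget with the sides swapped, so neither needs cropping.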

Steps and CFG (30 steps, CFG 6): Good defaults for both passes. Drop steps to 20 for faster iteration. Raise CFG to 7 or 8 if the fill drifts away from your prompt. Change one variable at a time so you can tell which change made the difference.
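Changing one variable at a time can be planned as a list of runs that each differ from the defaults in a single setting. This is a sketch of that idea only: the dictionary keys and the `one_at_a_time` helper are hypothetical and do not submit anything to the workflow.

```python
BASELINE = {"steps": 30, "cfg": 6.0}  # defaults from this page


def one_at_a_time(baseline, alternatives):
    """Yield configs that differ from the baseline in exactly one setting."""
    for name, values in alternatives.items():
        for value in values:
            if value != baseline[name]:
                yield {**baseline, name: value}


runs = list(one_at_a_time(BASELINE, {"steps": [20], "cfg": [7.0, 8.0]}))
for run in runs:
    print(run)
# {'steps': 20, 'cfg': 6.0}
# {'steps': 30, 'cfg': 7.0}
# {'steps': 30, 'cfg': 8.0}
```

Three runs plus the baseline are enough to see whether steps or CFG is responsible for a change in the fill.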

What is VOID video inpainting good for?

VOID handles object removal where you need coherent motion, lighting, and causal cues, not frame-by-frame patching. Use it to delete people, vehicles, props, or watermarks from clips and have the background behave like they were never there. It is a single-purpose tool: removal, not general editing.

Useful for VFX cleanup (rigging, crew in the shot, microphones, signs), film production continuity fixes, hiding accidentally visible brands, and any clip where a person or object needs to be gone without flickering edges or shifting backgrounds.

The model goes further than a naive erase. Regions the object affected, including the shadows it cast and whatever it was blocking, fill in as if the object had never been there. Lighting and seams stay believable.

When not to use it: chaotic motion, ambiguous masks where the target blends into the background, or objects that take up most of the frame. Prompting cannot fix a bad SAM3 mask. If the segmentation is wrong, fix that first.
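One of these cases, an object that takes up most of the frame, can be checked roughly before generating if you can export a mask frame. The sketch below assumes the mask is available as rows of booleans, and it treats "most of the frame" as more than half the pixels. Both the helper names and the 0.5 threshold are assumptions, not values from VOID or SAM3.

```python
def mask_coverage(mask):
    """Fraction of pixels marked True in a 2D mask (a list of rows)."""
    total = sum(len(row) for row in mask)
    covered = sum(1 for row in mask for pixel in row if pixel)
    return covered / total if total else 0.0


def covers_most_of_frame(mask, threshold=0.5):
    """True when the masked area exceeds the (assumed) threshold."""
    return mask_coverage(mask) > threshold


small_object = [
    [False, False, False, False],
    [False, True, True, False],
    [False, False, False, False],
]
print(mask_coverage(small_object))
print(covers_most_of_frame(small_object))
```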

FAQ

What is the difference between Pass 1 and Pass 2 in VOID?

Pass 1 fills the masked region and is the main generation step. Pass 2 refines temporal stability so the fill stops flickering between frames. On short, simple clips you can skip Pass 2 to save time. On longer cuts or textured backgrounds, Pass 2 is the difference between watchable and unusable output.

Do I need to draw the mask myself for VOID video inpainting?

No. SAM3 builds the mask from your text prompt. Type what you want gone, like "person in blue jacket" or "red cup on table", and SAM3 segments it across every frame. You only need to draw a mask manually if SAM3 cannot find the object or you need a narrow region.

What resolution does VOID work at?

This workflow defaults to 672 by 384, which the VOID model is tuned for. You can push higher if your source has fine detail, but video generation time climbs fast. Keep the aspect ratio close to that of your input or the frame will be cropped at the edges.

Why does my VOID output have flickering or jitter?

Usually a Pass 2 issue. Make sure "Skip Pass 2" is off. If it still flickers, the SAM3 mask is probably unstable between frames, so try a more specific object prompt. Chaotic motion and targets that nearly leave the frame are the hardest cases.
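If you can export the per-frame masks, one rough way to spot an unstable mask is to compare each frame's mask with the previous one using intersection over union (IoU). This is a diagnostic sketch, not part of the workflow: the function names and the 0.8 threshold are assumptions, and a fast-moving object will lower IoU even when the mask is correct.

```python
def iou(mask_a, mask_b):
    """Intersection over union of two same-sized 2D masks (lists of rows)."""
    inter = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += bool(a and b)
            union += bool(a or b)
    # Two empty masks are treated as identical.
    return inter / union if union else 1.0


def unstable_frames(masks, min_iou=0.8):
    """Indices i where mask i overlaps poorly with mask i - 1."""
    return [
        i for i in range(1, len(masks))
        if iou(masks[i - 1], masks[i]) < min_iou
    ]


left = [[1, 1, 0, 0], [1, 1, 0, 0]]
right = [[0, 0, 0, 1], [0, 0, 0, 1]]
print(unstable_frames([left, left, right, right]))  # [2]
```

Frames flagged this way are the ones to inspect first. If the mask jumps to a different object there, a more specific SAM3 prompt is the likely fix.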
