floyo logo
Powered by
ThinkDiffusion

SAM3 for Video Masking using Text

Create video masks using SAM 3 and text prompts only.


SAM 3 lets you create video masks just by describing what you want to segment (for example “red car”, “main person”, “blue backpack”), then tracks all matching instances across frames.

Overview

SAM 3 is a promptable concept segmentation model: you give it short noun‑phrase text prompts and it detects, segments, and tracks every instance of that concept in a video. The video predictor streams through frames with a memory mechanism, so the same objects keep consistent IDs over time, even with occlusion or re‑appearance.
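SAM 3's memory mechanism is learned end to end, but the basic effect it produces, the same object keeping the same ID from frame to frame, can be sketched with a toy greedy IoU matcher. Everything below (the `iou` and `assign_ids` helpers, the tiny 4×4 masks, the 0.3 threshold) is an illustrative stand-in, not SAM 3 code:

```python
import numpy as np

def iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection-over-union of two boolean masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / float(union) if union else 0.0

def assign_ids(prev: dict, cur: list, thresh: float = 0.3) -> dict:
    """Greedily match current-frame masks to previous-frame IDs by IoU;
    masks with no good match get a fresh ID. Illustrative only."""
    out, used = {}, set()
    next_id = max(prev, default=-1) + 1
    for m in cur:
        best_id, best = None, thresh
        for oid, om in prev.items():
            score = iou(m, om)
            if oid not in used and score >= best:
                best_id, best = oid, score
        if best_id is None:
            best_id = next_id
            next_id += 1
        used.add(best_id)
        out[best_id] = m
    return out

# Two frames: the object shifts one pixel but keeps ID 0.
f0 = np.zeros((4, 4), bool); f0[0:2, 0:2] = True
f1 = np.zeros((4, 4), bool); f1[0:2, 1:3] = True
tracks = assign_ids({0: f0}, [f1])
print(sorted(tracks))  # [0]
```

A real tracker also has to handle occlusion and re-appearance, which is where SAM 3's learned memory goes beyond simple frame-to-frame overlap.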

How text‑based video masking works

  • You call a SAM 3 video predictor with a video source and a list of text prompts, for example ["person", "bicycle"].

  • For each frame, SAM 3 returns:

    • Segmentation masks for all instances matching each text concept.

    • Tracking IDs so instances stay linked across frames.

  • You can then convert these masks into binary alpha mattes or colored overlays and export them as per‑frame masks or a mask video for compositing.
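The last step, turning returned masks into alpha mattes or colored overlays, is plain array work. A minimal numpy sketch, assuming boolean per-instance masks and an RGB frame (the shapes and helper names here are made up for illustration):

```python
import numpy as np

def mask_to_alpha(mask: np.ndarray) -> np.ndarray:
    """Boolean instance mask -> 8-bit alpha matte (0 or 255)."""
    return mask.astype(np.uint8) * 255

def overlay(frame: np.ndarray, mask: np.ndarray,
            color=(255, 0, 0), opacity=0.5) -> np.ndarray:
    """Blend a solid color over the masked region of an RGB frame."""
    out = frame.astype(np.float32)
    c = np.array(color, np.float32)
    out[mask] = (1 - opacity) * out[mask] + opacity * c
    return out.astype(np.uint8)

frame = np.full((2, 2, 3), 100, np.uint8)       # flat gray frame
mask = np.array([[True, False], [False, True]])  # toy instance mask
print(int(mask_to_alpha(mask)[0, 0]))   # 255
print(int(overlay(frame, mask)[0, 0, 0]))  # 177
```

Writing one such matte per frame (or encoding them as a grayscale video) gives compositing tools a drop-in track matte.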

Why this is useful for video masking

  • Open‑vocabulary: You don’t need a fixed label set—any short phrase like “yellow school bus” or “striped umbrella” can be used as a concept.

  • All instances, not one: Unlike earlier models, SAM 3 segments every object that matches your text, not just a single instance.

  • Less manual work: You avoid drawing boxes on every object or every frame; text prompts plus occasional point refinements are usually enough.

Typical use cases

  • Creating masks for all people, cars, or specific props in a scene to apply localized effects or color grading.

  • Automatically masking branded items (“logos”, “bottles”) for protection, replacement, or analytics.

  • Pre‑masking objects for downstream tools (virtual try‑on, character edits, background swaps) without manual rotoscoping.

Generates in about 24 secs

Nodes & Models

VHS_LoadVideo
VHS_VideoInfo
VHS_VideoCombine
WorkflowGraphics
ImageBlend
MaskPreview
MaskToImage
LoadSAM3Model
SAM3VideoSegmentation
SAM3Propagate
SAM3VideoOutput
