Segment Anything 2 for Creating Video Mask
Create a video mask frame by frame using Segment Anything 2
SAM2
Segment Anything 2
video2video
Video Mask
Segment Anything 2 (SAM 2) can generate a temporally consistent mask for an object across an entire video, starting from just a few clicks or a box on one frame.
Overview
SAM 2 is a promptable segmentation model for both images and videos: you give a point, box, or initial mask on a target object, and the model tracks and segments that object through the video using an internal memory mechanism. The memory encoder and mask decoder work frame‑by‑frame, using past predictions as context so the mask stays locked to the same object even as it moves, deforms, or is briefly occluded.
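The frame-by-frame, memory-conditioned data flow described above can be pictured with a deliberately tiny sketch. The `track_object` function below is a toy placeholder, not SAM 2's actual encoder/decoder; only the shape of the loop — a rolling bank of past masks feeding each new prediction — mirrors the mechanism:

```python
from collections import deque

def track_object(frames, init_mask, memory_size=7):
    """Toy memory-conditioned tracker (illustrative only, NOT SAM 2's
    real model). Each frame's mask prediction is conditioned on a
    rolling memory bank of recent masks, mirroring how SAM 2 reuses
    past predictions as context for the current frame."""
    memory = deque([init_mask], maxlen=memory_size)  # bank of recent masks
    masks = [init_mask]
    for frame in frames[1:]:
        # Placeholder "model": carry the latest mask forward. The real
        # model would encode `frame` and attend over `memory` to produce
        # a mask that follows the object as it moves or deforms.
        pred = memory[-1]
        memory.append(pred)
        masks.append(pred)
    return masks
```

The bounded memory (here `memory_size=7`) is what lets the real model re-acquire an object after a brief occlusion without reprocessing the whole clip.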
Why use SAM 2 for video masks
It can track objects over time with much less manual keyframing than classic rotoscoping.
It accepts flexible prompts (point, box, or mask), so you can start from whatever annotation is easiest in your tool.
Its memory bank makes masks more stable in real‑world footage (camera shake, motion blur, partial occlusions) than single‑frame segmentation models.
Typical mask‑creation flow
Provide a prompt for the object on one frame (commonly the first frame):
A click on the object,
A bounding box around it, or
A rough mask from another tool.
Run the video predictor to propagate that prompt across all frames, which yields logits or masks per frame plus stable object IDs.
Threshold the mask logits to binary masks and export them (for example as per‑frame PNG alpha or as a mask video) for compositing, background removal, or targeted effects.
That gives you a clean, frame‑aligned video mask that can be used downstream for things like background replacement, localized stylization, or feeding into other I2V/V2V models.
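The threshold-and-export step above can be sketched with plain NumPy. The function consumes a SAM 2-style prediction stream yielding `(frame_idx, object_ids, mask_logits)` tuples, the shape emitted by the reference repo's `propagate_in_video` generator (that API name and the `(num_objs, 1, H, W)` logit layout are assumptions from the `sam2` repository; verify against your installed version):

```python
import numpy as np

def binarize_masks(prediction_stream, threshold=0.0):
    """Turn a stream of (frame_idx, object_ids, mask_logits) tuples,
    as yielded by a SAM 2-style video predictor, into binary masks.

    Returns {frame_idx: {obj_id: boolean mask of shape (H, W)}}.
    """
    per_frame = {}
    for frame_idx, obj_ids, mask_logits in prediction_stream:
        logits = np.asarray(mask_logits)   # assumed (num_objs, 1, H, W)
        masks = logits > threshold         # logits > 0 ~ probability > 0.5
        per_frame[frame_idx] = {
            obj_id: masks[i, 0]
            for i, obj_id in enumerate(obj_ids)
        }
    return per_frame
```

Each boolean mask can then be written out as an 8-bit alpha frame (`mask.astype(np.uint8) * 255`) with any image library, or stacked into a mask video for compositing.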
Read more
Nodes & Models
Int Input [Dream]
DownloadAndLoadSAM2Model
sam2.1_hiera_large.safetensors
Sam2Segmentation
DownloadAndLoadSAM2Model
sam2.1_hiera_large.safetensors
Sam2Segmentation
WorkflowGraphics
PointsEditor
ResizeMask
MaskToImage
VHS_LoadVideo
VHS_VideoInfo
VHS_VideoCombine
VHS_LoadVideo
VHS_VideoInfo
VHS_VideoCombine
Image Blank
ImageResize+
ImageResize+
easy imageSwitch