SAM3 for Video Masking using Text
Create video masks using SAM 3 and text prompts only.
SAM3
Video2Video
Video Masking
SAM 3 lets you create video masks just by describing what you want to segment (for example “red car”, “main person”, “blue backpack”), then tracks all matching instances across frames.
Overview
SAM 3 is a promptable concept segmentation model: you give it short noun‑phrase text prompts and it detects, segments, and tracks every instance of that concept in a video. The video predictor streams through frames with a memory mechanism, so the same objects keep consistent IDs over time, even with occlusion or re‑appearance.
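To make the consistent-ID behavior concrete, here is a purely illustrative sketch. The dict-of-sets shape and the helper function are assumptions for illustration, not the actual SAM 3 output format; the point is only that the same instance keeps the same ID across frames, even through occlusion.

```python
# Illustrative sketch only, NOT the real SAM 3 API: per-frame tracking
# results keyed by a stable instance ID. Instance 2 is occluded in
# frame 1 but re-appears in frame 2 under the same ID, which is what
# lets downstream code keep instances linked across the video.
per_frame_ids = [
    {1, 2},  # frame 0: two instances of the prompted concept
    {1},     # frame 1: instance 2 is occluded
    {1, 2},  # frame 2: instance 2 re-appears with its original ID
]

def frames_containing(obj_id, per_frame_ids):
    """Frame indices in which a tracked instance is visible."""
    return [i for i, ids in enumerate(per_frame_ids) if obj_id in ids]

visible = frames_containing(2, per_frame_ids)  # frames where instance 2 appears
```

Because the ID is stable, effects applied to "instance 2" automatically skip the frames where it is hidden and resume when it re-appears.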
How text‑based video masking works
You call a SAM 3 video predictor with a video source and a list of text prompts, for example ["person", "bicycle"]. For each frame, SAM 3 returns:
Segmentation masks for all instances matching each text concept.
Tracking IDs so instances stay linked across frames.
You can then convert these masks into binary alpha mattes or colored overlays and export them as per‑frame masks or a mask video for compositing.
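The mask-to-matte conversion described above can be sketched with NumPy. The helper name and its arguments are hypothetical, and the tiny frame and masks stand in for a real video frame and SAM 3's per-instance output:

```python
import numpy as np

def masks_to_matte_and_overlay(frame, masks, colors, alpha=0.5):
    """Combine per-instance boolean masks into a single binary alpha
    matte (union of all instances) and a colored overlay for compositing.

    frame:  (H, W, 3) uint8 RGB image
    masks:  list of (H, W) boolean arrays, one per tracked instance
    colors: list of (R, G, B) uint8 colors, one per instance
    """
    h, w = frame.shape[:2]
    matte = np.zeros((h, w), dtype=np.uint8)
    overlay = frame.astype(np.float32)
    for mask, color in zip(masks, colors):
        matte[mask] = 255  # union of all matching instances
        # blend the instance color onto the frame where the mask is set
        overlay[mask] = (1 - alpha) * overlay[mask] + alpha * np.asarray(color, np.float32)
    return matte, overlay.astype(np.uint8)

# Example: two hypothetical "person" instances on a tiny 4x4 black frame
frame = np.zeros((4, 4, 3), dtype=np.uint8)
m1 = np.zeros((4, 4), dtype=bool); m1[0, 0] = True
m2 = np.zeros((4, 4), dtype=bool); m2[3, 3] = True
matte, overlay = masks_to_matte_and_overlay(frame, [m1, m2], [(255, 0, 0), (0, 255, 0)])
```

Writing each per-frame matte to disk (or stacking them into a mask video) then gives a standard input for compositing tools.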
Why this is useful for video masking
Open‑vocabulary: You don’t need a fixed label set—any short phrase like “yellow school bus” or “striped umbrella” can be used as a concept.
All instances, not one: Unlike earlier models, SAM 3 segments every object that matches your text, not just a single instance.
Less manual work: You avoid drawing boxes on every object or every frame; text prompts plus occasional point refinements are usually enough.
Typical use cases
Creating masks for all people, cars, or specific props in a scene to apply localized effects or color grading.
Automatically masking branded items (“logos”, “bottles”) for protection, replacement, or analytics.
Pre‑masking objects for downstream tools (virtual try‑on, character edits, background swaps) without manual rotoscoping.
Read more
Nodes & Models
VHS_LoadVideo
VHS_VideoInfo
VHS_VideoCombine
WorkflowGraphics
ImageBlend
MaskPreview
MaskToImage
LoadSAM3Model
SAM3VideoSegmentation
SAM3Propagate
SAM3VideoOutput