
LTX 2 19B Fast for Image to Video

A workflow for LTX-2 image-to-video using the distilled model.

LTX‑2 19B Image‑to‑Video is an open‑source DiT‑based model that takes a single reference image plus a short prompt and turns it into a 5–20 second video clip with synchronized audio. It preserves the composition, lighting, and subject of the input image while adding natural camera and scene motion at 480p, 720p, or 1080p.

Overview

LTX‑2 is a 19‑billion‑parameter audio‑video foundation model that runs fully locally or via open APIs, covering text‑to‑video, image‑to‑video, and video‑to‑video in one architecture. The 19B image‑to‑video pipeline treats your still as a keyframe, then predicts frames and matching sound so the camera can dolly, pan, or orbit while elements like hair, clothing, and background move coherently over time.

Why use the 19B open‑source model

  • High fidelity & temporal stability: The large DiT backbone produces detailed, low‑flicker motion that holds up across 5–20 second shots, unlike many lighter I2V models.

  • Synchronized audio in one pass: It generates video and sound together, so ambience and simple SFX match the visual motion without a separate audio model.

  • Open and extensible: Weights, trainer, and LoRA hooks are available, letting you fine‑tune styles or camera behaviors and integrate directly into node‑based tools like ComfyUI; the sketch below illustrates the mechanism a LoRA hook relies on.
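
As background for the LoRA point above, here is a minimal, framework‑agnostic sketch of what a LoRA hook does to a single weight matrix, using the standard low‑rank update W' = W + (alpha/r)·BA. The shapes, rank, and alpha below are illustrative placeholders, not LTX‑2 internals.

```python
# Generic LoRA mechanism: a frozen weight W gets a low-rank update B @ A.
# Shapes, rank, and alpha here are illustrative, not LTX-2's actual values.
import numpy as np

d_out, d_in, r, alpha = 64, 64, 8, 16.0
W = np.random.randn(d_out, d_in) * 0.02   # frozen base weight
A = np.random.randn(r, d_in) * 0.02       # trainable down-projection
B = np.zeros((d_out, r))                  # trainable up-projection, zero-initialized

W_merged = W + (alpha / r) * (B @ A)      # weight actually used at inference
x = np.random.randn(d_in)
assert np.allclose(W_merged @ x, W @ x)   # with B = 0 the adapter is a no-op
```

Because B starts at zero, a freshly attached adapter changes nothing until trained, which is why LoRA hooks can be wired into a graph (as LoraLoaderModelOnly is here) without disturbing the base model.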

Typical image‑to‑video usage

  • Provide a clean input image (JPG/PNG/WebP) that already matches your desired framing; the model is designed to “preserve input composition” while adding motion.

  • Add a concise motion prompt, for example describing camera behavior (“slow dolly‑in”, “handheld pan left”), subject motion (“hair and coat moving in wind”), and mood.

  • Choose duration (5–20 s) and resolution (480p/720p/1080p); users report that a mid‑range GPU renders ~8 s 720p clips in a few minutes with FP8/distilled configs. A sketch for setting these inputs programmatically follows this list.
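
The settings above can also be driven from a script. A minimal sketch, assuming a local ComfyUI server on its default port and this workflow exported in API format as ltx2_i2v_api.json; the node ids ("12", "27") and input names are hypothetical placeholders to look up in your own export:

```python
# Minimal sketch: queue this workflow against a local ComfyUI server.
# Assumes ComfyUI is running on the default port (8188) and the workflow
# has been exported in API format. Node ids and input names below are
# hypothetical -- check your own export before running.
import json
import urllib.request

with open("ltx2_i2v_api.json") as f:
    graph = json.load(f)

graph["12"]["inputs"]["image"] = "my_keyframe.png"  # hypothetical LoadImage node id
graph["27"]["inputs"]["value"] = (                  # hypothetical prompt node id
    "slow dolly-in, hair and coat moving gently in the wind, calm evening mood"
)

req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": graph}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
print(urllib.request.urlopen(req).read().decode())  # returns the queued prompt id
```

The response includes a prompt id, which can then be polled through ComfyUI's /history endpoint to find the saved video once generation finishes.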

Use cases

  • Turning strong stills (AI art, product renders, portraits, environments) into cinematic motion shots with camera moves and subtle environmental animation.

  • Building open‑source, fully local pipelines where both privacy and control over LoRAs/camera‑control adapters are important.

  • Generating B‑roll and hero shots for edits: the model’s composition‑preserving behavior makes it well‑suited for animating designed keyframes rather than re‑framing them.

Nodes & Models

  • PrimitiveInt
  • PrimitiveFloat
  • LTXVAudioVAELoader: ltx-2-19b-distilled.safetensors
  • LatentUpscaleModelLoader: ltx-2-spatial-upscaler-x2-1.0.safetensors
  • LoadImage
  • MarkdownNote
  • PrimitiveStringMultiline
  • CheckpointLoaderSimple: ltx-2-19b-distilled.safetensors
  • EmptyImage
  • KSamplerSelect
  • RandomNoise
  • ManualSigmas
  • WorkflowGraphics
  • LoraLoaderModelOnly: your_camera_lora.safetensors
  • ImageScaleBy
  • LTXVEmptyLatentAudio
  • CLIPTextEncode
  • GetImageSize
  • LTXVConditioning
  • EmptyLTXVLatentVideo
  • CFGGuider
  • LTXVPreprocess
  • LTXVImgToVideoInplace
  • LTXVConcatAVLatent
  • SamplerCustomAdvanced
  • LTXVSeparateAVLatent
  • LTXVLatentUpsampler
  • LTXVAudioVAEDecode
  • CreateVideo
  • SaveVideo
  • LTXVGemmaCLIPModelLoader: gemma-3-12b-it-qat-q4_0-unquantized/model-00001-of-00005.safetensors, ltx-2-19b-distilled.safetensors
  • LTXVGemmaEnhancePrompt
  • CM_FloatToInt
  • ImpactExecutionOrderController
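
Before loading the graph, it can save time to confirm the referenced weights are where the loaders expect them. A minimal sketch, assuming ComfyUI's default models/ folder layout; COMFY_ROOT and the subfolder names (especially for the latent upsampler and the Gemma text encoder) are assumptions to adjust for your install:

```python
# Minimal sketch: check that the model files this workflow references exist
# in a local ComfyUI install. COMFY_ROOT and the subfolder names are
# assumptions based on ComfyUI's default layout -- adjust to your setup,
# and replace the placeholder LoRA name with your actual file.
from pathlib import Path

COMFY_ROOT = Path("~/ComfyUI").expanduser()  # hypothetical install path
REQUIRED = {
    "checkpoints": ["ltx-2-19b-distilled.safetensors"],
    "upscale_models": ["ltx-2-spatial-upscaler-x2-1.0.safetensors"],
    "loras": ["your_camera_lora.safetensors"],  # placeholder name from the graph
    "text_encoders": [
        "gemma-3-12b-it-qat-q4_0-unquantized/model-00001-of-00005.safetensors"
    ],
}

for folder, names in REQUIRED.items():
    for name in names:
        path = COMFY_ROOT / "models" / folder / name
        print(f"{'ok' if path.exists() else 'MISSING':8} {path}")
```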
