LTX 2 19B Fast for Image to Video
A workflow for LTX‑2 image‑to‑video using the distilled (fast) model
Animation
Filmography
Image2Video
LTX 2
Open Source
LTX‑2 19B Image‑to‑Video is an open‑source DiT‑based model that takes a single reference image plus a short prompt and turns it into a 5–20 second video clip with synchronized audio. It preserves the composition, lighting, and subject of the input image while adding natural camera and scene motion at 480p, 720p, or 1080p.
Overview
LTX‑2 is a 19‑billion‑parameter audio‑video foundation model that runs fully locally or via open APIs, covering text‑to‑video, image‑to‑video, and video‑to‑video in one architecture. The 19B image‑to‑video pipeline treats your still as a keyframe, then predicts frames and matching sound so the camera can dolly, pan, or orbit while elements like hair, clothing, and background move coherently over time.
Why use the 19B open‑source model
High fidelity & temporal stability: The large DiT backbone produces detailed, low‑flicker motion that holds up across 5–20 second shots, unlike many lighter I2V models.
Synchronized audio in one pass: It generates video and sound together, so ambience and simple SFX match the visual motion without a separate audio model.
Open and extensible: Weights, trainer, and LoRA hooks are available, letting you fine‑tune styles or camera behaviors and integrate directly into node‑based tools like ComfyUI (a minimal LoRA example is sketched below).
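As an illustration of the LoRA hooks, the node list further down includes LoraLoaderModelOnly pointed at your_camera_lora.safetensors. Below is a minimal sketch of how such an entry might look in a ComfyUI API‑format workflow export; the node ids are hypothetical placeholders, and the input names are assumed to match the stock LoraLoaderModelOnly node, so check your own export before copying them.

```python
# Minimal sketch: a camera-control LoRA entry as it might appear in a
# ComfyUI "API format" workflow export. Node ids ("12", "4") are
# hypothetical placeholders; check your own export for the real ids.
lora_node = {
    "12": {
        "class_type": "LoraLoaderModelOnly",
        "inputs": {
            "model": ["4", 0],  # output 0 of the upstream model loader node
            "lora_name": "your_camera_lora.safetensors",
            "strength_model": 1.0,  # lower this for a subtler camera effect
        },
    }
}
```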
Typical image‑to‑video usage
Provide a clean input image (JPG/PNG/WebP) that already matches your desired framing; the model is designed to “preserve input composition” while adding motion.
Add a concise motion prompt, for example describing camera behavior (“slow dolly‑in”, “handheld pan left”), subject motion (“hair and coat moving in wind”), and mood.
Choose duration (5–20 s) and resolution (480p/720p/1080p); users report that ~8 s 720p clips render in a few minutes on a mid‑range GPU with FP8/distilled configs (see the settings sketch after this list).
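A rough sketch of how those choices translate into the numeric inputs the workflow expects. The 24 fps rate, the "multiple of 8, plus 1" frame‑count rule, and the preset dimensions are assumptions carried over from earlier LTX‑Video releases, not confirmed LTX‑2 values; verify them against the model card and the workflow's defaults.

```python
# Hypothetical helper: map duration + resolution preset to width/height/frame
# count. The 24 fps rate, the "8k + 1" frame constraint, and the preset sizes
# are assumptions -- verify against the LTX-2 model card / workflow defaults.
PRESETS = {"480p": (832, 480), "720p": (1280, 720), "1080p": (1920, 1080)}

def i2v_settings(duration_s: float, preset: str = "720p", fps: int = 24):
    if not 5 <= duration_s <= 20:
        raise ValueError("LTX-2 I2V clips are 5-20 seconds long")
    width, height = PRESETS[preset]
    frames = int(duration_s * fps)
    frames = (frames // 8) * 8 + 1  # assumed 8k+1 frame constraint
    return {"width": width, "height": height, "length": frames, "fps": fps}

print(i2v_settings(8, "720p"))
# e.g. {'width': 1280, 'height': 720, 'length': 193, 'fps': 24}
```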
Use cases
Turning strong stills (AI art, product renders, portraits, environments) into cinematic motion shots with camera moves and subtle environmental animation.
Building open‑source, fully local pipelines where both privacy and control over LoRAs/camera control adapters are important.
Generating B‑roll and hero shots for edits: the model’s composition‑preserving behavior makes it well‑suited for animating designed keyframes rather than re‑framing them.
Nodes & Models
PrimitiveInt
PrimitiveFloat
LoadImage
MarkdownNote
PrimitiveStringMultiline
EmptyImage
KSamplerSelect
RandomNoise
ManualSigmas
WorkflowGraphics
LoraLoaderModelOnly
your_camera_lora.safetensors
ImageScaleBy
LTXVEmptyLatentAudio
CLIPTextEncode
GetImageSize
LTXVConditioning
EmptyLTXVLatentVideo
CFGGuider
LTXVPreprocess
LTXVImgToVideoInplace
LTXVConcatAVLatent
SamplerCustomAdvanced
LTXVSeparateAVLatent
LTXVLatentUpsampler
LTXVAudioVAEDecode
CreateVideo
SaveVideo
CM_FloatToInt
ImpactExecutionOrderController
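Once the workflow built from these nodes is saved, it can also be queued headlessly through ComfyUI's standard HTTP API. A minimal sketch, assuming a local server on the default port 8188 and a workflow exported in API format as ltx2_i2v_api.json (the file name and the patched node id are placeholders):

```python
import json
import urllib.request

# Minimal sketch: queue the exported workflow on a local ComfyUI server.
# "ltx2_i2v_api.json" is a placeholder for your own API-format export;
# 8188 is ComfyUI's default port.
with open("ltx2_i2v_api.json", "r", encoding="utf-8") as f:
    workflow = json.load(f)

# Optionally patch inputs before queueing, e.g. the motion prompt on a
# CLIPTextEncode node -- the node id "6" here is hypothetical.
# workflow["6"]["inputs"]["text"] = "slow dolly-in, hair moving in the wind"

payload = json.dumps({"prompt": workflow}).encode("utf-8")
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode("utf-8"))  # response includes the queued prompt_id
```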