LTX-2 19B Fast for Image-to-Video
A workflow for LTX-2 image-to-video using the distilled model
Animation
Filmography
Image2Video
LTX 2
Open Source
Nodes & Models
PrimitiveInt
PrimitiveFloat
LoadImage
MarkdownNote
PrimitiveStringMultiline
EmptyImage
KSamplerSelect
RandomNoise
ManualSigmas
WorkflowGraphics
LoraLoaderModelOnly
your_camera_lora.safetensors
ImageScaleBy
LTXVEmptyLatentAudio
CLIPTextEncode
GetImageSize
LTXVConditioning
EmptyLTXVLatentVideo
CFGGuider
LTXVPreprocess
LTXVImgToVideoInplace
LTXVConcatAVLatent
SamplerCustomAdvanced
LTXVSeparateAVLatent
LTXVLatentUpsampler
LTXVAudioVAEDecode
CreateVideo
SaveVideo
CM_FloatToInt
ImpactExecutionOrderController
LTX‑2 19B Image‑to‑Video is an open‑source DiT‑based model that takes a single reference image plus a short prompt and turns it into a 5–20 second video clip with synchronized audio. It preserves the composition, lighting, and subject of the input image while adding natural camera and scene motion at 480p, 720p, or 1080p.
Overview
LTX‑2 is a 19‑billion‑parameter audio‑video foundation model that runs fully locally or via open APIs, covering text‑to‑video, image‑to‑video, and video‑to‑video in one architecture. The 19B image‑to‑video pipeline treats your still as a keyframe, then predicts frames and matching sound so the camera can dolly, pan, or orbit while elements like hair, clothing, and background move coherently over time.
Why use the 19B open‑source model
High fidelity & temporal stability: The large DiT backbone produces detailed, low‑flicker motion that holds up across 5–20 second shots, unlike many lighter I2V models.
Synchronized audio in one pass: It generates video and sound together, so ambience and simple SFX match the visual motion without a separate audio model.
Open and extensible: Weights, trainer, and LoRA hooks are available, letting you fine‑tune styles or camera behaviors and integrate directly into node‑based tools like ComfyUI.
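Because the workflow above already includes a LoraLoaderModelOnly node, one way to script that extensibility is to inject a LoRA into a workflow exported in ComfyUI's API (JSON) format. The sketch below is a generic helper, not part of this workflow's files; the node IDs and the rewiring logic are assumptions about how your exported graph is laid out.

```python
def add_camera_lora(workflow, model_node_id, lora_name, strength=1.0):
    """Insert a LoraLoaderModelOnly node between the model loader and
    whatever currently consumes its MODEL output.

    `workflow` is a dict in ComfyUI API format: node-id string ->
    {"class_type": ..., "inputs": {...}}. This is a sketch; adapt the
    node IDs to your own exported graph.
    """
    lora_id = str(max(int(k) for k in workflow) + 1)
    # Repoint every input that read the model loader's output 0 so it
    # reads from the new LoRA node instead.
    for node in workflow.values():
        for key, value in node["inputs"].items():
            if value == [model_node_id, 0]:
                node["inputs"][key] = [lora_id, 0]
    workflow[lora_id] = {
        "class_type": "LoraLoaderModelOnly",
        "inputs": {
            "model": [model_node_id, 0],
            "lora_name": lora_name,       # e.g. your_camera_lora.safetensors
            "strength_model": strength,
        },
    }
    return lora_id
```

The modified dict can then be POSTed to a running ComfyUI instance's `/prompt` endpoint as usual.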
Typical image‑to‑video usage
Provide a clean input image (JPG/PNG/WebP) that already matches your desired framing; the model is designed to “preserve input composition” while adding motion.
Add a concise motion prompt, for example describing camera behavior (“slow dolly‑in”, “handheld pan left”), subject motion (“hair and coat moving in the wind”), and mood.
Choose a duration (5–20 s) and a resolution (480p/720p/1080p); on a mid‑range GPU, users report generating ~8 s 720p clips in a few minutes with FP8/distilled configs.
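The duration/resolution step above can be sketched as a small settings helper. The 16:9 pixel dimensions, the 24 fps default, and the 8k+1 frame-count rule are assumptions carried over from earlier LTX-Video releases, not values taken from this workflow; check your own node settings before relying on them.

```python
# Assumed 16:9 dimensions per resolution tier (not from the workflow file).
RESOLUTIONS = {"480p": (854, 480), "720p": (1280, 720), "1080p": (1920, 1080)}

def i2v_settings(duration_s, resolution="720p", fps=24):
    """Derive width/height/frame-count settings for an LTX-2 I2V run."""
    if not 5 <= duration_s <= 20:
        raise ValueError("duration must be 5-20 seconds")
    width, height = RESOLUTIONS[resolution]
    # Snap to the 8k+1 frame rule used by earlier LTX-Video models
    # (an assumption here; LTX-2 nodes may handle this internally).
    frames = (duration_s * fps // 8) * 8 + 1
    return {"width": width, "height": height, "frames": frames, "fps": fps}
```

For example, `i2v_settings(8)` yields 1280x720 at 193 frames, which matches an ~8 s clip at 24 fps.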
Use cases
Turning strong stills (AI art, product renders, portraits, environments) into cinematic motion shots with camera moves and subtle environmental animation.
Building open‑source, fully local pipelines where both privacy and control over LoRAs/camera control adapters are important.
Generating B‑roll and hero shots for edits: the model’s composition‑preserving behavior makes it well‑suited for animating designed keyframes rather than re‑framing them.