LTX 2 19B Fast for Image to Video
A workflow for LTX 2 image-to-video using the distilled model
Animation
Filmography
Image2Video
LTX 2
Open Source
Nodes & Models
PrimitiveInt
PrimitiveFloat
LoadImage
PrimitiveStringMultiline
EmptyImage
KSamplerSelect
RandomNoise
ManualSigmas
LoraLoaderModelOnly
your_camera_lora.safetensors
ImageScaleBy
LTXVEmptyLatentAudio
CLIPTextEncode
GetImageSize
LTXVConditioning
EmptyLTXVLatentVideo
CFGGuider
LTXVPreprocess
LTXVImgToVideoInplace
LTXVConcatAVLatent
SamplerCustomAdvanced
LTXVSeparateAVLatent
LTXVLatentUpsampler
LTXVAudioVAEDecode
CreateVideo
SaveVideo
CM_FloatToInt
ImpactExecutionOrderController
Turn a still image into a video clip with motion, camera movement, and synchronized audio using the LTX 2 19B distilled model.
Upload your image, write a prompt describing the motion you want, and LTX 2 generates a video that preserves your image's composition, lighting, and subject while adding natural camera and scene motion. Audio generates alongside the video in one pass, so ambient sound and effects match what's on screen. The distilled (Fast) model runs in fewer steps for quicker generation.
Upload your image, write a motion prompt, hit run.
How do you animate an image with LTX 2 19B Fast?
Upload your image and write a short prompt describing the motion and camera behavior. Don't describe what's already in the image. Describe what moves, how the camera behaves, and what sounds should play. The workflow includes a Gemma-based prompt enhancer that expands your short prompt into a detailed scene description, so you can keep your input concise. The model generates at 960x544 and upscales to 1080p using a built-in spatial upscaler.
Your image. Upload a JPG, PNG, or WebP. The image sets the first frame. LTX 2 reads the composition, lighting, subject, and style from it and preserves them throughout the clip. The workflow automatically resizes your image to fit the model's native resolution.
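As a rough illustration of the automatic resize, the sketch below computes a single scale factor that fits an image inside the 960x544 generation size while keeping its aspect ratio. The fit-inside rule and the function names `fit_scale` and `scaled_size` are assumptions made for illustration. The workflow's actual resize logic is not documented here and may crop or pad instead.

```python
def fit_scale(width: int, height: int, target_w: int = 960, target_h: int = 544) -> float:
    """Scale factor that fits the image inside the target box, keeping aspect ratio.

    Assumption: the resize uses a "fit inside" rule. The real workflow may differ.
    """
    return min(target_w / width, target_h / height)


def scaled_size(width: int, height: int, scale: float) -> tuple[int, int]:
    """Pixel dimensions after applying one scale factor to both sides."""
    return round(width * scale), round(height * scale)


# A 1920x1080 upload is limited by its width, so it is scaled by 0.5.
scale = fit_scale(1920, 1080)
print(scale, scaled_size(1920, 1080, scale))
```

A factor like this is the kind of value a scale-by node would take, which is why images already close to 16:9 need the least adjustment.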
Works with AI-generated art, product renders, portraits, environments, and photographs. Clean, well-composed images with clear subjects produce the best results. The model preserves your framing, so set up your shot before uploading.
Prompt. Describe the motion, not the scene. If your image shows a lighthouse by the sea, don't write "a lighthouse by the sea." Write "ravaging sea waves near the lighthouse" or "slow dolly-in, waves crashing against rocks, spray catching the wind, ocean roar."
Use camera language: "slow dolly-in," "handheld pan left," "orbiting shot," "gentle push-in." LTX 2 responds to professional shot descriptions.
Include audio cues: "ocean roar," "wind howling," "soft ambient hum." Audio generates with the video, so describing the sound you want makes a difference.
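The prompt advice above (camera language, then motion, then audio cues) can be sketched as a small helper that assembles the pieces into one comma-separated prompt. The function name `build_motion_prompt` and the ordering are illustrative assumptions, not part of the workflow. The workflow only needs the final string.

```python
def build_motion_prompt(camera: str, motion: list[str], audio: list[str]) -> str:
    """Join camera, motion, and audio cues into one comma-separated prompt."""
    parts = [camera, *motion, *audio]
    # Drop empty entries and surrounding whitespace so the result stays clean.
    cleaned = [p.strip() for p in parts if p and p.strip()]
    return ", ".join(cleaned)


prompt = build_motion_prompt(
    camera="slow dolly-in",
    motion=["waves crashing against rocks", "spray catching the wind"],
    audio=["ocean roar"],
)
print(prompt)
# slow dolly-in, waves crashing against rocks, spray catching the wind, ocean roar
```

Keeping the three kinds of cue separate makes it easy to swap the camera move or the sound while testing several prompts on the same image.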
Prompt Enhancer. The workflow includes a Gemma 3 12B prompt enhancer that takes your short prompt and expands it into a detailed, action-focused scene description. It analyzes your input image, identifies the subject, setting, and mood, then writes a fuller prompt that guides the video generation. You write the short version. The enhancer handles the rest.
Duration. Controlled by the length parameter. The default is 121 frames at 24 FPS, which gives you about 5 seconds. For longer clips (up to 20 seconds), increase the frame count. Longer clips take more time to generate and carry a higher risk of temporal drift.
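The relationship between frame count and clip length is simple division, sketched below. The helper `frames_for_seconds` also snaps to counts of the form 8n + 1. That rule is an assumption inferred from the default of 121 (8 x 15 + 1) and is not stated in this workflow's description, so treat it as a starting point rather than a requirement.

```python
def clip_seconds(frames: int, fps: float = 24.0) -> float:
    """Clip length in seconds for a given frame count."""
    return frames / fps


def frames_for_seconds(seconds: float, fps: float = 24.0, block: int = 8) -> int:
    """Nearest frame count of the form block * n + 1 for a target duration.

    Assumption: valid lengths follow the 8n + 1 pattern seen in the default (121).
    """
    n = max(1, round(seconds * fps / block))
    return block * n + 1


print(round(clip_seconds(121), 2))  # 5.04
print(frames_for_seconds(5))        # 121
print(frames_for_seconds(10))       # 241
```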
Resolution. The model generates at 960x544 internally. A built-in LTX spatial upscaler (2x) brings the output to 1920x1088, which is effectively 1080p. The final video renders at the frame rate you set (default 24 FPS).
Image influence strength. The LTXVImgToVideoInplace node has a strength setting (default: 0.6 for the first pass, 1.0 for the upscale pass). At 0.6, LTX 2 follows your image's composition closely while leaving room to add motion and scene changes. Higher values keep the output closer to the original image. Lower values give the model more freedom to reinterpret.
Sampler. Euler sampler with manual sigma schedules. The distilled model uses a compressed step count (4 steps for the base pass, 9 steps for the upscale pass) for fast inference. CFG guidance is set to 1. These are pre-tuned for the distilled checkpoint. No need to change them.
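For readers curious how a manual sigma schedule maps to a step count, here is a minimal sketch. A sampler takes one step between each pair of consecutive sigma values, so N steps need N + 1 values. The sigma values below are hypothetical placeholders chosen for illustration and are not the workflow's tuned schedule. The helper names are also assumptions.

```python
def parse_sigmas(text: str) -> list[float]:
    """Parse a comma-separated sigma string into floats."""
    return [float(tok) for tok in text.split(",") if tok.strip()]


def step_count(sigmas: list[float]) -> int:
    """Each step moves from one sigma to the next, so N steps need N + 1 values."""
    return max(len(sigmas) - 1, 0)


def is_descending(sigmas: list[float]) -> bool:
    """A denoising schedule should strictly decrease from high noise to low."""
    return len(sigmas) >= 2 and all(a > b for a, b in zip(sigmas, sigmas[1:]))


# Hypothetical values, NOT the schedule shipped with this workflow.
example = "1.0, 0.75, 0.5, 0.25, 0.0"
sigmas = parse_sigmas(example)
print(step_count(sigmas), is_descending(sigmas))  # 4 True
```

This is only a way to read the schedule. The shipped values are tuned for the distilled checkpoint and are best left alone.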
LoRA support (bypassed by default). Two LoRA loader nodes are included but bypassed. Enable them (Ctrl+B) to load camera LoRAs from Lightricks' collection for specific camera behaviors. When using camera LoRAs, set strength to 1. A second LoRA slot is available for style or detail LoRAs.
What is LTX 2 19B Fast image-to-video good for?
LTX 2 19B Fast is built for turning strong stills into cinematic motion shots. It preserves your input composition while adding camera moves and environmental animation. The synchronized audio means your clips ship with sound. The distilled model prioritizes speed, making it good for iteration and concept testing.
Animating AI-generated art. Generated an image you like with Z-Image, Flux, or Qwen? Upload it here and add motion. The model preserves the style and composition of your original image while adding camera movement and scene animation. Your image becomes the first frame of a cinematic clip.
Product and hero shots. Animate a product render or photograph into a reveal sequence. "Slow orbit around the bottle, studio lighting, soft reflection on the surface, ambient hum." The composition stays locked while the camera moves.
B-roll and edit footage. Generate atmospheric or environmental shots from designed keyframes. The composition-preserving behavior makes this model well-suited for animating storyboard frames rather than re-framing them.
Concept and pre-visualization. Test how a still image looks in motion before committing to production. The Fast model generates quickly enough to try multiple motion prompts on the same image and compare results.
Honest limitations. Character faces can drift on clips longer than 8-10 seconds. Complex multi-person motion produces artifacts. The distilled model trades some detail for speed. If you need maximum quality and don't mind waiting, the Pro (non-distilled) model produces sharper results. Hand and finger detail is better than earlier models but still imperfect on close-ups.
FAQ
What is the difference between LTX 2 Fast and LTX 2 Pro for image-to-video?
Fast is the distilled model. Fewer inference steps, quicker generation. Pro is the full model. More steps, sharper detail, better temporal stability on longer clips. Use Fast for iteration and concept testing. Use Pro when you need the final-quality render. Both accept the same inputs and prompts.
How long can the output video be?
Up to about 20 seconds. The default is 121 frames at 24 FPS (about 5 seconds). Increase the frame count for longer clips. Shorter clips (5-8 seconds) have the best temporal stability. Longer clips are more likely to show drift in faces and fine details.
Does LTX 2 generate audio with the video?
Yes. Audio and video generate in a single pass. The model produces ambient sound, effects, and environmental audio that match on-screen events. Include audio descriptions in your prompt for better results. For dedicated dialogue or music, pair the output with a separate audio workflow.
What resolution does LTX 2 19B Fast output?
The model generates at 960x544 internally. A built-in 2x spatial upscaler brings the final output to 1920x1088, which is effectively 1080p. Frame rate is 24 FPS by default.
How should I prompt LTX 2 for image-to-video?
Don't describe what's already in your image. Describe what changes: camera motion, subject movement, environmental animation, and audio. Use professional shot language ("slow dolly-in," "tracking left," "shallow depth of field shift"). Keep your prompt concise. The built-in Gemma enhancer expands it into a full scene description automatically.
How do I run LTX 2 19B Fast image-to-video online?
You can run this workflow online through Floyo. No installation, no setup. Open the workflow in your browser, upload your image, and hit run. Free to try.