LTX 2 19B Pro for Text to Video

An open‑source LTX 2 Pro workflow for text‑to‑video

Filmography

LTX 2 Pro

Open Source

Text2Video

Videography

265

Generates in about 9 mins 18 secs

floyoofficial

Nodes & Models

ComfyUI Official

PrimitiveFloat

PrimitiveInt

EmptyImage

MarkdownNote

RandomNoise

ManualSigmas

KSamplerSelect

WorkflowGraphics

LatentUpscaleModelLoader

ltx-2-spatial-upscaler-x2-1.0.safetensors

LTXVAudioVAELoader

ltx-2-19b-dev.safetensors

CheckpointLoaderSimple

ltx-2-19b-dev.safetensors

PrimitiveStringMultiline

ImageScaleBy

CLIPTextEncode

LTXVEmptyLatentAudio

GetImageSize

EmptyLTXVLatentVideo

LTXVConditioning

LTXVConcatAVLatent

CFGGuider

LTXVScheduler

SamplerCustomAdvanced

LTXVSeparateAVLatent

LTXVLatentUpsampler

LTXVAudioVAEDecode

CreateVideo

SaveVideo

ComfyUI-LTXVideo

LTXVGemmaCLIPModelLoader

gemma-3-12b-it-qat-q4_0-unquantized/model-00001-of-00005.safetensors

ltx-2-19b-dev.safetensors

LTXVGemmaEnhancePrompt

ComfyMath

CM_FloatToInt

The LTX‑2 Pro text‑to‑video workflow runs the full‑quality, open‑source 19B checkpoint in “Pro” mode, turning prompts into longer, cinematic clips with synchronized audio and more detail than the Fast/Distilled flows.

Overview

LTX‑2 is a DiT‑based audio‑video foundation model that generates video and audio together from text, image, or video inputs; the Pro flow is its high‑fidelity configuration, focused on visual quality and temporal stability. In text‑to‑video, you provide a structured prompt and LTX‑2 Pro produces a clip of up to roughly 10–20 seconds in HD or 4K, with matching sound, at up to around 50 fps. It is designed for production use rather than quick previews.
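As a rough illustration of what those duration and frame‑rate figures mean for the sampler, the sketch below converts a clip length and frame rate into a frame count. The function names are hypothetical. The optional snapping to a length of the form 8n + 1 is an assumption carried over from the earlier LTX‑Video releases, which document that constraint; check the length the workflow's latent node actually accepts before relying on it.

```python
def frame_count(duration_s: float, fps: float) -> int:
    """Frames needed to cover duration_s seconds at fps frames per second."""
    return round(duration_s * fps)


def snap_to_8n_plus_1(frames: int) -> int:
    """Round up to the nearest length of the form 8*n + 1.

    ASSUMPTION: earlier LTX-Video releases document this frame-length
    constraint; it is not confirmed here for the LTX-2 workflow.
    """
    if frames <= 1:
        return 1
    return ((frames - 1 + 7) // 8) * 8 + 1


# A 10-second clip at 50 fps needs 500 frames before any snapping.
raw = frame_count(10, 50)
snapped = snap_to_8n_plus_1(raw)
print(raw, snapped)
```

Longer or higher‑fps clips increase the frame count linearly, which is one reason generation time grows quickly at the top of the quoted ranges.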

Why use the Pro flow (open source)

Maximum quality: The Pro flow uses the full 19B architecture (often as the FP8 “Standard” checkpoint) to preserve fine detail, face quality, and motion consistency, whereas Fast trades some detail for speed.
Native 4K and high fps: Pro supports true 4K and high frame rates (up to about 50 fps) for cinematic or commercial delivery, which exceeds many earlier open‑source models that cap out at lower resolutions.
Open and self‑hostable: Weights and inference code are available, with GGUF and FP8 variants that run locally on consumer GPUs (for example, 12–24 GB VRAM) or via open providers.
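To see why the quantized variants matter on consumer GPUs, a back‑of‑the‑envelope estimate of the weight footprint helps. The sketch below multiplies the 19B parameter count by an assumed number of bytes per parameter. The byte sizes are idealized assumptions (real GGUF quantizations carry extra scale data), and the result covers weights only, not the text encoder, VAE, activations, or latents.

```python
# Idealized bytes per parameter (ASSUMPTION: ignores quantization
# metadata and any layers kept at higher precision).
BYTES_PER_PARAM = {
    "bf16": 2.0,
    "fp8": 1.0,
    "q4": 0.5,  # rough stand-in for a 4-bit GGUF variant
}


def weight_footprint_gb(params_billions: float, precision: str) -> float:
    """Approximate size of the model weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * BYTES_PER_PARAM[precision]
    return total_bytes / 1e9


for precision in BYTES_PER_PARAM:
    print(precision, weight_footprint_gb(19, precision), "GB")
```

Under these assumptions the FP8 weights alone approach the top of the 12–24 GB range, so smaller cards would typically depend on a more aggressive quantization or on offloading part of the model to system memory.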

Typical text‑to‑video usage

You write a prompt describing the subject, environment, motion, camera behavior, and mood; LTX‑2 Pro turns this into a coherent 16:9 or custom‑ratio sequence with synchronized dialogue, ambience, or music where appropriate.
You choose a duration (commonly 6–20 seconds) and a resolution (480p, 720p, 1080p, or 4K), and select the Pro/Standard configuration when quality matters more than speed.
In tools like ComfyUI, you pick the Pro checkpoint (for example, FP8 Standard) and a higher step count, often combining it with LoRAs for style, camera control, or detail upscaling in a multiscale pipeline.
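The prompt structure and settings described above can be sketched as a small helper that assembles the prompt fields into one string and pairs it with a duration and resolution. Everything here is hypothetical scaffolding, not part of the workflow or of any LTX‑2 API: the ShotPrompt class, the build_request function, and the pixel sizes (conventional 16:9 values that a real workflow may snap to model‑friendly multiples) are all assumptions for illustration.

```python
from dataclasses import dataclass

# Conventional 16:9 pixel sizes (ASSUMPTION: the actual workflow may
# round these to multiples the model prefers).
RESOLUTIONS = {
    "480p": (854, 480),
    "720p": (1280, 720),
    "1080p": (1920, 1080),
    "4K": (3840, 2160),
}


@dataclass
class ShotPrompt:
    """Hypothetical container for the prompt fields named in the text."""
    subject: str
    environment: str
    motion: str
    camera: str
    mood: str
    audio: str = ""  # optional dialogue, ambience, or music description

    def to_text(self) -> str:
        """Join the non-empty fields into one sentence-per-field prompt."""
        fields = [self.subject, self.environment, self.motion,
                  self.camera, self.mood, self.audio]
        return " ".join(f.strip().rstrip(".") + "." for f in fields if f.strip())


def build_request(prompt: ShotPrompt, duration_s: int, resolution: str) -> dict:
    """Bundle prompt text and output settings into a plain dict."""
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unknown resolution: {resolution!r}")
    width, height = RESOLUTIONS[resolution]
    return {
        "prompt": prompt.to_text(),
        "duration_s": duration_s,
        "width": width,
        "height": height,
    }


shot = ShotPrompt(
    subject="A lighthouse keeper climbs a spiral staircase",
    environment="Rain lashes the windows of a stone tower at night",
    motion="He ascends slowly, lantern swinging",
    camera="Handheld follow shot from below",
    mood="Tense and quiet",
    audio="Wind howls and footsteps echo",
)
print(build_request(shot, duration_s=10, resolution="1080p"))
```

Keeping each prompt element in its own field makes it easier to vary one aspect, such as the camera move, while holding the rest of the shot constant between runs.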

Use cases

Cinematic hero shots & ads: High‑impact scenes where lighting, texture, and motion must hold up in 1080p–4K delivery.
Storyboards to near‑final: Turning script‑like prompts into sequences that already look close to final edits, including audio, before human polish.
Open, controllable pipelines: Studios and toolmakers that need a production‑grade but open model they can fine‑tune, train LoRAs for, and integrate deeply into ComfyUI or custom backends.