floyo logo
Powered by
ThinkDiffusion

LTX 2 19B Pro for Text to Video

An open‑source LTX 2 Pro workflow for text to video


This workflow runs LTX‑2 Pro text‑to‑video with the open‑source model: it uses the full‑quality 19B checkpoint in “Pro” mode to turn prompts into longer, cinematic clips with synchronized audio and more detail than the Fast/Distilled flows.

Overview

LTX‑2 is a DiT‑based audio‑video foundation model that generates video and audio together from text, image, or video inputs; the Pro flow is the high‑fidelity configuration, focused on visual quality and temporal stability. In text‑to‑video, you provide a structured prompt and LTX‑2 Pro produces roughly 10–20 seconds of HD or 4K video with matching sound at up to around 50 fps. It is designed for production use rather than quick previews.
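Duration and frame rate together determine how many frames the model has to generate, which is the main driver of generation time. The sketch below shows that arithmetic. The function name is hypothetical, and the rounding to a count of the form 8k + 1 is an assumption carried over from earlier LTX video releases; check the documentation of the latent node you use before relying on it.

```python
def frame_count(duration_s: float, fps: float, multiple: int = 8) -> int:
    """Estimate a frame count for a clip of the given length.

    Assumption: the latent video node expects counts of the form
    multiple * k + 1 (8k + 1 by default). This is not confirmed for LTX-2.
    """
    raw = round(duration_s * fps)            # plain duration x fps
    k = max(1, round((raw - 1) / multiple))  # nearest allowed step, at least one
    return multiple * k + 1


if __name__ == "__main__":
    for seconds, fps in [(6, 24), (10, 25), (20, 50)]:
        print(seconds, "s at", fps, "fps:", frame_count(seconds, fps), "frames")
```

A 20 second clip at 50 fps is roughly seven times as many frames as a 6 second clip at 24 fps, so it is worth testing a prompt at the short end before committing to a long render.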

Why use the Pro flow (open source)

  • Maximum quality: Pro flow uses the full 19B architecture (often in FP8 “Standard”) to keep fine detail, face quality, and motion consistency, where Fast trades some detail for speed.

  • Native 4K and high fps: Pro supports true 4K and high frame rates (up to about 50 fps) for cinematic or commercial delivery, exceeding many earlier open‑source models that cap at lower resolutions.

  • Open and self‑hostable: Weights and inference code are available, with GGUF and FP8 variants to run locally on consumer GPUs (for example, 12–24 GB VRAM) or via open providers.
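To see why the FP8 and GGUF variants matter for consumer GPUs, it helps to estimate the size of the weights alone. The sketch below is a back‑of‑the‑envelope calculation, not a measurement: the bytes‑per‑parameter figures are nominal, the `q4` entry stands in for a 4‑bit GGUF quantization and ignores its scale overhead, and the estimate excludes activations, the text encoder, and the VAEs. All names are hypothetical.

```python
# Nominal storage cost per parameter for each precision (assumed values).
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "q4": 0.5}


def weight_gib(params_billion: float, precision: str) -> float:
    """Approximate size of the model weights in GiB, weights only."""
    total_bytes = params_billion * 1e9 * BYTES_PER_PARAM[precision]
    return total_bytes / 2**30


if __name__ == "__main__":
    for precision in BYTES_PER_PARAM:
        print(f"19B at {precision}: about {weight_gib(19, precision):.1f} GiB")
```

Under these assumptions the 19B weights need about 35 GiB in BF16, about 18 GiB in FP8, and under 10 GiB at 4 bits, which is consistent with the 12–24 GB range quoted above once quantization or offloading is used.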

Typical text‑to‑video usage

  • You write a prompt describing subject, environment, motion, camera behavior, and mood; LTX‑2 Pro turns this into a coherent 16:9 or custom‑ratio sequence with synchronized dialogue, ambience, or music where appropriate.

  • You choose duration (commonly 6–20 seconds) and resolution (480p, 720p, 1080p, or 4K), and select the Pro/Standard configuration when quality matters more than speed.

  • In tools like ComfyUI, you pick the Pro checkpoint (for example FP8 Standard) and higher step counts, often combining it with LoRAs for style, camera control, or detail upscaling in a multiscale pipeline.
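The prompt structure and settings described above can be sketched as a small helper that assembles the five prompt elements into one text and checks the chosen duration and resolution. This is an illustration only: `ShotPrompt`, `make_request`, and the dictionary keys are hypothetical names, not part of LTX‑2, ComfyUI, or Floyo, and the 6–20 second check reflects the commonly used range mentioned here rather than a hard model limit.

```python
from dataclasses import dataclass

# Resolution labels taken from the text above.
ALLOWED_RESOLUTIONS = ("480p", "720p", "1080p", "4K")


@dataclass
class ShotPrompt:
    subject: str
    environment: str
    motion: str
    camera: str
    mood: str
    audio: str = ""  # optional dialogue, ambience, or music cue

    def to_text(self) -> str:
        """Join the non-empty elements into one sentence-per-element prompt."""
        parts = [self.subject, self.environment, self.motion,
                 self.camera, self.mood, self.audio]
        return " ".join(p.strip().rstrip(".") + "." for p in parts if p.strip())


def make_request(prompt: ShotPrompt, duration_s: int, resolution: str) -> dict:
    """Bundle the prompt with settings after a basic sanity check."""
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution}")
    if not 6 <= duration_s <= 20:
        raise ValueError("duration outside the commonly used 6-20 s range")
    return {"prompt": prompt.to_text(),
            "duration_s": duration_s,
            "resolution": resolution}


if __name__ == "__main__":
    shot = ShotPrompt(
        subject="A lighthouse keeper climbs a spiral staircase",
        environment="Inside a storm-battered lighthouse at night",
        motion="Slow, steady ascent with a swinging lantern",
        camera="Handheld follow shot from below",
        mood="Tense and moody",
        audio="Wind, creaking wood, distant waves",
    )
    print(make_request(shot, duration_s=10, resolution="1080p"))
```

Keeping each element in its own field makes it easy to vary one aspect, such as the camera move, while holding the rest of the shot constant between runs.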

Use cases

  • Cinematic hero shots & ads: High‑impact scenes where lighting, texture, and motion must hold up in 1080p–4K delivery.

  • Storyboards to near‑final: Turning script‑like prompts into sequences that already look close to final edits, including audio, before human polish.

  • Open, controllable pipelines: Studios or toolmakers who need a production‑grade but open model they can fine‑tune, LoRA‑train, and integrate deeply into ComfyUI or custom backends.

Generates in about 9 mins 17 secs

Nodes & Models

PrimitiveFloat
PrimitiveInt
EmptyImage
MarkdownNote
RandomNoise
ManualSigmas
KSamplerSelect
WorkflowGraphics
LatentUpscaleModelLoader
ltx-2-spatial-upscaler-x2-1.0.safetensors
LTXVAudioVAELoader
ltx-2-19b-dev.safetensors
CheckpointLoaderSimple
ltx-2-19b-dev.safetensors
PrimitiveStringMultiline
ImageScaleBy
CLIPTextEncode
LTXVEmptyLatentAudio
GetImageSize
EmptyLTXVLatentVideo
LTXVConditioning
LTXVConcatAVLatent
CFGGuider
LTXVScheduler
SamplerCustomAdvanced
LTXVSeparateAVLatent
LTXVLatentUpsampler
LTXVAudioVAEDecode
CreateVideo
SaveVideo
LTXVGemmaCLIPModelLoader
gemma-3-12b-it-qat-q4_0-unquantized/model-00001-of-00005.safetensors
ltx-2-19b-dev.safetensors
LTXVGemmaEnhancePrompt
CM_FloatToInt
