AI VIDEO GENERATION
Run LTX 2.3 on Floyo
Open-source video generation with native audio, LoRA fine-tuning, and up to 4K output.
Run Lightricks' LTX 2.3 through ComfyUI in your browser. No API key, no installs, no local GPU.
Free to try · No installation · Runs in browser · Updated March 2026


What You Get
LTX 2.3 is Lightricks' open-source video generation model. It produces up to 20-second clips at 1080p (upscalable to 4K) with synchronized audio in a single pass. It supports text-to-video, image-to-video, audio-to-video, video extension, and LoRA fine-tuning with up to 3 adapters. It ranks #1 among open-source video models on Artificial Analysis benchmarks and is available as ComfyUI nodes on Floyo.
LTX 2.3 WORKFLOWS ON FLOYO
LTX 2.3 Text to Video and Image to Video
LTX 2.3 Image to Video with Two Pass
What is LTX 2.3?
LTX 2.3 is an open-source audio-video foundation model from Lightricks, released on March 5, 2026. It is a 22-billion parameter Diffusion Transformer (DiT) that generates synchronized video and audio in a single pass. It supports text-to-video, image-to-video, audio-to-video, video extension, and LoRA fine-tuning. The full weights are available under the Apache 2.0 license.
The model uses a dual-stream architecture. A 14-billion parameter stream handles video while a 5-billion parameter stream handles audio. Both are connected through bidirectional cross-attention, which means the audio and video stay synchronized at the architecture level. No separate audio model needed.
LTX 2.3 builds on LTX 2 with three rebuilt core components: a new VAE for sharper textures and facial detail, a 4x larger text connector for better prompt following, and an improved HiFi-GAN vocoder for cleaner stereo audio at 24 kHz. It also adds native portrait mode (9:16 at 1080x1920) and last-frame interpolation.
On Floyo, you access LTX 2.3 through the built-in LTXVideo ComfyUI nodes. Search "LTX 2.3" in the template library for ready-made workflows covering text-to-video, image-to-video, audio-to-video, and video extension. Floyo runs the model on H100 NVL GPUs, so you get full-speed generation without needing your own hardware.
What are LTX 2.3's technical specifications?
LTX 2.3 is a 22-billion parameter DiT model that generates video at up to 1080p natively (upscalable to 4K) with synchronized stereo audio. It supports 24/25/48/50 FPS, clips up to 20 seconds, native 9:16 portrait output, and LoRA fine-tuning with up to 3 simultaneous adapters. Two generation flows are available: Fast (speed-optimized) and Pro (quality-optimized).
| Spec | Details |
|---|---|
| Developer | Lightricks |
| Parameters | 22 billion (14B video + 5B audio, dual-stream DiT) |
| Resolution | 480p to 1080p native, up to 4K with upscaler models |
| Duration | 5 to 20 seconds per generation |
| FPS | 24 / 25 / 48 / 50 (temporal upscaler doubles frame rate) |
| Audio | Native synchronized stereo at 24 kHz (HiFi-GAN vocoder) |
| Aspect Ratios | 16:9 (landscape), 9:16 (portrait at 1080x1920), and more |
| LoRA Support | Up to 3 simultaneous adapters (style, character, motion) |
| Model Variants | Dev (full, trainable), Distilled (8-step fast inference) |
| Generation Flows | Fast (speed-optimized) and Pro (quality-optimized) |
| License | Apache 2.0 (free for companies under $10M revenue) |
| ComfyUI Access | Built-in LTXVideo nodes via ComfyUI Manager |
| Open Source | Full weights, code, and training scripts on HuggingFace and GitHub |
What can you create with LTX 2.3?
LTX 2.3 supports seven generation modes: text-to-video, image-to-video, audio-to-video, video extension, retake, video-to-video, plus fast variants for text-to-video and image-to-video. All video modes produce synchronized audio. LoRA adapters can be applied across all modes for custom styles, characters, and motion patterns.
| Capability | What It Does | Use Case |
|---|---|---|
| Text-to-Video | Generates 5-20 second clips with audio from text prompts. Fast and Pro flows available. | Ad creatives, social content, storyboard previews |
| Image-to-Video | Animates still images with improved motion (less Ken Burns, less freezing). Fast and Pro flows. | Product demos, character animation, portfolio pieces |
| Audio-to-Video | Takes an audio clip and generates video that matches its structure, pacing, and tone | Podcast visuals, music videos, voice-driven content |
| Video Extension | Continues an existing clip from where it left off, preserving style and motion | Building longer sequences, adding endings to clips |
| Retake | Regenerates a section of a video without discarding the whole generation | Fixing problem segments, iterating on specific scenes |
| LoRA Fine-Tuning | Train custom adapters for style, character, or motion. Up to 3 adapters at once. Training takes under an hour. | Brand consistency, recurring characters, signature aesthetics |
| Native Portrait | Generates 9:16 vertical video at up to 1080x1920, trained on portrait data (not cropped from landscape) | TikTok, Instagram Reels, YouTube Shorts |
How does LTX 2.3 compare to other video generation models?
LTX 2.3 is the #1 open-source video model on the Artificial Analysis leaderboard. Closed models like Kling 3.0 and Runway Gen-4.5 score higher on perceptual quality, but LTX 2.3 wins on resolution (4K with upscalers), duration (20 seconds), cost (open weights), LoRA customization, and the option to self-host.
| Model | Max Duration | Max Resolution | Native Audio | LoRA Support | Open Source |
|---|---|---|---|---|---|
| LTX 2.3 | 20 seconds | 4K (upscaled) | Yes | Yes (up to 3) | Yes |
| Veo 3.1 | 8 seconds | 4K | Yes | No | No |
| Wan 2.5 | 5 seconds | 720p | No | Yes | Yes |
| Kling 3.0 | 10 seconds | 1080p | Yes | No | No |
Source: Lightricks official docs, Artificial Analysis leaderboard, and model documentation as of March 2026. Specs may vary by version.
How does LTX 2.3 work?
LTX 2.3 uses an Asymmetric Dual-Stream Diffusion Transformer. The video stream (14B parameters) uses 3D Rotary Positional Embeddings for spatial and temporal dynamics. The audio stream (5B parameters) uses 1D temporal RoPE. Both streams are linked by bidirectional cross-attention, so audio and video are generated together, not stitched after the fact.
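The bidirectional cross-attention described above can be illustrated with a toy numpy sketch. This is not the model's actual implementation: the real architecture uses learned Q/K/V projections, multiple heads, and RoPE embeddings, all omitted here. The point is only the data flow: each stream's tokens attend over the other stream's tokens in the same pass.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values, d):
    # Tokens from one stream attend over the tokens of the other stream.
    scores = queries @ keys_values.T / np.sqrt(d)
    return softmax(scores) @ keys_values

d = 8
video_tokens = np.random.randn(16, d)  # stand-in for video latent tokens
audio_tokens = np.random.randn(4, d)   # stand-in for audio latent tokens

# Bidirectional: each stream conditions on the other within a single pass,
# which is why audio and video stay synchronized at the architecture level.
video_out = video_tokens + cross_attention(video_tokens, audio_tokens, d)
audio_out = audio_tokens + cross_attention(audio_tokens, video_tokens, d)
```

Each output keeps its own stream's shape while mixing in information from the other stream, rather than generating audio separately and stitching it on afterward.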
The model ships in two variants. The Dev model is the full checkpoint in bf16 precision, designed for fine-tuning and LoRA training. The Distilled model uses 8 steps for faster inference with lower memory overhead. A distilled LoRA adapter bridges the two: it applies distillation behavior to the Dev model so you get the Dev model's quality ceiling with faster sampling.
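The LoRA mechanics behind both the distilled adapter and the stackable fine-tuning adapters follow the standard low-rank update scheme; the sketch below shows generic LoRA merging in numpy (the exact scaling and target layers in LTX 2.3 may differ, and the shapes here are arbitrary for illustration).

```python
import numpy as np

def merge_lora(W, A, B, alpha=1.0):
    # LoRA: a low-rank update B @ A is added onto the frozen base weight W.
    return W + alpha * (B @ A)

d_out, d_in, rank = 32, 64, 4
W = np.random.randn(d_out, d_in)
A = np.random.randn(rank, d_in)   # trained down-projection
B = np.zeros((d_out, rank))       # B starts at zero, so merged == base at init
assert np.allclose(merge_lora(W, A, B), W)

# Stacking multiple adapters (LTX 2.3 allows up to 3) just sums their updates:
adapters = [(np.random.randn(rank, d_in), np.random.randn(d_out, rank))
            for _ in range(3)]
W_merged = W
for A_i, B_i in adapters:
    W_merged = merge_lora(W_merged, A_i, B_i, alpha=0.5)
```

Because the update lives in the model's latent weight space, a change to the VAE (and hence the latent space) invalidates old adapters, which is why LoRAs trained on earlier LTX-2 versions need retraining.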
Lightricks also released spatial and temporal upscaler models alongside LTX 2.3. The spatial upscalers let you generate at a manageable resolution and scale up to 4K afterward. The temporal upscaler doubles the frame rate of existing clips. This makes high-resolution, high-frame-rate output practical on consumer hardware.
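What "doubles the frame rate" means can be shown with a deliberately naive sketch: insert one in-between frame per adjacent pair. The real temporal upscaler is a learned model that predicts these in-between frames; the midpoint blend here is only a stand-in to make the frame-count arithmetic concrete.

```python
import numpy as np

def double_frame_rate(frames):
    """Insert one interpolated frame between each adjacent pair.
    Naive average blend; a learned temporal upscaler predicts
    the in-between frames instead of blending."""
    out = []
    for a, b in zip(frames[:-1], frames[1:]):
        out.append(a)
        out.append((a + b) / 2.0)  # midpoint stand-in for a learned prediction
    out.append(frames[-1])
    return np.stack(out)

clip = np.random.rand(25, 4, 4, 3)  # 25 frames of tiny 4x4 RGB: 1 s at 25 FPS
doubled = double_frame_rate(clip)   # 49 frames, i.e. ~50 FPS over the same second
```

N input frames become 2N - 1 output frames, so a 25 FPS clip plays back at roughly 50 FPS over the same duration.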
On Floyo, LTX 2.3 runs through the built-in LTXVideo ComfyUI nodes. You can chain it with other models in the same workflow. Generate a base image with Flux, animate it with LTX 2.3, upscale the result, and apply a LoRA for your brand's visual style. All inside one pipeline on Floyo's H100 NVL GPUs.
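For readers who also self-host, ComfyUI workflows can be driven programmatically by POSTing an API-format workflow graph to the server's `/prompt` endpoint. The node class names and inputs below are placeholders, not the real LTXVideo node names; export your actual workflow from ComfyUI in API format to get the correct graph.

```python
import json
import urllib.request

# Hypothetical two-node fragment. Real node class names and input sockets
# come from the API-format JSON that ComfyUI exports for your workflow.
workflow = {
    "1": {"class_type": "LoadImage",
          "inputs": {"image": "product_still.png"}},
    "2": {"class_type": "LTXVideoSampler",  # placeholder name
          "inputs": {"image": ["1", 0],     # link to node 1, output slot 0
                     "prompt": "slow dolly-in, soft morning light"}},
}

payload = json.dumps({"prompt": workflow}).encode("utf-8")
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt", data=payload,
    headers={"Content-Type": "application/json"})
# urllib.request.urlopen(req)  # uncomment against a running ComfyUI instance
```

On Floyo itself none of this is necessary; the browser UI submits the same graph for you.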
Note: LoRAs trained on earlier LTX-2 versions need to be retrained for LTX 2.3 because the latent space changed with the new VAE. Lightricks provides training scripts that complete in under an hour for most configurations. The model also has some known limitations: it may not match prompts perfectly in every case, and audio quality can be lower when generating non-speech sounds.
Where does LTX 2.3 fit in a production pipeline?
LTX 2.3 covers concept, style lock, character, scene composition, motion, and final stages of a production pipeline. Its LoRA support makes it the only video model with full style-locking capability. Its audio-to-video mode handles audio-driven workflows. Video extension and retake modes let you build and iterate on longer sequences without starting over.
| Pipeline Stage | How LTX 2.3 Fits |
|---|---|
| 1. Concept | Fast flow text-to-video for rapid concept exploration. Generate multiple directions quickly to test ideas before committing. |
| 2. Style Lock | Train a LoRA on your visual style, then apply it to every generation. LTX 2.3 is the only video model with full LoRA support for style locking. |
| 3. Character | Character LoRAs maintain consistent appearance across scenes. Stack up to 3 adapters (character + style + motion) simultaneously. |
| 4. Scene Comp | Image-to-video turns composed stills into animated scenes. The 4x larger text connector follows complex spatial and camera instructions. |
| 5. Motion | Pro flow for high-fidelity motion. Audio-to-video syncs pacing to voiceover or music. Video extension builds longer sequences clip by clip. |
| 6. Final | Spatial upscaler pushes output to 4K. Temporal upscaler doubles frame rate. Retake mode lets you fix individual segments without regenerating everything. |
Frequently Asked Questions
Common questions about running LTX 2.3 on Floyo.
How much does LTX 2.3 cost on Floyo?
LTX 2.3 is open-source, so there is no additional API cost beyond Floyo's GPU time. You can try it with Floyo's free tier, which gives you 20 minutes of GPU time per day. That is enough for multiple video generations depending on duration and resolution.
How do I run LTX 2.3 on Floyo?
Open Floyo in your browser, find an LTX 2.3 workflow (search "LTX 2.3" in the template library), and click Run. Floyo handles the GPU, the ComfyUI environment, and the model weights. No local install, no Python setup, no API key required.
Who made LTX 2.3?
Lightricks, a creative technology company known for Facetune and other mobile editing tools. LTX 2.3 was released on March 5, 2026 under the Apache 2.0 license. Full weights, code, and training scripts are available on HuggingFace and GitHub.
How does LTX 2.3 compare to Wan 2.5?
Both are open-source video models. LTX 2.3 generates longer clips (20 seconds vs 5 seconds), supports higher resolution (up to 4K vs 720p), includes native audio generation, and offers LoRA fine-tuning. Wan 2.5 is smaller and runs faster on consumer GPUs. LTX 2.3 is the stronger choice for production workflows that need duration, audio, or customization.
Can I combine LTX 2.3 with other models on Floyo?
Yes. Floyo runs ComfyUI, which lets you chain multiple models in a single workflow. Generate a still with Flux, animate it with LTX 2.3, apply a LoRA for your brand style, then upscale to 4K. All in one pipeline, all in your browser.
How long does generation take?
LTX 2.3 offers two flows. Fast flow is optimized for speed and tight feedback loops. Pro flow prioritizes visual quality and consistency. The distilled variant uses only 8 inference steps for near real-time generation. Exact times depend on resolution, duration, and hardware.
Can I use LTX 2.3 commercially?
Yes. LTX 2.3 is released under the Apache 2.0 license, which allows commercial use without restriction for companies with under $10 million in annual revenue. Larger companies embedding the model into commercial products need a license from Lightricks.
Can I fine-tune LTX 2.3 with my own data?
Yes. Lightricks provides the LTX-2 Trainer with reproducible LoRA and IC-LoRA training scripts. Training for motion, style, or likeness (sound and appearance) can complete in under an hour in many configurations. You can apply up to 3 LoRA adapters at the same time during generation.
Try LTX 2.3 on Floyo
Open-source video with native audio, LoRA fine-tuning, and up to 4K output. Run it in your browser.
Related Reading
Setting Up an AI Production Pipeline for Your Studio
Film and Animation Workflows on Floyo
Last updated: March 2026. Specs and benchmarks from Lightricks official docs, HuggingFace model card, Artificial Analysis leaderboard, and fal.ai documentation.