AI VIDEO GENERATION
Run LTX 2.3 on Floyo
Open-source video generation with native audio, LoRA fine-tuning, and up to 4K output.
Run Lightricks' LTX 2.3 through ComfyUI in your browser. No API key, no installs, no local GPU.
Free to try · No installation · Runs in browser · Updated March 2026


What You Get
LTX 2.3 is Lightricks' open-source video generation model. It produces up to 20-second clips at 1080p (upscalable to 4K) with synchronized audio in a single pass. It supports text-to-video, image-to-video, audio-to-video, video extension, and LoRA fine-tuning with up to 3 adapters. It ranks #1 among open-source video models on Artificial Analysis benchmarks and is available as ComfyUI nodes on Floyo.
LTX 2.3 WORKFLOWS ON FLOYO
LTX 2.3 Text to Video and Image to Video
LTX 2.3 Image to Video with Two Pass
What is LTX 2.3?
LTX 2.3 is an open-source audio-video foundation model from Lightricks, released on March 5, 2026. It is a 22-billion parameter Diffusion Transformer (DiT) that generates synchronized video and audio in a single pass. It supports text-to-video, image-to-video, audio-to-video, video extension, and LoRA fine-tuning. The full weights are available under the Apache 2.0 license.
The model uses a dual-stream architecture. A 14-billion parameter stream handles video while a 5-billion parameter stream handles audio. Both are connected through bidirectional cross-attention, which means the audio and video stay synchronized at the architecture level. No separate audio model needed.
LTX 2.3 builds on LTX 2 with three rebuilt core components: a new VAE for sharper textures and facial detail, a 4x larger text connector for better prompt following, and an improved HiFi-GAN vocoder for cleaner stereo audio at 24 kHz. It also adds native portrait mode (9:16 at 1080x1920) and last-frame interpolation.
On Floyo, you access LTX 2.3 through the built-in LTXVideo ComfyUI nodes. Search "LTX 2.3" in the template library for ready-made workflows covering text-to-video, image-to-video, audio-to-video, and video extension. Floyo runs the model on H100 NVL GPUs, so you get full-speed generation without needing your own hardware.
What are LTX 2.3's technical specifications?
LTX 2.3 is a 22-billion parameter DiT model that generates video at up to 1080p natively (upscalable to 4K) with synchronized stereo audio. It supports 24/25/48/50 FPS, clips up to 20 seconds, native 9:16 portrait output, and LoRA fine-tuning with up to 3 simultaneous adapters. Two generation flows are available: Fast (speed-optimized) and Pro (quality-optimized).
| Spec | Details |
|---|---|
| Developer | Lightricks |
| Parameters | 22 billion (14B video + 5B audio, dual-stream DiT) |
| Resolution | 480p to 1080p native, up to 4K with upscaler models |
| Duration | 5 to 20 seconds per generation |
| FPS | 24 / 25 / 48 / 50 (temporal upscaler doubles frame rate) |
| Audio | Native synchronized stereo at 24 kHz (HiFi-GAN vocoder) |
| Aspect Ratios | 16:9 (landscape), 9:16 (portrait at 1080x1920), and more |
| LoRA Support | Up to 3 simultaneous adapters (style, character, motion) |
| Model Variants | Dev (full, trainable), Distilled (8-step fast inference) |
| Generation Flows | Fast (speed-optimized) and Pro (quality-optimized) |
| License | Apache 2.0 (free for companies under $10M revenue) |
| ComfyUI Access | Built-in LTXVideo nodes via ComfyUI Manager |
| Open Source | Full weights, code, and training scripts on HuggingFace and GitHub |
What can you create with LTX 2.3?
LTX 2.3 supports seven generation modes: text-to-video, image-to-video, audio-to-video, video extension, retake, video-to-video, plus fast variants for text-to-video and image-to-video. All video modes produce synchronized audio. LoRA adapters can be applied across all modes for custom styles, characters, and motion patterns.
| Capability | What It Does | Use Case |
|---|---|---|
| Text-to-Video | Generates 5-20 second clips with audio from text prompts. Fast and Pro flows available. | Ad creatives, social content, storyboard previews |
| Image-to-Video | Animates still images with improved motion (less Ken Burns, less freezing). Fast and Pro flows. | Product demos, character animation, portfolio pieces |
| Audio-to-Video | Takes an audio clip and generates video that matches its structure, pacing, and tone | Podcast visuals, music videos, voice-driven content |
| Video Extension | Continues an existing clip from where it left off, preserving style and motion | Building longer sequences, adding endings to clips |
| Retake | Regenerates a section of a video without discarding the whole generation | Fixing problem segments, iterating on specific scenes |
| LoRA Fine-Tuning | Train custom adapters for style, character, or motion. Up to 3 adapters at once. Training takes under an hour. | Brand consistency, recurring characters, signature aesthetics |
| Native Portrait | Generates 9:16 vertical video at up to 1080x1920, trained on portrait data (not cropped from landscape) | TikTok, Instagram Reels, YouTube Shorts |
How does LTX 2.3 compare to other video generation models?
LTX 2.3 is the #1 open-source video model on the Artificial Analysis leaderboard. Closed models like Kling 3.0 and Runway Gen-4.5 score higher on perceptual quality, but LTX 2.3 wins on resolution (4K with upscalers), duration (20 seconds), cost (open weights), LoRA customization, and the option to self-host.
| Model | Max Duration | Max Resolution | Native Audio | LoRA Support | Open Source |
|---|---|---|---|---|---|
| LTX 2.3 | 20 seconds | 4K (upscaled) | Yes | Yes (up to 3) | Yes |
| Veo 3.1 | 8 seconds | 4K | Yes | No | No |
| Wan 2.5 | 5 seconds | 720p | No | Yes | Yes |
| Kling 3.0 | 10 seconds | 1080p | Yes | No | No |
Source: Lightricks official docs, Artificial Analysis leaderboard, and model documentation as of March 2026. Specs may vary by version.
How does LTX 2.3 work?
LTX 2.3 uses an Asymmetric Dual-Stream Diffusion Transformer. The video stream (14B parameters) uses 3D Rotary Positional Embeddings for spatial and temporal dynamics. The audio stream (5B parameters) uses 1D temporal RoPE. Both streams are linked by bidirectional cross-attention, so audio and video are generated together, not stitched after the fact.
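The bidirectional cross-attention described above can be illustrated with a toy numpy sketch. This is not the model's actual implementation: the real architecture uses learned Q/K/V projections, multiple heads, and RoPE embeddings, all omitted here. The point is only the data flow: each stream's tokens attend over the other stream's tokens in the same pass.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values, d):
    # Tokens from one stream attend over the tokens of the other stream.
    scores = queries @ keys_values.T / np.sqrt(d)
    return softmax(scores) @ keys_values

d = 8
video_tokens = np.random.randn(16, d)  # stand-in for video latent tokens
audio_tokens = np.random.randn(4, d)   # stand-in for audio latent tokens

# Bidirectional: each stream conditions on the other within a single pass,
# which is why audio and video stay synchronized at the architecture level.
video_out = video_tokens + cross_attention(video_tokens, audio_tokens, d)
audio_out = audio_tokens + cross_attention(audio_tokens, video_tokens, d)
```

Each output keeps its own stream's shape while mixing in information from the other stream, rather than generating audio separately and stitching it on afterward.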
The model ships in two variants. The Dev model is the full checkpoint in bf16 precision, designed for fine-tuning and LoRA training. The Distilled model uses 8 steps for faster inference with lower memory overhead. A distilled LoRA adapter bridges the two: it applies distillation behavior to the Dev model so you get the Dev model's quality ceiling with faster sampling.
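The LoRA mechanics behind both the distilled adapter and the stackable fine-tuning adapters follow the standard low-rank update scheme; the sketch below shows generic LoRA merging in numpy (the exact scaling and target layers in LTX 2.3 may differ, and the shapes here are arbitrary for illustration).

```python
import numpy as np

def merge_lora(W, A, B, alpha=1.0):
    # LoRA: a low-rank update B @ A is added onto the frozen base weight W.
    return W + alpha * (B @ A)

d_out, d_in, rank = 32, 64, 4
W = np.random.randn(d_out, d_in)
A = np.random.randn(rank, d_in)   # trained down-projection
B = np.zeros((d_out, rank))       # B starts at zero, so merged == base at init
assert np.allclose(merge_lora(W, A, B), W)

# Stacking multiple adapters (LTX 2.3 allows up to 3) just sums their updates:
adapters = [(np.random.randn(rank, d_in), np.random.randn(d_out, rank))
            for _ in range(3)]
W_merged = W
for A_i, B_i in adapters:
    W_merged = merge_lora(W_merged, A_i, B_i, alpha=0.5)
```

Because the update lives in the model's latent weight space, a change to the VAE (and hence the latent space) invalidates old adapters, which is why LoRAs trained on earlier LTX-2 versions need retraining.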
Lightricks also released spatial and temporal upscaler models alongside LTX 2.3. The spatial upscalers let you generate at a manageable resolution and scale up to 4K afterward. The temporal upscaler doubles the frame rate of existing clips. This makes high-resolution, high-frame-rate output practical on consumer hardware.
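What "doubles the frame rate" means can be shown with a deliberately naive sketch: insert one in-between frame per adjacent pair. The real temporal upscaler is a learned model that predicts these in-between frames; the midpoint blend here is only a stand-in to make the frame-count arithmetic concrete.

```python
import numpy as np

def double_frame_rate(frames):
    """Insert one interpolated frame between each adjacent pair.
    Naive average blend; a learned temporal upscaler predicts
    the in-between frames instead of blending."""
    out = []
    for a, b in zip(frames[:-1], frames[1:]):
        out.append(a)
        out.append((a + b) / 2.0)  # midpoint stand-in for a learned prediction
    out.append(frames[-1])
    return np.stack(out)

clip = np.random.rand(25, 4, 4, 3)  # 25 frames of tiny 4x4 RGB: 1 s at 25 FPS
doubled = double_frame_rate(clip)   # 49 frames, i.e. ~50 FPS over the same second
```

N input frames become 2N - 1 output frames, so a 25 FPS clip plays back at roughly 50 FPS over the same duration.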
On Floyo, LTX 2.3 runs through the built-in LTXVideo ComfyUI nodes. You can chain it with other models in the same workflow. Generate a base image with Flux, animate it with LTX 2.3, upscale the result, and apply a LoRA for your brand's visual style. All inside one pipeline on Floyo's H100 NVL GPUs.
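For readers who also self-host, ComfyUI workflows can be driven programmatically by POSTing an API-format workflow graph to the server's `/prompt` endpoint. The node class names and inputs below are placeholders, not the real LTXVideo node names; export your actual workflow from ComfyUI in API format to get the correct graph.

```python
import json
import urllib.request

# Hypothetical two-node fragment. Real node class names and input sockets
# come from the API-format JSON that ComfyUI exports for your workflow.
workflow = {
    "1": {"class_type": "LoadImage",
          "inputs": {"image": "product_still.png"}},
    "2": {"class_type": "LTXVideoSampler",  # placeholder name
          "inputs": {"image": ["1", 0],     # link to node 1, output slot 0
                     "prompt": "slow dolly-in, soft morning light"}},
}

payload = json.dumps({"prompt": workflow}).encode("utf-8")
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt", data=payload,
    headers={"Content-Type": "application/json"})
# urllib.request.urlopen(req)  # uncomment against a running ComfyUI instance
```

On Floyo itself none of this is necessary; the browser UI submits the same graph for you.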
Note: LoRAs trained on earlier LTX-2 versions need to be retrained for LTX 2.3 because the latent space changed with the new VAE. Lightricks provides training scripts that complete in under an hour for most configurations. The model also has some known limitations: it may not match prompts perfectly in every case, and audio quality can be lower when generating non-speech sounds.
Where does LTX 2.3 fit in a production pipeline?
LTX 2.3 covers concept, style lock, character, scene composition, motion, and final stages of a production pipeline. Its LoRA support makes it the only video model with full style-locking capability. Its audio-to-video mode handles audio-driven workflows. Video extension and retake modes let you build and iterate on longer sequences without starting over.
| Pipeline Stage | How LTX 2.3 Fits |
|---|---|
| 1. Concept | Fast flow text-to-video for rapid concept exploration. Generate multiple directions quickly to test ideas before committing. |
| 2. Style Lock | Train a LoRA on your visual style, then apply it to every generation. LTX 2.3 is the only video model with full LoRA support for style locking. |
| 3. Character | Character LoRAs maintain consistent appearance across scenes. Stack up to 3 adapters (character + style + motion) simultaneously. |
| 4. Scene Comp | Image-to-video turns composed stills into animated scenes. The 4x larger text connector follows complex spatial and camera instructions. |
| 5. Motion | Pro flow for high-fidelity motion. Audio-to-video syncs pacing to voiceover or music. Video extension builds longer sequences clip by clip. |
| 6. Final | Spatial upscaler pushes output to 4K. Temporal upscaler doubles frame rate. Retake mode lets you fix individual segments without regenerating everything. |
Frequently Asked Questions
Common questions about running LTX 2.3 on Floyo.
How much does LTX 2.3 cost on Floyo?
LTX 2.3 is open-source, so there is no additional API cost beyond Floyo's GPU time. You can try it with Floyo's free tier, which gives you 20 minutes of GPU time per day. That is enough for multiple video generations depending on duration and resolution.
How do I run LTX 2.3 on Floyo?
Open Floyo in your browser, find an LTX 2.3 workflow (search "LTX 2.3" in the template library), and click Run. Floyo handles the GPU, the ComfyUI environment, and the model weights. No local install, no Python setup, no API key required.
Who made LTX 2.3?
Lightricks, a creative technology company known for Facetune and other mobile editing tools. LTX 2.3 was released on March 5, 2026 under the Apache 2.0 license. Full weights, code, and training scripts are available on HuggingFace and GitHub.
How does LTX 2.3 compare to Wan 2.5?
Both are open-source video models. LTX 2.3 generates longer clips (20 seconds vs 5 seconds), supports higher resolution (up to 4K vs 720p), includes native audio generation, and offers LoRA fine-tuning. Wan 2.5 is smaller and runs faster on consumer GPUs. LTX 2.3 is the stronger choice for production workflows that need duration, audio, or customization.
Can I combine LTX 2.3 with other models on Floyo?
Yes. Floyo runs ComfyUI, which lets you chain multiple models in a single workflow. Generate a still with Flux, animate it with LTX 2.3, apply a LoRA for your brand style, then upscale to 4K. All in one pipeline, all in your browser.
How long does generation take?
LTX 2.3 offers two flows. Fast flow is optimized for speed and tight feedback loops. Pro flow prioritizes visual quality and consistency. The distilled variant uses only 8 inference steps for near real-time generation. Exact times depend on resolution, duration, and hardware.
Can I use LTX 2.3 commercially?
Yes. LTX 2.3 is released under the Apache 2.0 license, which allows commercial use without restriction for companies with under $10 million in annual revenue. Larger companies embedding the model into commercial products need a license from Lightricks.
Can I fine-tune LTX 2.3 with my own data?
Yes. Lightricks provides the LTX-2 Trainer with reproducible LoRA and IC-LoRA training scripts. Training for motion, style, or likeness (sound and appearance) can complete in under an hour in many configurations. You can apply up to 3 LoRA adapters at the same time during generation.
Try LTX 2.3 on Floyo
Open-source video with native audio, LoRA fine-tuning, and up to 4K output. Run it in your browser.
Related Reading
Setting Up an AI Production Pipeline for Your Studio
Film and Animation Workflows on Floyo
Last updated: March 2026. Specs and benchmarks from Lightricks official docs, HuggingFace model card, Artificial Analysis leaderboard, and fal.ai documentation.