AI IMAGE GENERATION
Run Lumina Image 2.0 on Floyo
A 2-billion-parameter flow-based diffusion transformer with strong typography, complex prompt understanding, and multilingual generation. Open source under Apache 2.0.
Run Alpha-VLLM's Lumina Image 2.0 through ComfyUI in your browser. No API key, no installs, no local GPU.
Parameters
2B
Resolution
Up to 1024x1024
License
Apache 2.0
Architecture
Flow-based DiT
Run now on Floyo · Browse All Models
No installation. Runs in browser. Updated April 2026.
What is Lumina Image 2.0?
Lumina Image 2.0 is a text-to-image model from Alpha-VLLM, initially released January 25, 2025, with the full technical report published March 28, 2025. It uses a 2-billion-parameter flow-based diffusion transformer (Unified Next-DiT) that treats text and image tokens as a joint sequence. On the DPG benchmark, it scores 87.20. On GenEval, it scores 0.73. Both results are competitive with models that have significantly more parameters.
What are Lumina Image 2.0's technical specifications?
Lumina Image 2.0 uses a Unified Next-DiT architecture with 2 billion parameters. It pairs a Gemma-2-2B text encoder with a FLUX-VAE-16CH autoencoder. The model supports multiple inference solvers (Midpoint, Euler, DPM), CFG-Renormalization and CFG-Truncation for faster inference, and flash attention for efficient computation. Training used 200 million image-text pairs across three progressive stages.
| Spec | Details |
|---|---|
| Developer | Alpha-VLLM (Shanghai AI Laboratory) |
| Architecture | Unified Next-DiT (flow-based diffusion transformer) |
| Parameters | 2 billion |
| Text Encoder | Gemma-2-2B |
| VAE | FLUX-VAE-16CH |
| Max Resolution | 1024x1024 |
| Typography | Enhanced text rendering in generated images |
| Multilingual | Yes (no language-specific preprocessing required) |
| Inference Solvers | Midpoint, Euler, DPM Solver |
| Training Data | 200M image-text pairs (3-stage progressive training) |
| DPG Score | 87.20 (Overall) |
| GenEval Score | 0.73 |
| License | Apache 2.0 (full commercial rights) |
| Fine-Tuning | LoRA and full fine-tuning supported |
| ComfyUI Access | Native ComfyUI support + Diffusers library |
| Release Date | January 25, 2025 (tech report March 28, 2025) |
What are Lumina Image 2.0's key features?
Lumina Image 2.0 is designed around two principles: unification (text and image tokens processed jointly) and efficiency (competitive results from 2B parameters with fast inference). The feature set targets users who need a lightweight, open-source image model with strong compositional understanding and commercial licensing.
Unified Token Processing
Unlike models that use cross-attention to connect text and image, Lumina Image 2.0 treats text and image tokens as a single joint sequence. This enables natural cross-modal interactions during generation and makes the architecture easier to extend for additional tasks. The Unified Next-DiT framework supports both generation and understanding in one model.
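As a toy illustration of that difference (a NumPy sketch with made-up dimensions and random weights, not Lumina's actual layers): instead of image queries attending only to text keys via cross-attention, one self-attention pass runs over the concatenated [text; image] sequence, so attention scores are computed between every pair of tokens in both modalities.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def joint_self_attention(text_tokens, image_tokens, d=16, seed=0):
    """Toy single-head self-attention over a joint [text; image] sequence.

    Unlike cross-attention (image queries attend only to text keys),
    every token here attends to every other token in one sequence.
    """
    rng = np.random.default_rng(seed)
    x = np.concatenate([text_tokens, image_tokens], axis=0)  # (T+I, d)
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    attn = softmax(q @ k.T / np.sqrt(d))  # (T+I, T+I): full joint attention
    return attn @ v, attn

rng = np.random.default_rng(1)
text = rng.standard_normal((4, 16))   # 4 hypothetical text tokens
image = rng.standard_normal((8, 16))  # 8 hypothetical image-patch tokens
out, attn = joint_self_attention(text, image)
print(out.shape, attn.shape)  # (12, 16) (12, 12)
```

The attention matrix covers the full 12x12 joint sequence, so text-to-image, image-to-text, and image-to-image interactions all happen in the same operation.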
Typography and Text Rendering
The model handles text-in-image generation with improved accuracy compared to its predecessor. Signs, labels, and typographic elements render more clearly. This is driven by the Gemma-2-2B text encoder, which provides stronger semantic understanding of what text should appear and where.
Complex Prompt Understanding
Lumina Image 2.0 excels at compositional prompts with multiple objects, spatial relationships, color attributes, and relational descriptions. On the DPG benchmark, it outperforms all compared models across Entity, Relation, Attribute, and Overall metrics. This is attributed to the Unified Captioner (UniCap), a custom captioning system that generates detailed, semantically aligned captions during training.
Multilingual Generation
The model accepts prompts in multiple languages without requiring phonemes or language-specific preprocessing. You can write prompts in English, Chinese, and other languages and the model generates appropriate images. This is supported by the Gemma-2-2B encoder's multilingual capabilities.
Efficient Inference
CFG-Renormalization and CFG-Truncation are integrated to reduce inference time without hurting image quality. Flash attention accelerates both training and inference. Multiple solver options (Midpoint, Euler, DPM) let you balance speed and fidelity based on your use case. The result is a model that runs faster than most comparably performing alternatives.
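A minimal sketch of what these two tricks can look like, assuming the standard classifier-free guidance formulation. The function name, the truncation threshold, and the exact renormalization rule here are illustrative assumptions; the precise variants Lumina Image 2.0 uses are described in its technical report.

```python
import numpy as np

def guided_velocity(v_cond, v_uncond, scale, t, trunc_t=0.3, renorm=True):
    """Classifier-free guidance with truncation and renormalization (sketch).

    - CFG-Truncation: past a chosen time threshold, skip guidance entirely,
      which saves the unconditional forward pass on those steps.
    - CFG-Renormalization: rescale the guided vector back to the conditional
      vector's norm so large guidance scales don't inflate magnitudes.
    Exact formulations in Lumina Image 2.0 may differ; this is a toy version.
    """
    if t > trunc_t:  # truncation: conditional prediction only
        return v_cond
    v = v_uncond + scale * (v_cond - v_uncond)  # standard CFG combination
    if renorm:  # renormalization: keep ||v|| equal to ||v_cond||
        v = v * (np.linalg.norm(v_cond) / (np.linalg.norm(v) + 1e-8))
    return v

v_cond = np.array([3.0, 4.0])    # ||v_cond|| = 5.0
v_uncond = np.array([1.0, 0.0])
v = guided_velocity(v_cond, v_uncond, scale=4.0, t=0.1)
print(np.linalg.norm(v))  # ~5.0: guidance direction kept, magnitude controlled
```

Even with an aggressive scale of 4.0, the renormalized vector keeps the conditional prediction's magnitude, which is the intuition behind getting faster, stabler sampling without hurting quality.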
Apache 2.0 License
Full commercial rights. Model weights, training code, fine-tuning scripts, and LoRA training are all included. No usage restrictions, no API-only access, no research-only limitations. You can deploy it, modify it, and build products with it.
Extensible Ecosystem
The Lumina-Accessory framework extends the base model with image editing, identity preservation, controllable generation, and task-specific adaptation. The related Lumina-Video 1.0 project uses the same architectural foundation for video generation. LoRA fine-tuning scripts let you adapt the model for specific styles, subjects, or domains.
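LoRA's core mechanism, independent of Lumina, is a low-rank weight update merged into frozen base weights. A generic NumPy sketch, using the common `alpha / rank` scaling convention (the dimensions and variable names here are illustrative, not taken from Lumina's training scripts):

```python
import numpy as np

def merge_lora(W, A, B, alpha=16, rank=4):
    """Merge a LoRA adapter into a base weight matrix (generic sketch).

    LoRA learns a low-rank update delta_W = B @ A (rank r much smaller than
    the weight dimensions) and merges it into the frozen base weight,
    scaled by alpha / rank. Standard LoRA math, not Lumina-specific code.
    """
    return W + (alpha / rank) * (B @ A)

rng = np.random.default_rng(0)
d_out, d_in, r = 32, 64, 4
W = rng.standard_normal((d_out, d_in))  # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01  # trained down-projection
B = np.zeros((d_out, r))  # B initialized to zero: adapter starts as a no-op
print(np.allclose(merge_lora(W, A, B), W))  # True: zero update leaves W unchanged
```

Because only `A` and `B` are trained (here 4x64 + 32x4 values versus the full 32x64 matrix), adapting the model to a new style or subject touches a small fraction of the parameters.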
How does Lumina Image 2.0 compare to other image models?
Lumina Image 2.0 achieves competitive benchmark scores with 2B parameters where other models need 3-12B. It leads on DPG (87.20) for compositional understanding. FLUX leads on raw aesthetic quality and community adoption. SANA offers faster inference at similar parameter counts. Wan 2.7 adds thinking mode and 4K output but is not fully open source yet.
| Model | Parameters | DPG Score | License | Max Resolution |
|---|---|---|---|---|
| Lumina Image 2.0 | 2B | 87.20 | Apache 2.0 | 1024x1024 |
| FLUX.1 Dev | 12B | 83.5 | Non-commercial | Up to 2K |
| SANA | 1.6B | 83.6 | Apache 2.0 | 4096x4096 |
| SD3 Medium | 2B | 80.1 | Community License | 1024x1024 |
Source: Lumina-Image 2.0 Technical Report (arXiv:2503.21758), DPG and GenEval benchmarks, and model documentation. DPG scores for FLUX and SD3 from respective technical reports. Exact DPG scores may vary by evaluation configuration.
How does Lumina Image 2.0 work?
Lumina Image 2.0 is a flow-based diffusion transformer that combines the interpretability of flow models with the generative strength of diffusion processes. The Unified Next-DiT architecture processes text and image tokens in a single joint sequence, enabling tighter cross-modal alignment than architectures that treat text as a separate conditioning signal.
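Sampling from a flow-based model amounts to integrating an ODE dx/dt = v(x, t) from noise at t=0 to an image at t=1, which is where the solver choice (Euler, Midpoint, DPM) comes in: higher-order solvers spend more velocity evaluations per step but track the trajectory more accurately. A toy integrator on a known velocity field (not the model's learned one) shows the tradeoff:

```python
import numpy as np

def integrate(v, x0, steps, method="euler"):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 (toy flow sampler)."""
    x, h = x0, 1.0 / steps
    for i in range(steps):
        t = i * h
        if method == "euler":  # 1 velocity eval per step, 1st-order accurate
            x = x + h * v(x, t)
        elif method == "midpoint":  # 2 evals per step, 2nd-order accurate
            x_mid = x + 0.5 * h * v(x, t)
            x = x + h * v(x_mid, t + 0.5 * h)
    return x

# Known field dx/dt = -x with exact solution x(1) = x0 * e^(-1)
v = lambda x, t: -x
exact = np.exp(-1.0)
err_euler = abs(integrate(v, 1.0, steps=8, method="euler") - exact)
err_mid = abs(integrate(v, 1.0, steps=8, method="midpoint") - exact)
print(err_euler, err_mid)  # midpoint is markedly closer at the same step count
```

In a real sampler each velocity evaluation is a full forward pass of the 2B-parameter transformer, so "fewer, smarter steps versus more, cheaper steps" is exactly the speed/fidelity dial the solver options expose.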
The Unified Captioner (UniCap) is a custom captioning system designed for text-to-image training. It generates detailed, semantically aligned captions that improve prompt adherence during generation. This is a key reason the model performs well on compositional benchmarks: the training data has higher-quality text-image alignment than most public datasets provide.
Training uses a three-stage progressive strategy. The first stage trains on a large, diverse dataset. The second stage narrows to higher-quality data. The third stage fine-tunes on a small set of the highest-quality examples. Performance improves steadily across all three stages on both DPG and GenEval benchmarks.
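One hypothetical way to express that schedule as configuration. Every stage name, description, and field below is invented for illustration; the actual stage sizes, resolutions, and hyperparameters are documented in the technical report.

```python
# Hypothetical sketch of a three-stage progressive training schedule.
# Field values are placeholders, not figures from the Lumina report.
stages = [
    {"name": "stage1_broad",    "data": "large, diverse image-text set", "goal": "learn coverage"},
    {"name": "stage2_quality",  "data": "filtered higher-quality subset", "goal": "refine alignment"},
    {"name": "stage3_finetune", "data": "small curated set",              "goal": "polish output quality"},
]

for stage in stages:
    print(f'{stage["name"]}: train on {stage["data"]} ({stage["goal"]})')
```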
On Floyo, Lumina Image 2.0 runs through ComfyUI nodes on H100 NVL GPUs. Because the model is only 2B parameters, it runs efficiently and can be chained with other models in complex workflows. Generate with Lumina Image 2.0, then animate the output with Wan 2.7 or add voiceover with Fish Audio S2. All in one pipeline.
Frequently Asked Questions
Common questions about running Lumina Image 2.0 on Floyo.
How much does it cost to run Lumina Image 2.0?
You can try it with Floyo's free tier, which gives you 20 minutes of GPU time per day. Lumina Image 2.0 is open-source under Apache 2.0, so there is no additional API cost beyond Floyo's GPU time. No per-image charges.
How do I run Lumina Image 2.0 in my browser?
Open Floyo in your browser, find a Lumina Image 2.0 workflow (search "Lumina" in the template library), and click Run. Floyo handles the GPU, the ComfyUI environment, and the model weights. No local install, no Python setup, no API key required.
Who created Lumina Image 2.0?
Alpha-VLLM, a research group at the Shanghai AI Laboratory. The model was initially released January 25, 2025, with the full technical report published March 28, 2025. Collaborators include The University of Sydney, The Chinese University of Hong Kong, and Krea AI.
How does Lumina Image 2.0 compare to FLUX?
Lumina Image 2.0 scores higher on the DPG compositional understanding benchmark (87.20 vs. 83.5) with 6x fewer parameters (2B vs. 12B). FLUX has stronger community adoption, more LoRA options, higher max resolution, and broader aesthetic range. Lumina's key advantages are its Apache 2.0 license (FLUX Dev is non-commercial), its smaller size (faster inference, less VRAM), and stronger prompt adherence on complex compositions.
Can I fine-tune Lumina Image 2.0?
Yes. Both LoRA fine-tuning and full fine-tuning are supported. Training scripts are included in the open-source release. The Lumina-Accessory framework adds further capabilities for image editing, identity preservation, and task-specific adaptation. On Floyo, you can use fine-tuned LoRAs in your ComfyUI workflows.
Can I combine Lumina Image 2.0 with other models?
Yes. Floyo runs ComfyUI, which lets you chain multiple models in a single workflow. Generate an image with Lumina Image 2.0, animate it with Wan 2.7 Video, add voiceover with Fish Audio S2, then upscale the final result. All in one pipeline, all in your browser.
Can I use Lumina Image 2.0 commercially?
Yes. The model is released under the Apache 2.0 license, which grants full commercial usage rights. You can use generated images in products, marketing, client work, and any other commercial context without additional licensing.
How fast is Lumina Image 2.0?
At 2 billion parameters, Lumina Image 2.0 runs faster than most comparably performing models. CFG-Renormalization and CFG-Truncation reduce inference time without hurting quality. On Floyo's H100 NVL GPUs, generation is fast enough for iterative workflows where you need to test multiple prompt variations.
Try Lumina Image 2.0 on Floyo
2B-parameter open-source image generation with strong typography, compositional understanding, and Apache 2.0 licensing. Run it in your browser.
Related Reading
Setting Up an AI Production Pipeline for Your Studio
AI Ad Creatives for Social and Web
Last updated: April 2026. Specs from Lumina-Image 2.0 Technical Report (arXiv:2503.21758), Alpha-VLLM GitHub repository, HuggingFace model card, fal.ai model listing, and Open Laboratory model report.