AI IMAGE GENERATION
Run Lumina Image 2.0 on Floyo
A 2-billion-parameter flow-based diffusion transformer with strong typography, complex prompt understanding, and multilingual generation. Open source under Apache 2.0.
Run Alpha-VLLM's Lumina Image 2.0 through ComfyUI in your browser. No API key, no installs, no local GPU.
Parameters
2B
Resolution
Up to 1024x1024
License
Apache 2.0
Architecture
Flow-based DiT
Run now on Floyo · Browse All Models
No installation. Runs in browser. Updated April 2026.
What is Lumina Image 2.0?
Lumina Image 2.0 is a text-to-image model from Alpha-VLLM, initially released January 25, 2025, with the full technical report published March 28, 2025. It uses a 2-billion-parameter flow-based diffusion transformer (Unified Next-DiT) that treats text and image tokens as a joint sequence. On the DPG benchmark, it scores 87.20. On GenEval, it scores 0.73. Both results are competitive with models that have significantly more parameters.
What are Lumina Image 2.0's technical specifications?
Lumina Image 2.0 uses a Unified Next-DiT architecture with 2 billion parameters. It pairs a Gemma-2-2B text encoder with a FLUX-VAE-16CH autoencoder. The model supports multiple inference solvers (Midpoint, Euler, DPM), CFG-Renormalization and CFG-Truncation for faster inference, and flash attention for efficient computation. Training used 200 million image-text pairs across three progressive stages.
| Spec | Details |
|---|---|
| Developer | Alpha-VLLM (Shanghai AI Laboratory) |
| Architecture | Unified Next-DiT (flow-based diffusion transformer) |
| Parameters | 2 billion |
| Text Encoder | Gemma-2-2B |
| VAE | FLUX-VAE-16CH |
| Max Resolution | 1024x1024 |
| Typography | Enhanced text rendering in generated images |
| Multilingual | Yes (no language-specific preprocessing required) |
| Inference Solvers | Midpoint, Euler, DPM Solver |
| Training Data | 200M image-text pairs (3-stage progressive training) |
| DPG Score | 87.20 (Overall) |
| GenEval Score | 0.73 |
| License | Apache 2.0 (full commercial rights) |
| Fine-Tuning | LoRA and full fine-tuning supported |
| ComfyUI Access | Native ComfyUI support + Diffusers library |
| Release Date | January 25, 2025 (tech report March 28, 2025) |
What are Lumina Image 2.0's key features?
Lumina Image 2.0 is designed around two principles: unification (text and image tokens processed jointly) and efficiency (competitive results from 2B parameters with fast inference). The feature set targets users who need a lightweight, open-source image model with strong compositional understanding and commercial licensing.
Unified Token Processing
Unlike models that use cross-attention to connect text and image, Lumina Image 2.0 treats text and image tokens as a single joint sequence. This enables natural cross-modal interactions during generation and makes the architecture easier to extend for additional tasks. The Unified Next-DiT framework supports both generation and understanding in one model.
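As a toy illustration of that difference (a NumPy sketch with made-up dimensions and random weights, not Lumina's actual layers): instead of image queries attending only to text keys via cross-attention, one self-attention pass runs over the concatenated [text; image] sequence, so attention scores are computed between every pair of tokens in both modalities.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def joint_self_attention(text_tokens, image_tokens, d=16, seed=0):
    """Toy single-head self-attention over a joint [text; image] sequence.

    Unlike cross-attention (image queries attend only to text keys),
    every token here attends to every other token in one sequence.
    """
    rng = np.random.default_rng(seed)
    x = np.concatenate([text_tokens, image_tokens], axis=0)  # (T+I, d)
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    attn = softmax(q @ k.T / np.sqrt(d))  # (T+I, T+I): full joint attention
    return attn @ v, attn

rng = np.random.default_rng(1)
text = rng.standard_normal((4, 16))   # 4 hypothetical text tokens
image = rng.standard_normal((8, 16))  # 8 hypothetical image-patch tokens
out, attn = joint_self_attention(text, image)
print(out.shape, attn.shape)  # (12, 16) (12, 12)
```

The attention matrix covers the full 12x12 joint sequence, so text-to-image, image-to-text, and image-to-image interactions all happen in the same operation.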
Typography and Text Rendering
The model handles text-in-image generation with improved accuracy compared to its predecessor. Signs, labels, and typographic elements render more clearly. This is driven by the Gemma-2-2B text encoder, which provides stronger semantic understanding of what text should appear and where.
Complex Prompt Understanding
Lumina Image 2.0 excels at compositional prompts with multiple objects, spatial relationships, color attributes, and relational descriptions. On the DPG benchmark, it outperforms all compared models across Entity, Relation, Attribute, and Overall metrics. This is attributed to the Unified Captioner (UniCap), a custom captioning system that generates detailed, semantically aligned captions during training.
Multilingual Generation
The model accepts prompts in multiple languages without requiring phonemes or language-specific preprocessing. You can write prompts in English, Chinese, and other languages and the model generates appropriate images. This is supported by the Gemma-2-2B encoder's multilingual capabilities.
Efficient Inference
CFG-Renormalization and CFG-Truncation are integrated to reduce inference time without hurting image quality. Flash attention accelerates both training and inference. Multiple solver options (Midpoint, Euler, DPM) let you balance speed and fidelity based on your use case. The result is a model that runs faster than most comparably performing alternatives.
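A minimal sketch of what these two tricks can look like, assuming the standard classifier-free guidance formulation. The function name, the truncation threshold, and the exact renormalization rule here are illustrative assumptions; the precise variants Lumina Image 2.0 uses are described in its technical report.

```python
import numpy as np

def guided_velocity(v_cond, v_uncond, scale, t, trunc_t=0.3, renorm=True):
    """Classifier-free guidance with truncation and renormalization (sketch).

    - CFG-Truncation: past a chosen time threshold, skip guidance entirely,
      which saves the unconditional forward pass on those steps.
    - CFG-Renormalization: rescale the guided vector back to the conditional
      vector's norm so large guidance scales don't inflate magnitudes.
    Exact formulations in Lumina Image 2.0 may differ; this is a toy version.
    """
    if t > trunc_t:  # truncation: conditional prediction only
        return v_cond
    v = v_uncond + scale * (v_cond - v_uncond)  # standard CFG combination
    if renorm:  # renormalization: keep ||v|| equal to ||v_cond||
        v = v * (np.linalg.norm(v_cond) / (np.linalg.norm(v) + 1e-8))
    return v

v_cond = np.array([3.0, 4.0])    # ||v_cond|| = 5.0
v_uncond = np.array([1.0, 0.0])
v = guided_velocity(v_cond, v_uncond, scale=4.0, t=0.1)
print(np.linalg.norm(v))  # ~5.0: guidance direction kept, magnitude controlled
```

Even with an aggressive scale of 4.0, the renormalized vector keeps the conditional prediction's magnitude, which is the intuition behind getting faster, stabler sampling without hurting quality.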
Apache 2.0 License
Full commercial rights. Model weights, training code, fine-tuning scripts, and LoRA training are all included. No usage restrictions, no API-only access, no research-only limitations. You can deploy it, modify it, and build products with it.
Extensible Ecosystem
The Lumina-Accessory framework extends the base model with image editing, identity preservation, controllable generation, and task-specific adaptation. The related Lumina-Video 1.0 project uses the same architectural foundation for video generation. LoRA fine-tuning scripts let you adapt the model for specific styles, subjects, or domains.
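LoRA's core mechanism, independent of Lumina, is a low-rank weight update merged into frozen base weights. A generic NumPy sketch, using the common `alpha / rank` scaling convention (the dimensions and variable names here are illustrative, not taken from Lumina's training scripts):

```python
import numpy as np

def merge_lora(W, A, B, alpha=16, rank=4):
    """Merge a LoRA adapter into a base weight matrix (generic sketch).

    LoRA learns a low-rank update delta_W = B @ A (rank r much smaller than
    the weight dimensions) and merges it into the frozen base weight,
    scaled by alpha / rank. Standard LoRA math, not Lumina-specific code.
    """
    return W + (alpha / rank) * (B @ A)

rng = np.random.default_rng(0)
d_out, d_in, r = 32, 64, 4
W = rng.standard_normal((d_out, d_in))  # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01  # trained down-projection
B = np.zeros((d_out, r))  # B initialized to zero: adapter starts as a no-op
print(np.allclose(merge_lora(W, A, B), W))  # True: zero update leaves W unchanged
```

Because only `A` and `B` are trained (here 4x64 + 32x4 values versus the full 32x64 matrix), adapting the model to a new style or subject touches a small fraction of the parameters.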
How does Lumina Image 2.0 compare to other image models?
Lumina Image 2.0 achieves competitive benchmark scores with 2B parameters where other models need 3-12B. It leads on DPG (87.20) for compositional understanding. FLUX leads on raw aesthetic quality and community adoption. SANA offers faster inference at similar parameter counts. Wan 2.7 adds thinking mode and 4K output but is not fully open source yet.
| Model | Parameters | DPG Score | License | Max Resolution |
|---|---|---|---|---|
| Lumina Image 2.0 | 2B | 87.20 | Apache 2.0 | 1024x1024 |
| FLUX.1 Dev | 12B | 83.5 | Non-commercial | Up to 2K |
| SANA | 1.6B | 83.6 | Apache 2.0 | 4096x4096 |
| SD3 Medium | 2B | 80.1 | Community License | 1024x1024 |
Source: Lumina-Image 2.0 Technical Report (arXiv:2503.21758), DPG and GenEval benchmarks, and model documentation. DPG scores for FLUX and SD3 from respective technical reports. Exact DPG scores may vary by evaluation configuration.
How does Lumina Image 2.0 work?
Lumina Image 2.0 is a flow-based diffusion transformer that combines the interpretability of flow models with the generative strength of diffusion processes. The Unified Next-DiT architecture processes text and image tokens in a single joint sequence, enabling tighter cross-modal alignment than architectures that treat text as a separate conditioning signal.
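Sampling from a flow-based model amounts to integrating an ODE dx/dt = v(x, t) from noise at t=0 to an image at t=1, which is where the solver choice (Euler, Midpoint, DPM) comes in: higher-order solvers spend more velocity evaluations per step but track the trajectory more accurately. A toy integrator on a known velocity field (not the model's learned one) shows the tradeoff:

```python
import numpy as np

def integrate(v, x0, steps, method="euler"):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 (toy flow sampler)."""
    x, h = x0, 1.0 / steps
    for i in range(steps):
        t = i * h
        if method == "euler":  # 1 velocity eval per step, 1st-order accurate
            x = x + h * v(x, t)
        elif method == "midpoint":  # 2 evals per step, 2nd-order accurate
            x_mid = x + 0.5 * h * v(x, t)
            x = x + h * v(x_mid, t + 0.5 * h)
    return x

# Known field dx/dt = -x with exact solution x(1) = x0 * e^(-1)
v = lambda x, t: -x
exact = np.exp(-1.0)
err_euler = abs(integrate(v, 1.0, steps=8, method="euler") - exact)
err_mid = abs(integrate(v, 1.0, steps=8, method="midpoint") - exact)
print(err_euler, err_mid)  # midpoint is markedly closer at the same step count
```

In a real sampler each velocity evaluation is a full forward pass of the 2B-parameter transformer, so "fewer, smarter steps versus more, cheaper steps" is exactly the speed/fidelity dial the solver options expose.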
The Unified Captioner (UniCap) is a custom captioning system designed for text-to-image training. It generates detailed, semantically aligned captions that improve prompt adherence during generation. This is a key reason the model performs well on compositional benchmarks: the training data has higher-quality text-image alignment than most public datasets provide.
Training uses a three-stage progressive strategy. The first stage trains on a large, diverse dataset. The second stage narrows to higher-quality data. The third stage fine-tunes on a small set of the highest-quality examples. Performance improves steadily across all three stages on both DPG and GenEval benchmarks.
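One hypothetical way to express that schedule as configuration. Every stage name, description, and field below is invented for illustration; the actual stage sizes, resolutions, and hyperparameters are documented in the technical report.

```python
# Hypothetical sketch of a three-stage progressive training schedule.
# Field values are placeholders, not figures from the Lumina report.
stages = [
    {"name": "stage1_broad",    "data": "large, diverse image-text set", "goal": "learn coverage"},
    {"name": "stage2_quality",  "data": "filtered higher-quality subset", "goal": "refine alignment"},
    {"name": "stage3_finetune", "data": "small curated set",              "goal": "polish output quality"},
]

for stage in stages:
    print(f'{stage["name"]}: train on {stage["data"]} ({stage["goal"]})')
```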
On Floyo, Lumina Image 2.0 runs through ComfyUI nodes on H100 NVL GPUs. Because the model is only 2B parameters, it runs efficiently and can be chained with other models in complex workflows. Generate with Lumina Image 2.0, then animate the output with Wan 2.7 or add voiceover with Fish Audio S2. All in one pipeline.
Frequently Asked Questions
Common questions about running Lumina Image 2.0 on Floyo.
How much does it cost to run Lumina Image 2.0?
You can try it with Floyo's free tier, which gives you 20 minutes of GPU time per day. Lumina Image 2.0 is open-source under Apache 2.0, so there is no additional API cost beyond Floyo's GPU time. No per-image charges.
How do I run Lumina Image 2.0 in my browser?
Open Floyo in your browser, find a Lumina Image 2.0 workflow (search "Lumina" in the template library), and click Run. Floyo handles the GPU, the ComfyUI environment, and the model weights. No local install, no Python setup, no API key required.
Who created Lumina Image 2.0?
Alpha-VLLM, a research group at the Shanghai AI Laboratory. The model was initially released January 25, 2025, with the full technical report published March 28, 2025. Collaborators include The University of Sydney, The Chinese University of Hong Kong, and Krea AI.
How does Lumina Image 2.0 compare to FLUX?
Lumina Image 2.0 scores higher on the DPG compositional understanding benchmark (87.20 vs. 83.5) with 6x fewer parameters (2B vs. 12B). FLUX has stronger community adoption, more LoRA options, higher max resolution, and broader aesthetic range. Lumina's key advantages are its Apache 2.0 license (FLUX Dev is non-commercial), its smaller size (faster inference, less VRAM), and stronger prompt adherence on complex compositions.
Can I fine-tune Lumina Image 2.0?
Yes. Both LoRA fine-tuning and full fine-tuning are supported. Training scripts are included in the open-source release. The Lumina-Accessory framework adds further capabilities for image editing, identity preservation, and task-specific adaptation. On Floyo, you can use fine-tuned LoRAs in your ComfyUI workflows.
Can I combine Lumina Image 2.0 with other models?
Yes. Floyo runs ComfyUI, which lets you chain multiple models in a single workflow. Generate an image with Lumina Image 2.0, animate it with Wan 2.7 Video, add voiceover with Fish Audio S2, then upscale the final result. All in one pipeline, all in your browser.
Can I use Lumina Image 2.0 commercially?
Yes. The model is released under the Apache 2.0 license, which grants full commercial usage rights. You can use generated images in products, marketing, client work, and any other commercial context without additional licensing.
How fast is Lumina Image 2.0?
At 2 billion parameters, Lumina Image 2.0 runs faster than most comparably performing models. CFG-Renormalization and CFG-Truncation reduce inference time without hurting quality. On Floyo's H100 NVL GPUs, generation is fast enough for iterative workflows where you need to test multiple prompt variations.
Try Lumina Image 2.0 on Floyo
2B-parameter open-source image generation with strong typography, compositional understanding, and Apache 2.0 licensing. Run it in your browser.
Related Reading
Setting Up an AI Production Pipeline for Your Studio
AI Ad Creatives for Social and Web
Last updated: April 2026. Specs from Lumina-Image 2.0 Technical Report (arXiv:2503.21758), Alpha-VLLM GitHub repository, HuggingFace model card, fal.ai model listing, and Open Laboratory model report.