
COMMUNITY PAGE
AI IMAGE GENERATION & EDITING
Run LongCat on Floyo
Meituan's 6B parameter bilingual image model with industry-leading Chinese text rendering, photorealistic output, and instruction-based editing. Outperforms models several times its size. Open source.
Run Meituan's LongCat Image through ComfyUI in your browser. No API key, no installs, no local GPU.
| Parameters | Text Rendering | Modes | Benchmark |
|---|---|---|---|
| 6B | Chinese + English (SOTA) | Generate + Edit | #2 open-source (T2I-CoreBench) |

| Try LongCat Now → | Browse All Models |
No installation. Runs in browser. Updated April 2026.
What you get
LongCat Image is Meituan's 6B parameter open-source bilingual (Chinese-English) foundation model for image generation and editing. It uses a hybrid MM-DiT and Single-DiT architecture with a Qwen2.5-VL-7B text/vision encoder, and ranks #2 among open-source models on T2I-CoreBench, surpassed only by the 32B FLUX2.dev. It offers industry-leading Chinese text rendering, with superior accuracy on both common and rare characters, and photorealistic output that rivals 20B+ parameter competitors. Generation and editing share the same architecture, and a distilled Edit-Turbo variant runs 10x faster. Available as ComfyUI nodes on Floyo.
LONGCAT WORKFLOWS ON FLOYO
What is LongCat?
LongCat Image is a 6B parameter open-source text-to-image and image editing model from Meituan, one of China's largest technology companies. Released December 5, 2025, with the technical report published December 8, 2025. It is designed as a bilingual (Chinese-English) foundation model that solves three problems most open-source models struggle with: accurate multilingual text rendering, photorealism at small parameter counts, and unified generation and editing in one architecture.
With only 6B parameters, LongCat outperforms models several times its size. On T2I-CoreBench, it ranks #2 among all open-source models, surpassed only by the 32B-parameter FLUX2.dev. It beats Qwen-Image-20B and HunyuanImage-3.0 (80B parameters) on text rendering and photorealism benchmarks. This efficiency comes from the hybrid MM-DiT architecture and a training pipeline with three progressive stages plus RLHF alignment.
Chinese text rendering is LongCat's strongest differentiator. Most image models garble non-Latin scripts. LongCat renders common Chinese characters with high accuracy and achieves industry-leading coverage of the Chinese dictionary, including rare and complex characters. English text rendering is also strong. The Qwen2.5-VL-7B encoder provides deep understanding of both languages.
The editing variant (LongCat-Image-Edit) uses the same architecture for instruction-based editing. Describe what you want to change in natural language, and the model applies the edit while preserving composition and lighting. The Edit-Turbo variant distills this to 10x speed. Editing consistency across multiple rounds is a specific design goal.
On Floyo, LongCat runs through native ComfyUI nodes on H100 NVL GPUs. Two workflows cover text-to-image generation and instruction-based image editing. No model downloads, no local setup.
What can you create with LongCat?
LongCat covers text-to-image generation, instruction-based image editing, bilingual poster and banner design, product photography with embedded text, UI mockups, marketing assets with Chinese and English copy, and multi-round iterative editing with consistent lighting and textures. The model is designed for production use where text accuracy and photorealism both matter.
| Capability | What It Does | Use Case |
|---|---|---|
| Chinese Text Rendering | Industry-leading accuracy for common and rare Chinese characters. Stable rendering of complex typography, signs, and calligraphy. | Chinese marketing, bilingual posters, signage, menus |
| Photorealistic Generation | Generates images with believable lighting, depth, and textures that rival 20B+ parameter models despite being only 6B parameters. | Product photography, editorial images, hero images |
| Instruction Editing | Describe edits in natural language. The model applies them while preserving composition, lighting, and texture consistency across rounds. | Client revisions, iterative design, post-production |
| Bilingual Prompting | Write prompts in Chinese, English, or mixed language. The Qwen2.5-VL-7B encoder understands both natively. | Multilingual teams, localized content, cross-market assets |
| Poster and Banner Design | Generate production-ready posters with embedded text. Wrap in-image copy in double quotes for best results. | Ad creatives, event banners, social graphics, e-commerce |
| Pipeline Integration | Chain with video models in ComfyUI. Generate with LongCat, animate with Wan 2.7, add voiceover with Fish Audio S2. Or use LongCat Edit to modify outputs from other image models. | Multi-model workflows, end-to-end production |
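The double-quote convention for in-image copy mentioned in the table above can be sketched as a small prompt helper. This is an illustrative snippet, not part of any official LongCat or Floyo API; the function name and prompt phrasing are our own.

```python
from typing import Optional

def build_prompt(scene: str, in_image_text: Optional[str] = None) -> str:
    """Compose a LongCat-style prompt, wrapping any in-image copy in
    double quotes, as recommended for reliable text rendering."""
    if in_image_text is None:
        return scene
    return f'{scene} with a sign that reads "{in_image_text}"'

# Bilingual example: the quoted span is the text that should appear inside the image.
prompt = build_prompt("a neon storefront at night", "开放 OPEN")
print(prompt)  # a neon storefront at night with a sign that reads "开放 OPEN"
```

The same pattern works for Chinese-only, English-only, or mixed-language copy, since the encoder handles both languages natively.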
What are LongCat's key features?
LongCat's feature set targets a specific gap: most open-source image models produce nice pictures but fail at text rendering, especially non-Latin scripts. LongCat was designed from the ground up for bilingual text accuracy alongside photorealism. The unified generation-and-editing architecture means you don't need separate models for creation and revision.
Industry-Leading Chinese Text Rendering
LongCat renders common Chinese characters with superior accuracy and stability compared to all other open-source models. Its dictionary coverage extends to rare and complex characters that most models cannot handle at all. This comes from the Qwen2.5-VL-7B encoder, which understands Chinese text at a semantic level, plus a training pipeline specifically designed to optimize text rendering quality.
6B Parameter Efficiency
At 6B parameters, LongCat is significantly smaller than competitors like Qwen-Image (20B) and HunyuanImage 3.0 (80B MoE). It still outperforms them on text rendering and photorealism benchmarks. This means lower VRAM, faster inference, and reduced deployment costs. On T2I-CoreBench, it ranks #2 among all open-source models, behind only the 32B FLUX2.dev.
Unified Generation and Editing
The same hybrid MM-DiT architecture powers both text-to-image generation and instruction-based editing. The Qwen2.5-VL-7B encoder provides a unified conditional space that handles both tasks. Generate an image, then edit it with natural language instructions. Lighting, textures, and composition stay consistent across edit rounds.
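One way to picture the unified conditional space is as a single token sequence the DiT attends over: generation conditions on text tokens alone, while editing appends source-image tokens to the same sequence. The toy sketch below uses placeholder strings instead of real embeddings and is not LongCat's actual implementation, just an illustration of the idea.

```python
def build_conditioning(text_tokens, image_tokens=None):
    """Toy model of a unified conditional space: one sequence serves both
    text-to-image (text only) and editing (text + source-image tokens)."""
    cond = list(text_tokens)
    if image_tokens is not None:  # editing mode: the model also attends to the source image
        cond.extend(image_tokens)
    return cond

gen_cond = build_conditioning(["a", "red", "car"])                              # generation
edit_cond = build_conditioning(["make", "it", "blue"], ["<img_0>", "<img_1>"])  # editing
```

Because both tasks flow through one conditioning path, a single set of weights can generate and edit without a separate editing model.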
Edit-Turbo (10x Speed)
Released February 3, 2026, Edit-Turbo is the distilled version of LongCat-Image-Edit. It achieves a 10x speedup while maintaining the editing quality of the full model. For iterative workflows where you need fast turnaround on client revisions, this is the variant to use.
RLHF-Aligned Quality
Training uses curated reward models during the RL phase to align outputs with human aesthetic preferences. This is on top of the three-stage training pipeline (pre-training on diverse data, mid-training on higher-quality data, SFT on the highest-quality examples). The result is photorealistic output with strong instruction adherence.
Most Comprehensive Open-Source Ecosystem
Meituan releases not just model weights but the entire training pipeline: pre-training, mid-training, post-training checkpoints, and the full toolchain. This is the most complete open-source release in the image generation space. Researchers can reproduce results, modify training, and extend the model with full visibility into how it was built.
How does LongCat compare to other image models?
LongCat ranks #2 on T2I-CoreBench with 6B parameters, behind only the 32B FLUX2.dev. It leads all open-source models on Chinese text rendering. Z-Image Turbo leads on inference speed (8 steps). Qwen-Image leads on raw parameter scale (20B). FLUX Kontext leads on the editing ecosystem. LongCat's edge: best Chinese text rendering, strongest efficiency-to-quality ratio, and the most complete open-source release.
| Model | Parameters | Chinese Text | Edit Mode | T2I-CoreBench |
|---|---|---|---|---|
| LongCat | 6B | SOTA (common + rare) | Yes (+ Edit-Turbo) | #2 open-source |
| FLUX2.dev | 32B | Moderate | Via Kontext | #1 open-source |
| Z-Image Turbo | 6B | Good (EN + CN) | No | High |
| Qwen-Image | 20B | Strong | Via Qwen-Edit | High |
Source: LongCat-Image Technical Report (arXiv:2512.07584), T2I-CoreBench results (December 2025), Meituan GitHub, and third-party benchmark comparisons as of April 2026.
How does LongCat work?
LongCat uses a hybrid MM-DiT and Single-DiT architecture, similar in design to FLUX, paired with the Qwen2.5-VL-7B vision-language model as its text encoder. The VLM encoder provides a unified conditional space that handles both generation (from text) and editing (from text + image) in the same architecture. This dual-use design is what lets one model serve both tasks.
The training pipeline has four stages. Pre-training on a large, diverse dataset establishes the model's foundational understanding. Mid-training narrows to higher-quality data. Supervised fine-tuning (SFT) focuses on the highest-quality examples. Finally, RLHF alignment uses curated reward models to push outputs toward human aesthetic preferences, photorealism, and text accuracy.
Chinese text rendering quality comes from two sources. The Qwen2.5-VL-7B encoder understands Chinese characters at a semantic level, not as pixel patterns. And the training data includes heavily curated examples of Chinese typography with progressively tighter quality filters across the three training stages. The combination produces consistent character rendering that other models achieve only inconsistently.
On Floyo, LongCat runs through native ComfyUI nodes on H100 NVL GPUs. The text-to-image workflow generates images from prompts. The editing workflow takes a source image plus an instruction and applies the modification. Both share the same model weights. You can chain LongCat with other ComfyUI nodes in the same workflow for complete production pipelines.
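For readers curious what a ComfyUI graph looks like underneath, here is a minimal API-format workflow fragment. The LongCat node class names below are hypothetical placeholders for illustration only; the actual node names in Floyo's templates may differ. Each node references upstream outputs by node id and output index.

```json
{
  "1": { "class_type": "LongCatTextEncode",
         "inputs": { "prompt": "a neon sign that reads \"开放\"" } },
  "2": { "class_type": "LongCatSampler",
         "inputs": { "conditioning": ["1", 0], "steps": 28, "seed": 42 } },
  "3": { "class_type": "SaveImage",
         "inputs": { "images": ["2", 0] } }
}
```

Chaining other models works the same way: the output of node "2" could feed a video model's image input instead of SaveImage.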
Frequently Asked Questions
Common questions about running LongCat on Floyo.
How much does LongCat cost on Floyo?
You can start with Floyo's free pricing plan. To continue using the service beyond the free tier, upgrade your Floyo pricing plan. LongCat is open-source, so there is no additional API cost beyond your Floyo plan.
How do I run LongCat on Floyo?
Open Floyo in your browser, search "LongCat" in the template library, and pick the text-to-image or image editing workflow. Click Run, write your prompt, and generate. Floyo handles the GPU, ComfyUI environment, and model weights. No local install, no Python setup.
Who made LongCat, and when was it released?
Meituan's LongCat Team. Meituan is one of China's largest technology companies. LongCat-Image weights were released December 5, 2025. The technical report was published December 8, 2025 on arXiv (2512.07584). Edit-Turbo (10x faster editing) was released February 3, 2026. Full Diffusers support was added December 16, 2025.
How does LongCat compare to Z-Image Turbo?
Both are 6B parameter models with strong efficiency. Z-Image Turbo leads on speed (8-step inference, sub-second on enterprise GPUs). LongCat leads on Chinese text rendering accuracy and has a dedicated editing variant. Z-Image Turbo is Apache 2.0 licensed. Both are available on Floyo and can be used in the same pipeline for different tasks.
Can LongCat render Chinese text?
Yes. This is LongCat's strongest feature. It achieves industry-leading accuracy and dictionary coverage for Chinese characters, including rare and complex ones. Wrap in-image text in double quotes for best results (e.g., a neon sign that reads "开放"). English text rendering is also strong.
Can I combine LongCat with other models?
Yes. Floyo runs ComfyUI, which lets you chain multiple models. Generate with LongCat, animate with Wan 2.7 or Kling Omni, add voiceover with Fish Audio S2 or Chatterbox. Or generate with another image model and refine with LongCat Edit. All in one pipeline.
Can I use LongCat commercially?
LongCat weights and training code are open source. Check the specific license on the HuggingFace model card for commercial usage terms. The model was built on a FLUX-style architecture and uses the Qwen2.5-VL encoder, so downstream license obligations may apply. Review the license before commercial deployment.
What is Edit-Turbo?
Edit-Turbo is the distilled version of LongCat-Image-Edit, released February 3, 2026. It achieves a 10x speedup while maintaining the editing quality of the full model. For iterative workflows where fast turnaround matters, Edit-Turbo is the recommended variant.
Try LongCat on Floyo
6B parameter bilingual image generation and editing with industry-leading Chinese text rendering and photorealistic output. Run it in your browser.
| Try LongCat Now → | Browse All Models |
Related Reading
AI Ad Creatives for Social and Web
Character and Concept Design on Floyo
Last updated: April 2026. Specs from LongCat-Image Technical Report (arXiv:2512.07584), Meituan GitHub (meituan-longcat/LongCat-Image), T2I-CoreBench results, HuggingFace model cards, WaveSpeedAI documentation, and fal.ai model listings.
LongCat for Text to Image
Create images from text prompts using LongCat's text-to-image workflow.
concept art
consistency
image to image
longcat-image-edit
portrait
style transfer
LongCat-Image-Edit - Instruction Image Editing
Upload one image, write an instruction, and LongCat-Image-Edit rewrites the parts you describe while keeping the rest identical. Bilingual prompts, 8 steps.

