floyo logo
Powered by
ThinkDiffusion

LongCat for Text to Image

Create images from text prompts using the LongCat‑Image model


Generates in about 10 secs

Nodes & Models

CLIPLoader: qwen_2.5_vl_7b_fp8_scaled.safetensors
VAELoader: ae.safetensors
UNETLoader: longcat_image_bf16.safetensors
ResolutionSelector
WorkflowGraphics
CLIPTextEncode
CFGNorm
EmptySD3LatentImage
FluxGuidance
KSampler
VAEDecode
AddLabel
PreviewImage
easy positive
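
The core nodes listed above can be read as a single chain: loaders, text encoding, sampling, decoding, preview. Below is a minimal sketch of that chain in ComfyUI's API ("prompt") format, written as a Python dict. The wiring order, the prompt text, and every numeric value (resolution, steps, cfg, guidance, CFGNorm strength) are assumptions for illustration, as is the `type` value passed to CLIPLoader; none of them are taken from this workflow. Input names follow the stock ComfyUI nodes as commonly documented and should be checked against the installed version. The helper nodes ResolutionSelector, WorkflowGraphics, AddLabel, and easy positive are left out.

```python
import json

# Hypothetical wiring of the core nodes, in ComfyUI API format.
# Each key is a node id; a link is written as [source_node_id, output_index].
workflow = {
    "1": {"class_type": "UNETLoader",
          "inputs": {"unet_name": "longcat_image_bf16.safetensors",
                     "weight_dtype": "default"}},
    "2": {"class_type": "CLIPLoader",
          "inputs": {"clip_name": "qwen_2.5_vl_7b_fp8_scaled.safetensors",
                     "type": "longcat_image"}},  # assumed type value
    "3": {"class_type": "VAELoader",
          "inputs": {"vae_name": "ae.safetensors"}},
    "4": {"class_type": "CLIPTextEncode",  # positive prompt (example text)
          "inputs": {"clip": ["2", 0],
                     "text": 'A cafe storefront, the sign reads "OPEN"'}},
    "5": {"class_type": "CLIPTextEncode",  # empty negative prompt
          "inputs": {"clip": ["2", 0], "text": ""}},
    "6": {"class_type": "FluxGuidance",
          "inputs": {"conditioning": ["4", 0], "guidance": 4.0}},  # assumed value
    "7": {"class_type": "CFGNorm",
          "inputs": {"model": ["1", 0], "strength": 1.0}},  # assumed value
    "8": {"class_type": "EmptySD3LatentImage",
          "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
    "9": {"class_type": "KSampler",
          "inputs": {"model": ["7", 0], "positive": ["6", 0],
                     "negative": ["5", 0], "latent_image": ["8", 0],
                     "seed": 0, "steps": 20, "cfg": 4.0,  # assumed values
                     "sampler_name": "euler", "scheduler": "simple",
                     "denoise": 1.0}},
    "10": {"class_type": "VAEDecode",
           "inputs": {"samples": ["9", 0], "vae": ["3", 0]}},
    "11": {"class_type": "PreviewImage",
           "inputs": {"images": ["10", 0]}},
}


def dangling_links(graph):
    """Return (node_id, input_name) pairs whose link points at a missing node."""
    bad = []
    for node_id, node in graph.items():
        for name, value in node["inputs"].items():
            is_link = isinstance(value, list) and len(value) == 2
            if is_link and value[0] not in graph:
                bad.append((node_id, name))
    return bad


# JSON body in the shape a ComfyUI server expects when a workflow is queued.
payload = json.dumps({"prompt": workflow}, indent=2)
```

The `dangling_links` check is only a sanity test on the sketch itself; it does not validate node names or input types against a running ComfyUI instance.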

LongCat‑Image is a 6B‑parameter text‑to‑image (and image‑edit) foundation model focused on photorealism and extremely accurate English/Chinese text rendering in images.

What it is

  • An open‑source model from Meituan that generates and edits images from text prompts, using a Flux‑style diffusion backbone plus a Qwen2.5‑VL text encoder.

  • Built to be smaller and faster than many flagship models while still matching or beating them on realism and text‑in‑image benchmarks.

Key features

  • Strong photorealism and material rendering (skin, fabric, lighting) for commercial‑quality images.

  • Bilingual text rendering: handles Chinese and English text in images with high spelling accuracy by encoding text enclosed in quotation marks in the prompt at the character level.

  • Unified generation + editing: a paired LongCat‑Image‑Edit variant supports precise inpainting/outpainting and instruction‑based edits while preserving structure and identity.

  • Efficient 6B architecture yields fast inference and lower VRAM use than larger models such as the 12B FLUX.1, especially in cloud or optimized runtimes.
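
To make the VRAM point concrete, here is a rough back-of-the-envelope sketch of the memory needed for the diffusion model weights alone. It assumes bf16 storage (2 bytes per parameter), as the `longcat_image_bf16.safetensors` filename suggests, and compares against a hypothetical 12B-parameter model chosen only for illustration. It ignores the text encoder, the VAE, and activation memory, so real usage will be higher.

```python
def weight_footprint_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate size of the model weights alone, in decimal gigabytes.

    (params_billions * 1e9 parameters) * bytes_per_param / 1e9 bytes per GB
    simplifies to params_billions * bytes_per_param.
    """
    return params_billions * bytes_per_param


# 6B parameters stored as bf16 (2 bytes each)
longcat_bf16_gb = weight_footprint_gb(6, 2)

# A hypothetical 12B-parameter model at the same precision, for comparison
larger_bf16_gb = weight_footprint_gb(12, 2)

print(f"6B @ bf16:  ~{longcat_bf16_gb:.0f} GB of weights")
print(f"12B @ bf16: ~{larger_bf16_gb:.0f} GB of weights")
```

Under these assumptions, halving the parameter count halves the weight footprint at any fixed precision.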

Best use cases

  • Posters, ads, and UI mockups that need clean layout plus correctly spelled Chinese/English text (titles, buttons, labels, signage).

  • Photoreal product and lifestyle images for marketing, where realism and brand‑safe detail matter.

  • Image editing tasks like background changes, object insertion/removal, or text replacement in existing visuals using natural‑language instructions.
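
For the poster and signage use cases above, the text to be rendered is placed in quotation marks inside the prompt. The helper below is a minimal sketch of one way to assemble such a prompt. The function names and the "X reads ..." phrasing are hypothetical choices for illustration; the only behavior taken from the description above is that quoted text is treated at character level.

```python
def quote_text(text: str) -> str:
    """Wrap text that should appear in the image in double quotes."""
    cleaned = text.strip().strip('"')
    return f'"{cleaned}"'


def build_prompt(scene: str, rendered_texts: dict[str, str]) -> str:
    """Compose a prompt from a scene description plus one clause per text element.

    rendered_texts maps a placement description (e.g. "the title") to the
    exact string that should be rendered there.
    """
    clauses = [
        f"{where} reads {quote_text(text)}"
        for where, text in rendered_texts.items()
    ]
    return ", ".join([scene.strip(), *clauses])


prompt = build_prompt(
    "A minimalist coffee shop poster",
    {"the title": "Morning Brew", "the button": "立即下单"},
)
print(prompt)
```

Keeping each rendered string in its own quoted clause makes it easy to see exactly which characters the model is being asked to draw.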
