floyo logo
Powered by
ThinkDiffusion

FLUX.2 Klein 9B for Text to Image

Create a high-quality image using the 9B model of FLUX.2 Klein


FLUX.2 [klein] 9B is a 9‑billion‑parameter rectified‑flow image model that does both text‑to‑image and image editing in one architecture, optimized for very fast, high‑quality generations on a single GPU.

Overview

Klein 9B is built as a compact “flagship small” model: it uses a 9B flow transformer plus an 8B Qwen3 text encoder and is step‑distilled so it can generate images in as few as 4 inference steps. It supports text‑to‑image, image‑to‑image, and multi‑reference editing (up to several reference images) using the same checkpoint, so you do not need separate models for generation and editing.
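
To build intuition for why step distillation down to 4 inference steps is feasible: a rectified-flow model predicts a velocity field whose noise-to-image trajectories are trained to be nearly straight, so a coarse Euler integration stays close to the true path. A toy sketch in pure Python, with a stand-in constant-velocity "model" (not the real 9B network):

```python
# Toy intuition for few-step generation (pure Python; the "model" here is a
# stand-in, NOT the real network). A rectified-flow model predicts a velocity
# v(x, t); sampling integrates dx/dt = v from noise (t=1) to the image (t=0),
# and nearly straight trajectories survive coarse Euler steps.

def sample_rectified_flow(velocity_fn, x_noise, num_steps=4):
    """Euler-integrate dx/dt = v(x, t) from t=1 down to t=0."""
    x = list(x_noise)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = 1.0 - i * dt                            # current time in [0, 1]
        v = velocity_fn(x, t)                       # predicted velocity
        x = [xi - dt * vi for xi, vi in zip(x, v)]  # one Euler step toward t=0
    return x

# Stand-in model: on a perfectly straight ("rectified") path the velocity is
# constant, v = noise - target, so even 4 steps land exactly on the target.
target = [0.5, -1.0, 2.0]
noise = [1.0, 1.0, 1.0]
velocity = lambda x, t: [n - d for n, d in zip(noise, target)]

out = sample_rectified_flow(velocity, noise, num_steps=4)  # equals target
```

Real trajectories are only approximately straight, which is why the distilled variant needs a few steps rather than one, and why the undistilled base model still benefits from more.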

Why it matters

  • Speed + quality: The distilled variants can reach sub‑second to ~2‑second generation on modern consumer GPUs while matching or beating much larger models on prompt fidelity and detail.

  • Unified workflows: Because the same model handles text‑to‑image and editing, it is well suited for interactive tools, ComfyUI graphs, and apps where users move fluidly between generating and tweaking.

  • Multi‑reference strength: It can blend up to around 4–5 reference images to keep character identity, product appearance, or style consistent across many outputs.

Model variants (text‑to‑image use)

  • 9B Distilled

    • 4‑step, latency‑optimized; “sub‑second” on high‑end cards; ideal for real‑time or high‑volume use.

  • 9B Base (undistilled)

    • Full‑capacity foundation model with more steps; better for fine‑tuning, LoRA training, and maximum diversity/control when speed is less critical.

Typical text‑to‑image behavior

  • Handles complex compositions with realistic lighting, correct perspective, and coherent spatial relationships.

  • Produces photorealistic or stylized images at resolutions from roughly 1024×1024 up to about 4 megapixels, depending on deployment settings.

  • Adheres closely to prompts, including multi‑object scenes and layout‑like instructions, while keeping outputs diverse across seeds.
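
When requesting resolutions, deployments typically require dimensions divisible by a fixed multiple and cap total pixels. A small helper sketch; the divisibility multiple of 16 and the ~4-megapixel cap are assumptions drawn from the range above, so check your deployment's actual constraints:

```python
# Helper sketch for choosing request resolutions. Assumptions (not from the
# model card): each dimension must be divisible by 16, and total pixels are
# capped at roughly 4 MP; verify against your deployment's settings.

def snap_resolution(width, height, multiple=16, max_pixels=4_000_000):
    """Round each side to the nearest valid multiple and enforce a pixel cap."""
    w = max(multiple, round(width / multiple) * multiple)
    h = max(multiple, round(height / multiple) * multiple)
    if w * h > max_pixels:
        raise ValueError(f"{w}x{h} exceeds the ~4 MP budget")
    return w, h

print(snap_resolution(1024, 1024))  # default square size passes through
print(snap_resolution(1000, 750))   # snapped to the nearest multiples of 16
```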

Use cases

  • Interactive concepting and UI‑driven tools where users expect near‑instant text‑to‑image responses.

  • Character, product, or brand exploration using multi‑reference generation to keep key visual elements consistent.

  • Pipelines that combine generation and refinement (for example, initial design → localized edit → variant sets) using a single model instead of switching between several.

Generates in about 28 secs

Nodes & Models

  • KSamplerSelect
  • Flux2Scheduler
  • RandomNoise
  • CLIPLoader (qwen_3_8b_fp8mixed.safetensors)
  • VAELoader (flux2-vae.safetensors)
  • UNETLoader (flux-2-klein-9b.safetensors)
  • EmptyFlux2LatentImage
  • CLIPTextEncode
  • CFGGuider
  • SamplerCustomAdvanced
  • VAEDecode
  • SaveImage
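
For headless or batch use, a graph like this can also be queued through ComfyUI's HTTP API: export the workflow with "Save (API Format)" and POST it to the /prompt endpoint. A minimal sketch; the server address is the ComfyUI default, the export filename is illustrative, and the prompt-patching targets the CLIPTextEncode node listed above:

```python
# Hedged sketch of driving this graph headlessly via ComfyUI's HTTP API:
# a workflow exported with "Save (API Format)" is a JSON dict of nodes, and
# POST /prompt queues it for execution. Server address is ComfyUI's default;
# the example filename at the bottom is an assumption, not part of this page.
import json
import urllib.request

def set_prompt(graph, prompt_text):
    """Return a deep copy of an API-format graph with the text input of every
    CLIPTextEncode node (the prompt in this workflow) replaced."""
    patched = json.loads(json.dumps(graph))  # cheap deep copy of plain JSON
    for node in patched.values():
        if node.get("class_type") == "CLIPTextEncode":
            node["inputs"]["text"] = prompt_text
    return patched

def queue_workflow(graph, server="http://127.0.0.1:8188"):
    """POST the graph to ComfyUI's /prompt endpoint; returns the queue reply."""
    req = urllib.request.Request(
        f"{server}/prompt",
        data=json.dumps({"prompt": graph}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())  # reply includes a prompt id

# Usage (requires a running ComfyUI server and your own API-format export):
#   graph = json.load(open("flux2_klein_t2i_api.json"))
#   queue_workflow(set_prompt(graph, "a red fox on a snowy hill"))
```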
