Try Z-Image Turbo with this workflow that supports both Text-to-Image and Image-to-Image.
Besides resolution and number of images per batch (batch_size), these are the key inputs for controlling the output:
Positive Prompt: Write a detailed prompt describing your desired output.
Negative Prompt: Jot down things you don't want in your image.
Optional Image Input: Toggle this on to enable image-to-image and upload an image you’d like to transform.
Denoise: Controls how much the output differs from the original image when using image-to-image.
High values (e.g., 0.9) → More different from the original
Low values (e.g., 0.1) → Closer to the original
steps: The recommended default of 8 gives you all the quality you need at maximum speed. (The sketch below shows how these inputs map to code.)
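If you want to drive these same knobs from code, here is a minimal sketch assuming a diffusers-style Python API. The model ID, pipeline classes, and argument names are my assumptions based on how comparable models are usually exposed, not confirmed Z-Image interfaces - check the Z-Image GitHub repository for the exact calls.

```python
import torch
from diffusers import AutoPipelineForImage2Image, DiffusionPipeline
from diffusers.utils import load_image

# Load the pipeline. The model ID is hypothetical; use the one
# published in the Z-Image repository.
pipe = DiffusionPipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",       # hypothetical model ID
    torch_dtype=torch.bfloat16,
).to("cuda")

# Text-to-image: positive prompt, negative prompt, 8 Turbo steps.
image = pipe(
    prompt="a cozy ramen shop on a rainy street, glowing neon signs",
    negative_prompt="blurry, low quality, watermark",
    num_inference_steps=8,            # steps: Turbo is distilled for 8
    width=1024,
    height=1024,
    num_images_per_prompt=1,          # batch_size
).images[0]
image.save("txt2img.png")

# Image-to-image: the workflow's Denoise knob is what diffusers-style
# APIs call `strength`. Roughly strength * steps denoising steps run on
# a partially noised copy of the input, so 0.9 drifts far from the
# original while 0.1 stays close to it.
img2img = AutoPipelineForImage2Image.from_pipe(pipe)  # assumes an img2img variant is registered
edited = img2img(
    prompt="the same scene, reimagined as a watercolor painting",
    image=load_image("my_photo.png"),
    strength=0.6,
    num_inference_steps=8,
).images[0]
edited.save("img2img.png")
```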
Simply put, Z-Image Turbo is the new king of image generation.
I don't say this lightly, but I love this model. Let me explain why:
Since 2022 - ancient times in the AI image generation world - we’ve been forced to choose between speed and quality. You either got a fast, mediocre image or you waited minutes for something usable. That era is over. The Floyo team and I have shifted our internal workflows to Z-Image Turbo, and if you are serious about AI image generation, you should too. Based purely on output quality, it is currently the number one open-source model, and it is also absurdly fast! (We're talking 8 inference steps, compared to 20-50 for other image models.)
The specs on this model are great, but practical application is where it really shines.
Zero Censorship: This is a fully open-source, uncensored model. There are no arbitrary guardrails blocking your creative freedom. It reminds me of the golden age of SD 1.5 - pure, unadulterated prompting.
Super Fast Generations: On Floyo, we are hitting generation times of a few seconds once the model is loaded.
Text & Prompt Adherence: It handles text inside images incredibly well and follows complex instructions without the "bleeding" artifacts we see in older architectures.
The Fun Factor: When you wait 3 minutes for a render (like with Flux 2), you hesitate to experiment. When a render takes 3 seconds, you iterate. You try wild ideas. You tweak text. You make jokes. Z-Image Turbo brings the joy back to prompting because there’s no penalty for “failing”.
Z-Image isn't just a standalone model - it is becoming a complete production suite for everything we need in image generation and editing. While the base model is already changing the game, the roadmap for the next few weeks will cement Z-Image as the undisputed king of all things image generation:
Z-Image Turbo - Text2Image: Available now - this workflow.
Z-Image Turbo - Image2Image: Available now - this workflow.
Z-Image Turbo - ControlNet: Available now - check it out at the link!
Z-Image Edit: Coming soon. Allows prompt-based editing of images.
Z-Image Turbo - LoRA Training: Coming to Floyo before the end of Dec 2025. We will have a full tutorial for training consistent characters, objects, and styles!
We don't need anything else for image generation and editing; Z-Image has it all. It is becoming a key piece of every production workflow we run.
The community consensus is clear. It’s not just us raving about this model - it’s practically unanimous.
"I love this model, I'm speechless. It's the one we've all been waiting for... It's fast (3.4 seconds for 1280*800), powerful (painters and drawers styles etc.), lightweight compared to flux.2 and not censored." - u/Kaduc21
"The output is amazing for a 6b distilled model. Training a bunch of Loras and merging them with the base model would improve it a lot." - u/Shockbum
"It reminds Stable Diffusion 1.5 at the release, but better. Same freedom, no constraints." - u/Kaduc21
"WOW! SDXL SUCCESSOR!" - u/Shockbum
"Speed to aesthetic quality ratio is excellent." - u/abnormal_human
"The prompt following is incredible" - Link to example prompts on Reddit
Why is it so fast? Z-Image Turbo uses a Scalable Single-Stream DiT (S3-DiT) architecture.
Unlike older dual-stream models that process text and images in separate branches, S3-DiT concatenates text tokens, visual semantic tokens, and image VAE tokens into a single unified stream, which drastically improves parameter efficiency. Combined with Decoupled-DMD (Distribution Matching Distillation), the model is distilled down to just 8 inference steps while maintaining high fidelity.
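To make the single-stream idea concrete, here is a toy sketch of the core difference. The dimensions, token counts, and layer layout below are illustrative only, not the actual Z-Image code - the point is that one transformer, with one set of weights, attends over text, semantic, and VAE latent tokens jointly instead of running separate text and image branches.

```python
import torch
import torch.nn as nn

class SingleStreamBlock(nn.Module):
    """Toy single-stream DiT block: one attention stack shared by
    every token type in the concatenated sequence."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm1(x)
        x = x + self.attn(h, h, h)[0]  # joint self-attention over ALL tokens
        return x + self.mlp(self.norm2(x))

dim = 256
text_tokens = torch.randn(1, 77, dim)       # from the text encoder
semantic_tokens = torch.randn(1, 32, dim)   # high-level visual semantics
vae_tokens = torch.randn(1, 1024, dim)      # patchified VAE latents (32x32)

# The single-stream trick: concatenate everything into one sequence so
# text and image tokens share the same parameters and attention maps.
stream = torch.cat([text_tokens, semantic_tokens, vae_tokens], dim=1)
out = SingleStreamBlock(dim)(stream)

# Only the VAE-token positions get decoded back into an image.
denoised = out[:, -1024:, :]
print(denoised.shape)  # torch.Size([1, 1024, 256])
```

A dual-stream design keeps separate stacks for text and image and swaps information through cross-attention; collapsing everything into one stream means every parameter works on every token, which is where the efficiency gain comes from.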
The Specs:
Parameters: 6 billion
VRAM Requirements: 16GB for local use (not needed on Floyo; see the rough math after this list)
Inference Steps: 8 steps (vs. 20-50 for other models)
Generation Time: Sub-second on H100 GPUs, 8-30 seconds on consumer hardware
License: Apache-2.0 (fully open source)
Architecture: Scalable Single-Stream DiT (S3-DiT)
Model Creator: Alibaba Group
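That 16GB figure passes a quick back-of-envelope check - this is my own rough estimate, not an official memory breakdown:

```python
# Rough estimate of why ~16GB is the practical floor for local use.
params = 6e9                 # 6 billion parameters
bytes_per_param = 2          # bfloat16/float16 weights
weights_gb = params * bytes_per_param / 1e9
print(f"DiT weights alone: ~{weights_gb:.0f} GB")  # ~12 GB

# The remaining ~4 GB covers the text encoder, the VAE, activations,
# and CUDA overhead (all assumed figures).
```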
More details can be found at this Z-Image GitHub repository.