# ⚡️- Image

An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer
[Project blog](https://tongyi-mai.github.io/Z-Image-blog/) | [GitHub repository](https://github.com/Tongyi-MAI/Z-Image) | [Hugging Face model](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo) | [Hugging Face Space](https://huggingface.co/spaces/Tongyi-MAI/Z-Image-Turbo) | [Hugging Face Space (akhaliq)](https://huggingface.co/spaces/akhaliq/Z-Image-Turbo) | [ModelScope model](https://www.modelscope.cn/models/Tongyi-MAI/Z-Image-Turbo) | [ModelScope online generation](https://www.modelscope.cn/aigc/imageGeneration?tab=advanced&versionId=469191&modelType=Checkpoint&sdVersion=Z_IMAGE_TURBO&modelUrl=modelscope%3A%2F%2FTongyi-MAI%2FZ-Image-Turbo%3Frevision%3Dmaster) | [Gallery (PDF)](assets/Z-Image-Gallery.pdf) | [ModelScope gallery studio](https://modelscope.cn/studios/Tongyi-MAI/Z-Image-Gallery/summary)

Welcome to the official repository for the Z-Image (造相) project!
## ✨ Z-Image
Z-Image is a powerful and highly efficient image generation model family with **6B** parameters. Currently there are four variants:
- 🚀 **Z-Image-Turbo** – A distilled version of Z-Image that matches or exceeds leading competitors with only **8 NFEs** (Number of Function Evaluations). It offers **⚡️sub-second inference latency⚡️** on enterprise-grade H800 GPUs and fits comfortably on **consumer devices with 16 GB of VRAM**. It excels in photorealistic image generation, bilingual text rendering (English & Chinese), and robust instruction adherence.
- 🎨 **Z-Image** – The foundation model behind Z-Image-Turbo. Z-Image focuses on **high-quality generation**, **rich aesthetics**, **strong diversity**, and **controllability**, well-suited for creative generation, **fine-tuning**, and downstream development. It supports a wide range of artistic styles, effective negative prompting, and high diversity across identities, poses, compositions, and layouts.
- 🧱 **Z-Image-Omni-Base** – The versatile foundation model capable of both **generation and editing tasks**. By releasing this checkpoint, we aim to unlock the full potential for community-driven fine-tuning and custom development, providing the most "raw" and diverse starting point for the open-source community.
- ✍️ **Z-Image-Edit** – A variant fine-tuned on Z-Image specifically for image editing tasks. It supports creative image-to-image generation with impressive instruction-following capabilities, allowing for precise edits based on natural language prompts.
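
As a rough illustration of how the Turbo variant's settings (8 steps, no classifier-free guidance) might translate into a 🤗 diffusers call, here is a minimal sketch. Several things are assumptions rather than statements from this README: the `ZImagePipeline` class name comes from the upstream diffusers integration, the 1024x1024 resolution and the seed are arbitrary, and the step count is passed straight through as `num_inference_steps`. The pipeline may count function evaluations differently, so check its documentation before relying on the exact value. The helper names `turbo_call_kwargs` and `generate` are hypothetical.

```python
# Hypothetical helpers; model ID and step/CFG settings follow this README.
TURBO_MODEL_ID = "Tongyi-MAI/Z-Image-Turbo"


def turbo_call_kwargs(prompt, height=1024, width=1024, steps=8):
    """Build keyword arguments for a Z-Image-Turbo pipeline call."""
    if steps < 1:
        raise ValueError("steps must be a positive integer")
    return {
        "prompt": prompt,
        "height": height,              # assumed resolution
        "width": width,                # assumed resolution
        "num_inference_steps": steps,  # Turbo is distilled for 8 NFEs
        "guidance_scale": 0.0,         # Turbo is listed without CFG
    }


def generate(prompt, device="cuda", seed=42):
    """Run one generation; needs a GPU and a diffusers build with Z-Image support."""
    # Heavy imports are deferred so turbo_call_kwargs works without them.
    import torch
    from diffusers import ZImagePipeline  # assumed class name, see lead-in

    pipe = ZImagePipeline.from_pretrained(TURBO_MODEL_ID, torch_dtype=torch.bfloat16)
    pipe.to(device)
    generator = torch.Generator(device).manual_seed(seed)
    result = pipe(**turbo_call_kwargs(prompt), generator=generator)
    return result.images[0]

# Example (downloads the checkpoint, requires a CUDA device):
# generate("A neon street sign reading 'Z-Image'").save("z_image_turbo.png")
```

Keeping the call arguments in a separate helper makes it easy to swap in the 50-step, CFG-enabled settings of the non-distilled variants without touching the loading code.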
### 📥 Model Zoo
| Model | Pre-Training | SFT | RL | Step | CFG | Task | Visual Quality | Diversity | Fine-Tunability | Hugging Face | ModelScope |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| **Z-Image-Omni-Base** | ✅ | ❌ | ❌ | 50 | ✅ | Gen. / Editing | Medium | High | Easy | *To be released* | *To be released* |
| **Z-Image** | ✅ | ✅ | ❌ | 50 | ✅ | Gen. | High | Medium | Easy | [Model](https://huggingface.co/Tongyi-MAI/Z-Image) · [Space](https://huggingface.co/spaces/Tongyi-MAI/Z-Image) | [Model](https://www.modelscope.cn/models/Tongyi-MAI/Z-Image) · [Online generation](https://www.modelscope.cn/aigc/imageGeneration?tab=advanced&versionId=569345&modelType=Checkpoint&sdVersion=Z_IMAGE&modelUrl=modelscope%3A%2F%2FTongyi-MAI%2FZ-Image%3Frevision%3Dmaster) |
| **Z-Image-Turbo** | ✅ | ✅ | ✅ | 8 | ❌ | Gen. | Very High | Low | N/A | [Model](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo) · [Space](https://huggingface.co/spaces/Tongyi-MAI/Z-Image-Turbo) | [Model](https://www.modelscope.cn/models/Tongyi-MAI/Z-Image-Turbo) · [Online generation](https://www.modelscope.cn/aigc/imageGeneration?tab=advanced&versionId=469191&modelType=Checkpoint&sdVersion=Z_IMAGE_TURBO&modelUrl=modelscope%3A%2F%2FTongyi-MAI%2FZ-Image-Turbo%3Frevision%3Dmaster) |
| **Z-Image-Edit** | ✅ | ✅ | ❌ | 50 | ✅ | Editing | High | Medium | Easy | *To be released* | *To be released* |

### 🖼️ Showcase

📸 **Photorealistic Quality**: **Z-Image-Turbo** delivers strong photorealistic image generation while maintaining excellent aesthetic quality.

📖 **Accurate Bilingual Text Rendering**: **Z-Image-Turbo** excels at accurately rendering complex Chinese and English text.

💡 **Prompt Enhancing & Reasoning**: Prompt Enhancer empowers the model with reasoning capabilities, enabling it to transcend surface-level descriptions and tap into underlying world knowledge.

🧠 **Creative Image Editing**: **Z-Image-Edit** shows a strong understanding of bilingual editing instructions, enabling imaginative and flexible image transformations.

### 🏗️ Model Architecture

We adopt a **Scalable Single-Stream DiT** (S3-DiT) architecture. In this setup, text, visual semantic tokens, and image VAE tokens are concatenated at the sequence level to serve as a unified input stream, maximizing parameter efficiency compared to dual-stream approaches.

### 📈 Performance

According to the Elo-based Human Preference Evaluation (on [*Alibaba AI Arena*](https://aiarena.alibaba-inc.com/corpora/arena/leaderboard?arenaType=T2I)), Z-Image-Turbo shows highly competitive performance against other leading models, while achieving state-of-the-art results among open-source models.
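
For readers unfamiliar with Elo-based preference rankings, the sketch below shows the textbook Elo model applied to a single pairwise comparison between two models. This README does not describe the exact rating scheme used by the arena, so treat this as the generic formula only: the scale of 400 and the K-factor of 32 are conventional defaults, and the function names are hypothetical.

```python
def elo_expected(rating_a, rating_b, scale=400.0):
    """Probability that A is preferred over B under the standard Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def elo_update(rating_a, rating_b, score_a, k=32.0):
    """Update both ratings after one comparison.

    score_a is 1.0 if A was preferred, 0.0 if B was preferred, 0.5 for a tie.
    """
    expected_a = elo_expected(rating_a, rating_b)
    delta = k * (score_a - expected_a)
    # The update is zero-sum: what A gains, B loses.
    return rating_a + delta, rating_b - delta


# Two models start level; a human rater prefers model A's image.
new_a, new_b = elo_update(1500.0, 1500.0, score_a=1.0)
print(new_a, new_b)
```

Because the update is proportional to the gap between the observed and expected outcome, a win over a much higher-rated model moves the ratings more than a win over an equally rated one.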

**Why install diffusers from source?** We submitted two pull requests ([#12703](https://github.com/huggingface/diffusers/pull/12703) and [#12715](https://github.com/huggingface/diffusers/pull/12715)) to the 🤗 diffusers repository to add support for Z-Image. Both PRs have been merged into diffusers, so installing diffusers from source gives you the latest features and Z-Image support.
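
A source install of diffusers is typically done with `pip install git+https://github.com/huggingface/diffusers`. To check whether the diffusers build in your environment already includes the Z-Image integration, a small probe like the following may help. It assumes the integration exposes a top-level `ZImagePipeline` class, which is taken from the upstream diffusers code rather than from this section; `has_z_image_support` is a hypothetical helper name.

```python
import importlib.util


def has_z_image_support() -> bool:
    """Return True if the installed diffusers exposes ZImagePipeline."""
    if importlib.util.find_spec("diffusers") is None:
        return False  # diffusers is not installed at all
    try:
        import diffusers

        return hasattr(diffusers, "ZImagePipeline")
    except Exception:
        # A broken or partial install is treated as "no support".
        return False


print("Z-Image support available:", has_z_image_support())
```

If the probe reports `False`, reinstalling diffusers from source and rerunning it is a quick way to confirm the environment is ready.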




