Z-Anime - Text Encoder

NSFW

🎌 Z-Anime | Full Anime Fine-Tune on Z-Image Base

Full Fine-Tune • Rich Aesthetics • Strong Diversity • Full Negative Prompt Support

BF16 & FP8 & GGUF & AIO • Natural Language Prompts • 8GB VRAM

🤗 Now also on Hugging Face: huggingface.co/SeeSee21/Z-Anime — including the full Diffusers folder for ZImagePipeline.from_pretrained() use.

✨ What is Z-Anime?

Z-Anime is a full fine-tune of Alibaba's Z-Image (Base) architecture — not a LoRA merge, but a completely retrained model optimized for anime aesthetics from the ground up.

Built on the S3-DiT (Single-Stream Diffusion Transformer) with 6 billion parameters, Z-Anime inherits everything that makes Z-Image Base special: rich diversity, strong controllability, full negative prompt support and a high ceiling for fine-tuning — now fully tuned for anime.

This page contains the complete Z-Anime family:

🎌 Z-Anime Base — Full quality, full control, full creativity
⚡ Z-Anime Distill-8-Step — Great results in 8 steps
🚀 Z-Anime Distill-4-Step — Maximum speed, 4 steps
📦 GGUF Variants — Q8_0 + Q4_K_S for low VRAM / CPU / AMD
📦 AIO Variants — All-in-one checkpoints (Base + 4-Step + 8-Step)

Each main variant is available in BF16 (~12 GB) and FP8 (~6 GB).

🎯 Key Features

✅ Full fine-tune on Z-Image Base — not a LoRA merge
✅ Rich anime aesthetics with strong style diversity
✅ Natural language prompts — detailed descriptions, not tag lists
✅ High diversity across characters, poses, compositions and layouts
✅ LoRA training ready — perfect base for further fine-tuning
✅ Partially NSFW capable
✅ 8 GB VRAM compatible
✅ All variants supported by the official Z-Anime ComfyUI Workflow

🗺️ Z-Anime Roadmap

✅ Released

🎌 Z-Anime Base — Full fine-tune on Z-Image Base, BF16 & FP8

⚡ Z-Anime Distill-8-Step — fast anime generation in 8 steps, CFG 1.0, BF16 & FP8

🚀 Z-Anime Distill-4-Step — ultra-fast anime generation in 4 steps, CFG 1.0, BF16 & FP8

📦 GGUF Variants — for low VRAM and AMD GPUs. Since CivitAI currently has no dedicated GGUF category, here is what the files represent:

Z-Anime-Base-Q8_0 = Pruned Model FP8 (6.73 GB)
Z-Anime-Base-Q4_K_S = Pruned Model NF4 (4.2 GB)

📦 AIO Versions — All variants with VAE + Text Encoder integrated in a single file:

z-anime-base-aio (BF16 + FP8)
z-anime-distill-8step-aio (BF16 + FP8)
z-anime-distill-4step-aio (BF16 + FP8)

🔧 Z-Anime ComfyUI Workflow — Official workflow, supports all variants (auto-detects Diffusion / GGUF / AIO loaders, optional LoRA, optional 1.5× upscale)

🤗 Hugging Face Repo — full mirror including the Diffusers folder for Python users: huggingface.co/SeeSee21/Z-Anime

More updates coming — follow to stay notified! 🎌

📦 Versions Overview

🟢 BF16 (~12 GB)

Maximum precision. BFloat16 format, no quality compromise. Best for professional or commercial work and LoRA training. Still runs on 8 GB VRAM.

🟡 FP8 (~6 GB)

Recommended for most users. Half the file size, much faster downloads. Excellent quality, barely distinguishable from BF16. Perfect for everyday use and testing.

🔵 GGUF

Optimized for lightweight inference setups, especially useful for low VRAM, CPU inference, or alternative backends.

🟣 AIO

All-in-one checkpoints with image model + Text Encoder + VAE integrated into a single file. Single-file convenience, no extra loaders needed.
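
As a rough sanity check, the BF16 and FP8 sizes quoted in this overview follow from the parameter count. The sketch below assumes the roughly 6 billion parameters stated earlier, a single precision for every tensor, and no metadata overhead. The function name is illustrative.

```python
# Rough file-size estimate: parameters x bytes per parameter.
# Assumption: ~6e9 parameters, all stored at one precision, no metadata.
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}


def estimated_size_gb(num_params: float, precision: str) -> float:
    """Return an approximate checkpoint size in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


for precision in ("bf16", "fp8"):
    print(precision, estimated_size_gb(6e9, precision), "GB")
```

Real files differ somewhat from these estimates, since some tensors may be kept at a different precision.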

🎌 Z-Anime Base

The foundation of the Z-Anime family. A full fine-tune with the highest quality ceiling, the widest creative range and full negative prompt support.

Recommended Settings:

Steps:      28–50
CFG:        3.0–5.0  (up to 9.0 possible)
Sampler:    euler_ancestral
Scheduler:  beta
Negative:   strongly recommended — very responsive!

CFG Guide: 3.0–5.0 is the sweet spot for balanced quality and creativity. 5.0–7.0 gives tighter prompt adherence. 7.0–9.0 is for maximum control — watch for over-saturation. Above 9.0 is not recommended.

Negative prompts have full effect on Z-Anime Base. The official workflow ships with an optimized negative prompt ready to use.

⚡ Z-Anime Distill-8-Step

The sweet spot of the family. Distilled from Z-Anime Base, delivering strong anime results in just 8 steps. Much faster than Base while keeping most of the quality intact.

Recommended Settings:

Steps:      8
CFG:        1.0  (max ~1.5)
Sampler:    euler_ancestral
Scheduler:  beta
Negative:   limited effect

CFG Guide: Runs best at CFG 1.0 by design. Small nudges up to 1.3–1.5 are possible for slightly tighter prompt adherence. Do not go above 1.5 — artifacts may appear.

Negative prompts have limited effect at this distillation level. Use ConditioningZeroOut (included in the workflow) instead of writing a full negative prompt.

🚀 Z-Anime Distill-4-Step

The fastest Z-Anime variant. Built for maximum throughput — rapid prototyping, batch generation and situations where speed matters most.

Recommended Settings:

Steps:      4
CFG:        1.0  (max ~1.5)
Sampler:    euler_ancestral
Scheduler:  beta
Negative:   limited effect

CFG Guide: At 4 steps the model has very little correction room. Stay at CFG 1.0 for the most stable results. Nudging up to 1.3–1.5 is possible but increases instability. Do not go above 1.5.

Tips for 4-Step: Be specific and front-load the most important details early in your prompt. The optional upscaler (hires fix or SeedVR2) in the workflow is especially useful here to recover fine detail.
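
The recommended settings for the three variants can be kept in one lookup table. This is a minimal sketch: the values are copied from the sections above, while the dictionary layout, variant keys and function name are illustrative assumptions, not part of any workflow or API.

```python
# Recommended settings per variant, copied from the sections above.
# The layout and names are illustrative only.
RECOMMENDED = {
    "base":          {"steps": (28, 50), "cfg": (3.0, 5.0), "cfg_max": 9.0},
    "distill-8step": {"steps": (8, 8),   "cfg": (1.0, 1.0), "cfg_max": 1.5},
    "distill-4step": {"steps": (4, 4),   "cfg": (1.0, 1.0), "cfg_max": 1.5},
}
SAMPLER, SCHEDULER = "euler_ancestral", "beta"


def check_settings(variant: str, steps: int, cfg: float) -> list:
    """Return a list of warnings; an empty list means within the guide."""
    rec = RECOMMENDED[variant]
    warnings = []
    low, high = rec["steps"]
    if not low <= steps <= high:
        warnings.append(f"steps {steps} outside recommended {low}-{high}")
    if cfg > rec["cfg_max"]:
        warnings.append(f"cfg {cfg} above recommended maximum {rec['cfg_max']}")
    return warnings


print(check_settings("distill-4step", steps=4, cfg=2.0))
```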

📐 Resolution Guide

| Use Case | Resolution |
|---|---|
| ⭐ Portrait / Character art | 832 × 1216 |
| Landscape / Scenes / Backgrounds | 1216 × 832 |
| Square / General purpose | 1024 × 1024 |
| Tall / Full body / Phone wallpaper | 768 × 1344 |
| Cinematic / Wide scenes | 1920 × 1088 |
| High quality / Detailed portraits | 1024 × 1536 |

Supported range: 512 × 512 to 2048 × 2048, any aspect ratio. All resolutions run on 8 GB VRAM.
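
The resolution guide can be expressed as a small helper. This sketch assumes that the supported range means each side lies between 512 and 2048 pixels. The preset keys and function names are made up for illustration.

```python
# Presets from the resolution guide as (width, height); key names are illustrative.
PRESETS = {
    "portrait": (832, 1216),
    "landscape": (1216, 832),
    "square": (1024, 1024),
    "tall": (768, 1344),
    "cinematic": (1920, 1088),
    "detailed_portrait": (1024, 1536),
}
MIN_SIDE, MAX_SIDE = 512, 2048  # assumption: the range applies per side


def in_supported_range(width: int, height: int) -> bool:
    """Check that both sides fall inside the stated 512-2048 range."""
    return MIN_SIDE <= width <= MAX_SIDE and MIN_SIDE <= height <= MAX_SIDE


def closest_preset(width: int, height: int) -> str:
    """Return the preset whose aspect ratio is closest to width / height."""
    target = width / height
    return min(PRESETS, key=lambda n: abs(PRESETS[n][0] / PRESETS[n][1] - target))


print(closest_preset(1080, 1920), PRESETS[closest_preset(1080, 1920)])
```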

💡 Prompting Guide

Natural language — not tag lists!

✅ Good

A young anime girl with long silver hair and golden eyes, wearing a
traditional shrine maiden outfit with white haori and red hakama.
She stands in a sunlit bamboo forest, cherry blossoms falling softly
around her. Warm afternoon light filtering through the trees,
detailed fabric shading, expressive face, calm serene expression.
High quality anime illustration with fine line work.

❌ Avoid

anime girl, silver hair, shrine maiden, bamboo, cherry blossom, warm light

Character portraits

Detailed anime portrait of [character], soft rim lighting,
expressive eyes with detailed reflections, fine hair strands,
clean linework, professional anime illustration quality.

Action scenes

Dynamic anime [scene], dramatic angle, motion energy, speed lines,
particle effects, cinematic composition, detailed shading,
high quality anime art.

Backgrounds & landscapes

Anime [location] at [time of day], [lighting], [atmosphere],
Studio Ghibli inspired detail level, beautiful background art,
wallpaper quality.
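
The bracketed placeholders in the templates above can be filled programmatically. This is a minimal sketch that uses Python's `str.format`. The template keys and the `build_prompt` name are illustrative, and the template text is taken from this guide.

```python
# Prompt templates from the guide, with [placeholders] rewritten as {fields}.
TEMPLATES = {
    "portrait": (
        "Detailed anime portrait of {character}, soft rim lighting, "
        "expressive eyes with detailed reflections, fine hair strands, "
        "clean linework, professional anime illustration quality."
    ),
    "action": (
        "Dynamic anime {scene}, dramatic angle, motion energy, speed lines, "
        "particle effects, cinematic composition, detailed shading, "
        "high quality anime art."
    ),
    "background": (
        "Anime {location} at {time_of_day}, {lighting}, {atmosphere}, "
        "Studio Ghibli inspired detail level, beautiful background art, "
        "wallpaper quality."
    ),
}


def build_prompt(kind: str, **fields: str) -> str:
    """Fill one template; raises KeyError if a placeholder is missing."""
    return TEMPLATES[kind].format(**fields)


print(build_prompt("portrait", character="a young woman with long silver hair"))
```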

🔧 Installation

Step 1 — Download your version (BF16, FP8, GGUF or AIO) for the variant you want.

Step 2 — Place the files:

Standard BF16 / FP8 models:

ComfyUI/models/diffusion_models/
├── z-anime-base-bf16.safetensors
├── z-anime-base-fp8.safetensors
├── z-anime-distill-8step-bf16.safetensors
├── z-anime-distill-8step-fp8.safetensors
├── z-anime-distill-4step-bf16.safetensors
└── z-anime-distill-4step-fp8.safetensors

GGUF variants:

ComfyUI/models/unet/
├── z-anime-base-q8_0.gguf
└── z-anime-base-q4_k_s.gguf

Text Encoder & VAE (for the non-AIO variants):

ComfyUI/models/clip/
└── qwen_3_4b.safetensors

ComfyUI/models/vae/
└── ae.safetensors

AIO variants — single file, no extras needed:

ComfyUI/models/checkpoints/
├── z-anime-base-aio-bf16.safetensors
├── z-anime-base-aio-fp8.safetensors
├── z-anime-distill-8step-aio-bf16.safetensors
├── z-anime-distill-8step-aio-fp8.safetensors
├── z-anime-distill-4step-aio-bf16.safetensors
└── z-anime-distill-4step-aio-fp8.safetensors

Step 3 — Load in ComfyUI:

Use the Load Diffusion Model node for the model file, a CLIPLoader for the text encoder and a VAELoader for the VAE.
For the GGUF versions: load the GGUF model from the models/unet/ folder, use the same CLIP and VAE files as above.
For the AIO versions: just use a standard Checkpoint Loader — no extra CLIP or VAE loading required.
Or use the official Z-Anime ComfyUI Workflow — it handles all variants and precisions with a built-in model switch.
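
The folder layout from Step 2 can be summarized as a lookup by filename. This is a sketch under the assumption that files keep the names shown above. The matching rules and function names are illustrative and are not part of ComfyUI.

```python
from pathlib import PurePosixPath


def comfyui_folder(filename: str) -> str:
    """Map a Z-Anime file to its ComfyUI models subfolder, per Step 2."""
    name = filename.lower()
    if name.endswith(".gguf"):
        return "models/unet"
    if "-aio-" in name:
        return "models/checkpoints"
    if name.startswith("qwen"):
        return "models/clip"  # text encoder
    if name == "ae.safetensors":
        return "models/vae"
    return "models/diffusion_models"  # standard BF16 / FP8 models


def target_path(comfyui_root: str, filename: str) -> str:
    """Build the destination path for a downloaded file."""
    return str(PurePosixPath(comfyui_root) / comfyui_folder(filename) / filename)


print(target_path("ComfyUI", "z-anime-base-q8_0.gguf"))
```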

📦 Custom Nodes (for the official workflow)

rgthree-comfy
ComfyUI-Lora-Manager
ComfyUI-GGUF (only for the GGUF variants)
ComfyUI-SeedVR2_VideoUpscaler (optional, only for SeedVR2 upscale)

🤗 Hugging Face Repo

The complete model family is also mirrored on Hugging Face:

🔗 huggingface.co/SeeSee21/Z-Anime

The HF repo additionally contains:

The full Diffusers-format folder (diffusers/) — drop-in compatible with ZImagePipeline.from_pretrained() for Python users
An alternative Text Encoder by BennyDaBall — Engineer V4 (full fine-tune of the Z-Image text encoder with SMART training, drop-in compatible — often produces more varied outputs from the same seed)
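
For Python users, loading the Diffusers folder might look roughly like the sketch below. It assumes a recent `diffusers` release that ships `ZImagePipeline`, the `huggingface_hub` package, a CUDA GPU, and that the repo's `diffusers/` subfolder holds a complete pipeline. The helper names and default values are illustrative and use the Z-Anime Base settings from this page. The sampler and scheduler names given elsewhere are ComfyUI settings, so this sketch leaves the pipeline's bundled scheduler untouched.

```python
def generation_kwargs(prompt, negative_prompt="", steps=28, cfg=4.0,
                      width=832, height=1216):
    """Collect call arguments using the Z-Anime Base settings from this page."""
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "num_inference_steps": steps,
        "guidance_scale": cfg,
        "width": width,
        "height": height,
    }


def load_pipeline(repo_id="SeeSee21/Z-Anime", subfolder="diffusers"):
    """Download only the Diffusers folder and load it (assumed layout)."""
    import os

    import torch
    from diffusers import ZImagePipeline
    from huggingface_hub import snapshot_download

    local_dir = snapshot_download(repo_id, allow_patterns=[f"{subfolder}/*"])
    pipe = ZImagePipeline.from_pretrained(
        os.path.join(local_dir, subfolder), torch_dtype=torch.bfloat16
    )
    pipe.enable_model_cpu_offload()  # helps on low-VRAM cards; needs accelerate
    return pipe


# Usage (downloads several GB, so it is not run here):
# pipe = load_pipeline()
# image = pipe(**generation_kwargs("A young anime girl ...")).images[0]
# image.save("z-anime.png")
```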

📈 Version History

v1.0 — Initial Release

Z-Anime Base in BF16 & FP8
Z-Anime Distill-8-Step in BF16 & FP8
Z-Anime Distill-4-Step in BF16 & FP8
GGUF Variants added:
- Z-Anime-Base-Q8_0 = pruned FP8 model (6.73 GB)
- Z-Anime-Base-Q4_K_S = pruned Q4_K_S / NF4-style model (4.2 GB)
AIO Variants added (all 6):
- z-anime-base-aio-bf16 / -fp8
- z-anime-distill-8step-aio-bf16 / -fp8
- z-anime-distill-4step-aio-bf16 / -fp8
Official ComfyUI Workflow included — supports all variants
Hugging Face mirror with full Diffusers folder for Python users
Tuned for euler_ancestral + beta as a simple, practical default across the whole family

🙏 Credits

Base Architecture: Tongyi Lab (Alibaba) — Z-Image
Fine-Tune: SeeSee21
License: Apache 2.0
Architecture: S3-DiT (Single-Stream Diffusion Transformer, 6B parameters)
Base Model: Tongyi-MAI/Z-Image
GitHub: Tongyi-MAI/Z-Image
Engineer V4 Text Encoder (HF only): BennyDaBall/Qwen3-4b-Z-Image-Engineer-V4

Z-Anime — Anime at its finest, powered by Z-Image Base. 🎌

Description

🧠 Text Encoder

Z-Anime is built for the standard Z-Image Base text encoder pipeline in ComfyUI.

Please use the matching text encoder provided for Z-Anime / Z-Image Base, together with its correct tokenizer.

Using the wrong text encoder or tokenizer can lead to:

black images
pure noise
broken prompt understanding
errors during generation

To make setup easier, I am uploading matching Text Encoder files in BF16 and FP8, so you do not have to search for the correct one yourself.

FAQ

Comments (7)

solmp1 · Apr 6, 2026 · 2 reactions

CivitAI

Hi!

I wanted to ask if it would be possible to provide the base version of the model (full steps) in a format other than BF16. My 8GB GPU doesn’t support BF16 or FP8, which leads to higher memory usage and offloading. I’ve tried both the BF16 12GB and FP8 6GB versions.

Would it be possible to release the model in NF4 or FP16, GGUF format, so that it can fit into 8GB VRAM?

From what I understand, since my GPU doesn’t support BF16 or FP8, it tries to emulate them, which ends up converting the model to FP32 when running. This results in a huge increase in memory usage, even at small image resolutions. I’ve tried low-RAM optimizations and forcing FP16 conversion, but neither helped.

I could try converting it myself, but I understand that there are a lot of changes that make this tricky, such as:

Mixed weight types: some tensors are type=1 (float), others type=14

Some GGUF keys differ from the original safetensors → missing expected references in UNet/Context refiner

Custom layers (noise_refiner, context_refiner) → not directly supported by the GGUF loader

Thanks a lot for your time and for making this model!

SeeSeeLP (Author) · Apr 7, 2026 · 1 reaction

First of all, thank you for your post.

Yes, I want to upload GGUF variants.

I also have a script for it that I once wrote for Qwen. I think I would just have to adapt the internal name so that the GGUF nodes recognize the model.

So versions will be coming, sooner rather than later.

P.S. The model running in FP32 when BF16 is not available is normal. It is specified by Z-Image and, as far as I know, cannot be set to FP16.

solmp1 · Apr 7, 2026 · 2 reactions

@SeeSeeLP Thank you very much, I'll be waiting for versions for even larger potato PCs)

SeeSeeLP (Author) · Apr 11, 2026 · 1 reaction

@solmp1 https://civitai.com/models/2483351/z-anime?modelVersionId=2848912

solmp1 · Apr 12, 2026

@SeeSeeLP Wow, thank you so much for your responsiveness! I'll download the model and be sure to share my best work!

solmp1 · Apr 14, 2026 · 1 reaction

@SeeSeeLP I’m sorry, but the issue still persists.

It seems that the NF4 version of the model still contains BF16 (and possibly FP32) tensors. Here are the tensor details for the z-anime base NF4 model:

--- Metadata ---
general.file_type: [14]
general.type: [100 105 102 102 117 115 105 111 110]
--- Tensors ---
context_refiner.0.attention.qkv.weight: BF16
context_refiner.1.attention.qkv.weight: BF16
layers.0.attention.qkv.weight: Q4_K
layers.1.attention.qkv.weight: Q4_K
layers.2.attention.qkv.weight: Q4_K
layers.3.attention.qkv.weight: Q4_K
layers.4.attention.qkv.weight: Q4_K
layers.5.attention.qkv.weight: Q4_K
layers.6.attention.qkv.weight: Q4_K
layers.7.attention.qkv.weight: Q4_K

Because of this, on my 8GB GPU the z-anime base NF4.gguf model still runs about 5x slower than flux 1 dev.

For comparison, here are the tensor details of the NF4-quantized flux 1 dev model, which runs at good speed on my setup:

--- Metadata ---
--- Tensors ---
double_blocks.0.img_attn.norm.key_norm.scale: F16
double_blocks.0.img_attn.norm.query_norm.scale: F16
double_blocks.0.img_attn.proj.bias: F16
double_blocks.0.img_attn.proj.weight: Q4_0
double_blocks.0.img_attn.qkv.bias: F16
double_blocks.0.img_attn.qkv.weight: Q4_0
double_blocks.0.img_mlp.0.bias: F16
double_blocks.0.img_mlp.0.weight: Q4_0
double_blocks.0.img_mlp.2.bias: F16
double_blocks.0.img_mlp.2.weight: Q4_0

I’d like to thank you again for taking the time to create the NF4 version. I spent quite a while testing to understand the cause of the slowdown, and eventually came to the conclusion that it’s related to the BF16 and FP32 tensors based on my comparisons.

Since my GPU is based on the Turing architecture, it doesn’t support FP8 or BF16, which may be contributing to the performance loss even when the model fits into VRAM.

If possible, could I kindly ask you to try implementing NF4 quantization so that the model tensors use FP16 (and ideally avoid FP32), when you have some free time?

Unfortunately, for older GPU architectures, simple NF4 quantization alone doesn’t seem to be enough to achieve good performance. While the z-anime base NF4 model fits very well into VRAM, the presence of BF16 and FP32 tensors significantly reduces iteration speed.
(P.S. When running the comparisons, I used the FP8_e4m3fn text encoder for Flux, and for the z-anime base NF4 model I used your FP8 3.75 GB text encoder.)

SeeSeeLP (Author) · Apr 14, 2026 · 1 reaction

@solmp1 Thanks a lot for the detailed testing and for posting the tensor info.

Yes, that is partly intentional. I did not quantize every part of the model, because I tried to stay close to the original GGUF structure to avoid stability problems or quality loss.

Of course, I can still look into whether more parts can be pushed to FP16 and whether some of the remaining higher-precision tensors can be reduced further. I just can’t say yet how much that would affect quality or generation stability. Some layers usually need to stay in higher precision, and a few parts may still require FP32.

I’ll check what else can be improved, look for a better script or method if possible, and run some tests over the next few days. I’ll keep you updated.
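
The per-tensor dumps in this thread can be reproduced and summarized with a short script. This is a sketch that assumes the `gguf` Python package (from the llama.cpp project) is installed to read real files. The function names are illustrative, and the sample data is the excerpt quoted in the comment above.

```python
from collections import Counter


def summarize_tensor_types(pairs):
    """Count tensors per storage type from (tensor_name, type_name) pairs."""
    return Counter(type_name for _, type_name in pairs)


def read_tensor_types(path):
    """Yield (name, type) pairs from a GGUF file; requires `pip install gguf`."""
    from gguf import GGUFReader  # imported lazily so the summary works without it

    for tensor in GGUFReader(path).tensors:
        yield tensor.name, tensor.tensor_type.name


# Tensors quoted in the thread for the Q4_K_S file (excerpt only).
sample = [
    ("context_refiner.0.attention.qkv.weight", "BF16"),
    ("context_refiner.1.attention.qkv.weight", "BF16"),
    ("layers.0.attention.qkv.weight", "Q4_K"),
    ("layers.1.attention.qkv.weight", "Q4_K"),
]
print(summarize_tensor_types(sample))
```

For a real file, `summarize_tensor_types(read_tensor_types("z-anime-base-q4_k_s.gguf"))` would give the full breakdown, which makes it easy to see how many tensors remain in BF16 or F32.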

Checkpoint

ZImageBase

by SeeSeeLP


Details

Downloads

361

Platform

CivitAI

Platform Status

Available

Created

4/5/2026

Updated

7/31/2026

Deleted

Files

zAnime_textEncoder.safetensors

Size:

7.49 GB

SHA256:

6c671498573ac2f7a5501502ccce8d2b08ea6ca2f661c458e708f36b36edfc5a

Mirrors

HuggingFace (250 mirrors)

qwen_3_4b.safetensors

qwen3-4b.safetensors

qwen.safetensors

qwen_3_4b-bf16.safetensors

qwen_3_4b_bf16.safetensors

qwen_merged_text_encoder.safetensors

zImageTurbo_textEncoder.safetensors

Z-Image_qwen_3_4b.safetensors

model.safetensors

zImage_textEncoder.safetensors

text_encoder-qwen_3_4b.safetensors

qwen_3_4b (1).safetensors

TEXT_TO_IMAGE_qwen_3_4b.safetensors

text_encoder.safetensors

model-00003-of-00003.safetensors

zImageTurbo_turbo_txt.safetensors

cyberrealisticZImage_v50_txt.safetensors

qwen4b.safetensors

Text_encoders_qwen_3_4b.safetensors

CivitAI (14 mirrors)

cyberrealisticZImage_v10_txt.safetensors

cyberrealisticZImage_v20NSFW_txt.safetensors

zAnime_textEncoder.safetensors

unrealvisionZITPhotoreal_universal_txt.safetensors

926Custom3JustAZIT_v10_txt.safetensors

zImageTurbo_turbo_txt.safetensors

cyberrealisticZImage_v40_txt.safetensors

cyberrealisticZImage_v50_txt.safetensors

cyberrealisticZImage_v60_txt.safetensors

cyberrealisticZImage_v70_txt.safetensors

cielbleuZIT_v2_txt.safetensors

juggernautZ_v10ByRundiffusion_txt.safetensors

qwen3_4b.safetensors

cielbleuZIT_v1_txt.safetensors

ModelScope (2 mirrors)

qwen_3_4b.safetensors

ModelScope CN (4 mirrors)

qwen_3_4b.safetensors

qwen_3_4b-bf16.safetensors

zAnime_textEncoder.safetensors

Size:

3.75 GB

SHA256:

38a245fc197f16c4025467ef46dce247d076af8f013bb8b1617013beea46d0e4

Mirrors

HuggingFace (14 mirrors)

qwen_3_4b-fp8.safetensors

CivitAI (1 mirror)

zAnime_textEncoder.safetensors

ModelScope CN (1 mirror)

qwen_3_4b-fp8.safetensors
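
The SHA256 values listed under Files can be used to confirm that a downloaded text encoder is intact. This is a minimal sketch using Python's standard `hashlib`. The dictionary keys simply reuse the listed file sizes, and the function names are illustrative.

```python
import hashlib

# Hashes copied from the Files section, keyed by the listed file size.
EXPECTED_SHA256 = {
    "7.49 GB": "6c671498573ac2f7a5501502ccce8d2b08ea6ca2f661c458e708f36b36edfc5a",
    "3.75 GB": "38a245fc197f16c4025467ef46dce247d076af8f013bb8b1617013beea46d0e4",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large models need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path):
    """Return the listing label whose hash matches the file, or None."""
    actual = sha256_of(path)
    for label, expected in EXPECTED_SHA256.items():
        if actual == expected:
            return label
    return None
```

Calling `matches_listing("zAnime_textEncoder.safetensors")` on a finished download should return one of the two labels. A result of `None` suggests a corrupted or different file.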

