Mixed 8-bit microscaling (MXFP8) quantization of Z-Image Turbo (6B S3-DiT), generated with convert_to_quant. Running it in ComfyUI requires a Blackwell GPU, comfy-kitchen, CUDA 13.x, and PyTorch 2.10 or later.
Faster inference than BF16 or FP8 on supported hardware. The quality loss is barely noticeable compared to BF16.
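For readers wondering what "microscaling" means here, the following is a minimal sketch of the general idea as I understand it from the OCP Microscaling (MX) format: every block of 32 values shares one power-of-two scale, and each value is stored as an FP8 number relative to that scale. It is not the code used by convert_to_quant or comfy-kitchen. The function names are hypothetical, and the use of E4M3 as the element type is an assumption.

```python
import math

BLOCK_SIZE = 32     # number of elements sharing one scale in the MX formats
E4M3_MAX = 448.0    # largest finite FP8 E4M3FN value (1.75 * 2**8)
E4M3_EMAX = 8       # exponent of that largest value
E4M3_EMIN = -6      # smallest normal exponent
MANTISSA_BITS = 3


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest FP8 E4M3FN value, saturating at +/-448."""
    if x == 0.0:
        return 0.0
    mag = min(abs(x), E4M3_MAX)
    # frexp gives mag = m * 2**e with m in [0.5, 1), so floor(log2(mag)) = e - 1.
    # Clamping to E4M3_EMIN makes tiny inputs use the subnormal spacing.
    exp = max(math.frexp(mag)[1] - 1, E4M3_EMIN)
    step = 2.0 ** (exp - MANTISSA_BITS)
    quantized = min(round(mag / step) * step, E4M3_MAX)
    return math.copysign(quantized, x)


def quantize_block(values):
    """Return (shared_exponent, fp8_elements) for one block of up to 32 floats."""
    assert len(values) <= BLOCK_SIZE
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        return 0, [0.0] * len(values)
    # Shared power-of-two scale: align the block maximum with E4M3's top binade.
    shared_exp = (math.frexp(amax)[1] - 1) - E4M3_EMAX
    shared_exp = max(-127, min(127, shared_exp))  # range of an 8-bit exponent scale
    scale = 2.0 ** shared_exp
    return shared_exp, [round_to_e4m3(v / scale) for v in values]


def dequantize_block(shared_exp, elements):
    """Multiply the stored elements back by the shared scale."""
    scale = 2.0 ** shared_exp
    return [e * scale for e in elements]


# Round-trip one block and measure the worst absolute error.
block = [0.01 * i for i in range(-16, 16)]
shared_exp, fp8_elements = quantize_block(block)
restored = dequantize_block(shared_exp, fp8_elements)
max_err = max(abs(a - b) for a, b in zip(block, restored))
print(shared_exp, max_err)
```

Because the scale is chosen per block rather than per tensor, an outlier only costs precision for the other 31 values in its own block.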
Comments (4)
What's the difference between BF16 (as Fp8 E4M3FN) and your last FP8?
Is the first one all tensors converted to the E4M3FN datatype? But what about the last?
The MXFP8 does look a tiny bit nicer, but it would be interesting to know the details of what exactly this is for converting other checkpoints. Thanks bro (If you could link the first and third I have a script to check what datatype the various tensors are).
You can check the technical details here: https://huggingface.co/InsecureErasure/Z-Image-Turbo-MXFP8. It's just a quantization based on my experience and analysis from doing other, more aggressive quantizations such as NVFP4. Research papers on DiT advise protecting some blocks/layers. I also made an analysis tool to decide which layers needed protection and which could sustain a more aggressive approach when quantizing. Some layers are kept in BF16 for this reason.
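For anyone who wants to see which tensors ended up in which datatype, here is a minimal sketch that reads only the header of a .safetensors file, so no weights are loaded. It relies on the safetensors layout (an 8-byte little-endian header length followed by a JSON header). The function names and the file path in the comment are hypothetical, and this is not the quant_probe tool.

```python
import json
import struct
from collections import Counter


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without loading any weights."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian uint64
        return json.loads(f.read(header_len))


def dtype_summary(path):
    """Count tensors per dtype string found in the header (e.g. 'BF16', 'F8_E4M3')."""
    header = read_safetensors_header(path)
    return Counter(
        info["dtype"] for name, info in header.items() if name != "__metadata__"
    )


def tensors_with_dtype(path, dtype):
    """List the names of tensors stored with the given dtype string."""
    header = read_safetensors_header(path)
    return sorted(
        name
        for name, info in header.items()
        if name != "__metadata__" and info["dtype"] == dtype
    )


# Example (hypothetical path):
# print(dtype_summary("model.safetensors"))
# print(tensors_with_dtype("model.safetensors", "BF16"))
```

Listing the BF16 entries of a mixed checkpoint is a quick way to see which layers were left unquantized.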
@InsecureErasure Holy cow, so you probably know this stuff better than anyone does. Did you think about creating a tool to convert Zit checkpoints automatically? That would be pretty cool. I don't think most people really use the base model if we're on CivitAi.
@ferrrett33, that tool already exists: it's convert_to_quant by silveroxides, AKA S1LV3RC01N here on CivitAI. I'd say he's a respected member of this community. I just happened to vibe-code a tool, which I called quant_probe, to help me choose which layers needed more protection when using his tool.
And yes, it's possible to use convert_to_quant to quantize other Z-Image Turbo checkpoints. You can try the same settings I linked in my repo on HuggingFace and add the --simple flag to get a taste of what's possible with a quick quantization. Omitting this flag results in a much longer quantization process. It depends on the hardware you have and how much time you want to devote to the quantization itself. Bear in mind that these MXFP8 quantizations only benefit from Blackwell-architecture GPUs.