Z-Image Turbo Lightning
Trained using FP32 Caption Features and FP32 Latents
30 Layer Training (Full Transformer)
AdamW Training (Not Fused or 8bit)
Trained on the full FP32 diffusers Z-Image Turbo (Lora for COMFY UI)
Tested (Ideal) Normal/Euler 4-8 Steps at Lora Model Strength 0.5-1.0
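To illustrate what "Lora Model Strength 0.5-1.0" does numerically, here is a minimal sketch of the standard LoRA merge rule, W' = W + strength * (alpha / r) * (B @ A). This is generic LoRA convention, not the actual Z-Image or ComfyUI code; the names `lora_A`, `lora_B`, `r`, and `alpha` are assumptions.

```python
import numpy as np

# Illustrative sketch only (not Z-Image internals): a LoRA at model
# strength s merges into a base weight as W' = W + s * (alpha / r) * (B @ A).
rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 8, 4, 4

W = rng.standard_normal((d_out, d_in)).astype(np.float32)
lora_A = rng.standard_normal((r, d_in)).astype(np.float32)   # hypothetical LoRA down-projection
lora_B = rng.standard_normal((d_out, r)).astype(np.float32)  # hypothetical LoRA up-projection

def merged(W, A, B, strength, alpha, r):
    """Base weight plus the strength-scaled low-rank update."""
    return W + strength * (alpha / r) * (B @ A)

W_half = merged(W, lora_A, lora_B, 0.5, alpha, r)
W_full = merged(W, lora_A, lora_B, 1.0, alpha, r)

# At strength 0.5 the applied update is exactly half of the full update.
assert np.allclose(W_half - W, 0.5 * (W_full - W))
```

Lowering the strength from 1.0 toward 0.5 simply scales the low-rank delta down linearly, which is why reducing the weight at 8 steps softens the LoRA's effect without removing it.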
This is the culmination of a month's work and several days of training, tested across many prompts and text combinations.
In some cases Turbo will affect text accuracy.
Rarely (<1%), the LoRA will drastically alter the image from seed to seed.
Comments (15)
Incredible work!
Thank You
Sorry, but I'm confused about what this does. Isn't Turbo already turbo? Are you turbo'ing it further?
Yes, and it reduces the blurring at 4 steps and has shown improved detail at 8 steps, although you may need to reduce the LoRA weight at 8 steps.
@wyxzddsjj919 Visual quality of images is subjective to a point; I have not done a VGS or RINO calculation on 4-step Turbo + LoRA vs. 9-step Turbo without the LoRA.
Regarding the worse-quality images: yes, to a point you cannot get something from nothing. However, the model already uses a very deterministic method of image generation, one closer to retrieval than to generation. Meaning that if the LLM hidden input stays the same, the difference per 100 seeds, or even a thousand seeds, is minimal.
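A toy sketch of the claim above: if the conditioning (the "LLM hidden input") is held fixed and only a small seed-dependent noise term varies, outputs stay nearly identical across seeds. The fake generator below is purely illustrative; `cond` and `noise_scale` are assumptions standing in for fixed text features and residual sampling noise.

```python
import numpy as np

# Toy model of near-deterministic few-step sampling: a fixed conditioning
# vector plus a small seed-dependent noise term.
cond = np.linspace(0.0, 1.0, 64)  # stands in for fixed LLM hidden states

def generate(seed, noise_scale=0.01):
    rng = np.random.default_rng(seed)
    return cond + noise_scale * rng.standard_normal(cond.shape)

outs = np.stack([generate(s) for s in range(100)])
spread = outs.std(axis=0).mean()  # mean per-element std across 100 seeds

# Seed-to-seed variation stays tiny when the conditioning dominates.
assert spread < 0.02
```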
Works like a charm, thank you very much) And is Euler/Normal the only combination you have tested?
Simple/Normal Euler is what I ran a few hundred image-prompt combos on. I've been told it works with others.
Could you make a version for Flux Klein 4b?
I doubt it, but I might look at it. With Z-Image they trained/distilled Turbo in FP32, and those weights enabled TF32 training. Klein is 4-step already and does not have any higher-quality weights to my knowledge. Training below 4 steps would likely not be possible even with FP32 available.
Thank you for sharing. I was able to generate the image in just four steps (the original ZIT model requires at least five steps using a high-speed scheduler like UniPC, which is prone to incomplete images). I noticed that using this LoRA at eight steps without reducing the weight gives an "about to burn out" feel, like the result of choosing the wrong sampler/scheduler. Also, according to you, the four-step method, which converges faster to reduce noise and generate the image, results in the loss of large sections of natural-language prompts (short AI-generated text snippets), meaning the LoRA is best suited for drawing images of "single subjects"; is that correct?
The model was trained unsupervised on full caption features rather than the (32, 2560) features the model expects. This can have drastic effects on the prediction; I found it to be less than 1 in 100.
In most cases it can handle multiple languages and even text generation. I have not tested enough seeds, seed to seed, to calculate the increase in text errors, nor can I read Chinese to know whether it is handling those prompts correctly.
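To put a number on a "<1%" error-rate claim from seed testing, one option is the Wilson score interval: it gives an upper bound on the true failure rate even when zero failures are observed. A minimal stdlib sketch (the seed counts 300 and 400 are illustrative, not from the author's tests):

```python
import math

def wilson_upper(failures, n, z=1.96):
    """Upper bound of the 95% Wilson score interval for a binomial rate."""
    if n == 0:
        return 1.0
    p = failures / n
    denom = 1 + z * z / n
    center = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (center + margin) / denom

# With 0 text errors seen in 300 seeds, the upper bound is still ~1.26%;
# roughly 400 clean seeds are needed before the bound drops below 1%.
print(round(wilson_upper(0, 300), 4))  # → 0.0126
print(round(wilson_upper(0, 400), 4))  # → 0.0095
```

This is why "less than 1 in 100" is hard to verify casually: a few hundred seed-to-seed comparisons are needed before the statistics support the claim.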
This LoRA is soooo good. Lightweight but powerful!
Thank you