[Note: Unzip the download to get the GGUF. Civitai doesn't support the GGUF format natively, hence the zip workaround.]
Flux.1 Dev merged into Flux.1 Schnell. It can generate good-quality images (better than Schnell) in just 4 steps, and the quality improves further with more steps, while consuming very little VRAM. Q4_0 can produce 1024x1024 images in 45 seconds on my 11GB 1080 Ti while using around 6.5 GB of VRAM.
It can be used in ComfyUI with the ComfyUI-GGUF custom node or with Forge UI. See https://github.com/lllyasviel/stable-diffusion-webui-forge/discussions/1050 to learn more about Forge UI's GGUF support and where to download the VAE, clip_l, and t5xxl models.
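If you'd rather script it than use a UI, the GGUF can also be loaded with diffusers (v0.32+, with the gguf package installed). A minimal sketch, assuming a placeholder local filename and the Schnell base repo for the text encoders/VAE:

```python
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, GGUFQuantizationConfig

# Load the quantized transformer from the downloaded GGUF
# ("flux1-devschnell-Q4_0.gguf" is a placeholder filename).
transformer = FluxTransformer2DModel.from_single_file(
    "flux1-devschnell-Q4_0.gguf",
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)

# Pull clip_l, t5xxl, and the VAE from the Schnell base repo
# (assumption: any Flux-compatible text encoder/VAE combo works here).
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # keeps VRAM usage low on smaller cards

# 4 steps and no guidance, matching the Schnell-style usage described above.
image = pipe("a lighthouse at dusk", num_inference_steps=4,
             guidance_scale=0.0).images[0]
image.save("out.png")
```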
Which model should I download?
[Current situation: with the updated Forge UI and ComfyUI (GGUF node), I can run Q8_0 on my 11GB 1080 Ti.]
Download the one that fits in your VRAM. The additional inference cost is quite small if the model fits in the GPU. Size order is Q4_0 < Q4_1 < Q5_0 < Q5_1 < Q8_0.
Q4_0 and Q4_1 should fit in 8 GB VRAM
Q5_0 and Q5_1 should fit in 11 GB VRAM
Q8_0 if you have more!
Note: With CPU offloading, you will be able to run a model even if it doesn't fit in your VRAM.
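In diffusers terms (a sketch, reusing the pipe from the snippet above), offloading looks like this:

```python
# Aggressive offloading: modules are streamed to the GPU one at a time,
# so even a model that doesn't fit in VRAM can run, just slowly.
pipe.enable_sequential_cpu_offload()

# Milder alternative: whole components (transformer, text encoders, VAE)
# are swapped between CPU and GPU as needed.
# pipe.enable_model_cpu_offload()
```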
LoRA usage tips
The model seems to work pretty well with LoRAs (tested in Comfy), but you might need to increase the number of steps a little (8-10).
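In diffusers, that would look something like this (a sketch; the LoRA path is a placeholder, and pipe is the pipeline from the snippet above):

```python
# Apply a Flux LoRA on top of the pipeline and bump the step count,
# since LoRAs seem to need a few extra steps with this merge.
pipe.load_lora_weights("path/to/your-flux-lora.safetensors")
image = pipe("your prompt here", num_inference_steps=10,
             guidance_scale=0.0).images[0]
```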
Updates
V2: I created the original (v1) from an fp8 checkpoint, and the double quantization (fp8, then GGUF) accumulated enough error that v1 couldn't produce sharp images. For v2 I manually merged the bf16 Dev and Schnell checkpoints and then made the GGUF. This version can produce more detail and much crisper results.
All the license terms associated with Flux.1 Dev and Flux.1 Schnell apply.
PS: Credit goes to jice and comfy.org for the merge recipe. I used a slightly modified version of https://github.com/city96/ComfyUI-GGUF/blob/main/tools/convert.py to create the GGUFs.
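For the curious, a checkpoint merge of this shape can be sketched as below. This is only an illustration: the actual recipe credited above may weight layers differently, and ALPHA is a hypothetical uniform blend ratio, not the one used for this model.

```python
from safetensors.torch import load_file, save_file

ALPHA = 0.5  # hypothetical blend ratio, NOT the recipe used for this model

dev = load_file("flux1-dev.safetensors")          # bf16 Dev checkpoint
schnell = load_file("flux1-schnell.safetensors")  # bf16 Schnell checkpoint

# Average matching tensors; keys present in only one checkpoint
# (e.g. Dev's guidance embedder) are carried over from Dev unchanged.
merged = {
    k: (1 - ALPHA) * dev[k] + ALPHA * schnell[k] if k in schnell else dev[k]
    for k in dev
}
save_file(merged, "flux1-dev-schnell-merge.safetensors")
```

The merged safetensors can then be converted to GGUF with the convert.py script linked above.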
Description
Almost identical to full bf16, but also heavy on VRAM.
FAQ
Comments
V2 is faster on my computer than V1. Thank you so much for making this!
This has been my experience as well; for some reason there's less downtime between loading the model and the CLIP/VAE.
It gives me a black image.
This works very well for me, however if I add a LoRA it doesn't seem to do anything.
You might need to increase the number of steps a bit
Did the LoRAs work with other GGUF models? If not, update the Unet Loader.
@nakif0968 Increasing the strength of the LoRA to > 2 seems to work. Not all LoRAs have the desired effect, but the ones that do work incredibly well.
@xhorxhi Yes, try updating the Comfy + GGUF node too like @JayNL mentioned
Civitai shows the Q8 model as 11.82 GB, but downloaded it's 12.7 GB. I wonder why, and whether I can use it with my 12GB 3060. Edit: it works. Detail isn't great at 4 steps; need to look into this more.
If you're not satisfied with the detail, try cranking up the steps a bit. It should keep improving up to about 12 steps.
Another tip: if you refine (Comfy) / img2img (Forge) using the Dev model with a denoise of ~0.4 and 20 steps (20 x 0.4 = 8 actual steps), you can gain back a lot of detail and expressivity. Use this model for quick iteration, and the Dev model to finalize the good ones.
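In diffusers, that refine pass could look roughly like this (a sketch; the repo ID and draft image path are assumptions, and the gated Dev weights require access):

```python
import torch
from diffusers import FluxImg2ImgPipeline
from PIL import Image

# Refine a draft from the fast merge using the full Dev model.
refiner = FluxImg2ImgPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
refiner.enable_model_cpu_offload()

draft = Image.open("draft.png")  # output of the 4-step merge

# strength 0.4 over 20 scheduled steps -> 20 x 0.4 = 8 denoising steps
refined = refiner(prompt="your prompt", image=draft,
                  strength=0.4, num_inference_steps=20,
                  guidance_scale=3.5).images[0]
refined.save("refined.png")
```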
Black image every time with Q4_1 v2. Any solution?
Which UI are you using?
@nakif0968 ComfyUI, and I'm using the CPU. What details can I provide to help figure out why this happens?
@amazingbeauty Seems like a Comfy problem. Raise an issue at the ComfyUI GitHub repo about CPU inference with GGUF.
@nakif0968 Who's going to listen to a CPU user when most people have GPUs worth hundreds or thousands of dollars?!
I'll just forget about it.
@amazingbeauty Use a different scheduler. If you were using Euler + Simple and got the black image, try SGM Uniform. If nothing changes, use the DPM++ 2M sampler + Simple. If neither works, use a different text encoder/VAE combo.
@ishadowxx Where can I find a better text encoder/VAE?
Please make a Q4_K_M GGUF instead of the "old" Q4_0, Q4_1, or Q5 quants.
Q4_K_M has the quality of the "old" Q5_1.
Which quant should I (we...) select for this kind of quantization in SD Forge? I'm only familiar with the regular Stable Diffusion quants (FP/BF/NF) and don't really understand how those translate to LLM quants. SD Forge doesn't have a selection of Q quants either... so which should we select, FP16? FP8?
Just use the "Automatic" setting; it will detect it by itself. As for which to download, I'd recommend starting with the biggest model, Q8_0, given you have >8 GB of VRAM.
@nakif0968 I'll give it a try for sure! Currently doing a LoRA grid right now, so running 4_1, about 27 seconds per inference, for noticeably higher-quality images than finetuned SDXL... and Flux is still a baby! I can't even imagine what this model will be capable of after a few generations of finetunes and merges...
Anyways, thanks for sharing, keep up the good work!
SD Forge says "Distilled CFG Scale will be ignored for Schnell". Does this mean I should adjust the regular CFG as needed?
No, unfortunately, since this is a Schnell model, you don't have CFG/guidance control. So basically you're stuck with the prompt sensitivity baked into the model; you cannot turn it up or down.
@nakif0968 So nothing I can do to reduce the general over-baked quality compared to dev? So far every image I've made with Schnell has resulted in what I would normally consider to be too high CFG in SDXL etc.
@MysticDaedra Unfortunately, no. It has to be done via LoRA or prompt tricks. You may also try different samplers, but no CFG for now.
@MysticDaedra You might want to try HyperFlux, which has CFG control https://civitai.com/models/705444/gguf-hyperflux-8-steps-flux1-dev-bytedance-hypersd-lora
I don't think combining Schnell with Dev fixes the licensing; it's still non-commercial.
Yes
Asking for Q6_K. It's the highest one that runs without swapping on my 12GB setup on Linux.
You mean swapping the text encoder? Because Q8 should fit in 12 GB.
BTW... I don't know how to make K/S quants, because gguf-py doesn't support them yet.
Q8 works fine on a 4070 12GB and doesn't even slow down Windows; only F16 makes my PC unusable for anything else.
I'm on a 3060 12GB: Q4_1, 4 steps, 1152x896.
Just 32 seconds for one photo. That's impressive for me.
AssertionError: You do not have T5 state dict!
What does it mean?
You need to download the T5, CLIP-L, and VAE models. CLIP and T5: https://huggingface.co/comfyanonymous/flux_text_encoders/tree/main ; VAE: https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/ae.safetensors
If you're using Forge, see https://github.com/lllyasviel/stable-diffusion-webui-forge/discussions/1050 regarding where to put these models and how to use them.
@nakif0968 Thanks for your reply, but now I get this new error message, only with this model:
AssertionError: You do not have VAE state dict!
https://civitai.com/models/618692/flux?modelVersionId=691639
I'm confused about there being a Q4_0 and a Q4_1. What's the difference?
Size vs. quality. Bigger size -> better quality, but also more VRAM required.
I'm not familiar with GGUF. Does it only integrate as a node in a ComfyUI workflow, or would it be possible to use it as a model in A1111?
What's the benefit of merging these two?
Can you compare this with the Flux.1 Dev GGUF Q8 in VRAM/speed/quality?
Same question.
Better prompt understanding than Schnell while practically losing nothing in speed (still 4-step).
Is there a workflow I can find for this one?
The demo images were made in Comfy, so you can just drag and drop one to get the workflow.
Yes, the model is fast, but still significantly worse than full Flux.dev
What GPU did you use?
@SyamsQ 12GB
@Yellowboot RTX 4070Ti?
@SyamsQ 3060
@Yellowboot How fast does your PC generate a single 1024x1024 image?
@SyamsQ quality is more important to me than speed
Is there a ComfyUI workflow you would recommend that can utilise this Q8 v2? My current ones can't find it.
Make sure you have the extension https://github.com/city96/ComfyUI-GGUF . See the images of Q4_0 for a workflow and swap out the model for Q8_0.
Unet or checkpoint folder?
The unet folder. (BTW, make sure to unzip the file first.)
After downloading it twice, it says the archive is corrupt.
I can assure you there are no problems with the upload; either it's something on your end or the Civitai servers are suddenly acting up.
@pretty_pixels OK, thanks for responding. I'll give the download another try. Great work, really.
Can the images created by this neural network be sold? I really don't understand the point of all these licenses...
This is a truly amazing achievement - this merge.
It maintains much better anatomy and composition, with outputs much closer to Dev than to Schnell.
My sincere congratulations, and thank you very much!
Thank you, Black Forest Labs! Great tools, and I made this... "Neon Fugue": a nostalgic trip to the hardboiled neo-noir police movies of the late 1970s, with funky music and no-nonsense crime busting.