NOTE: Ignore the model format listed! This is not an NF4 ONNX model, it is a Q5_K_M GGUF model.
This is a GGUF of flux_dev quantized in Q5_K_M GGUF format. It should provide a significant quality boost over 4-bit quantizations while being a lot smaller than the 8-bit version (and since it's a relatively small GGUF, load times should be significantly improved over FP8 as well). This model is ideal for mid-sized graphics cards; in my tests (without any memory optimizations such as offloading T5 onto the CPU) it fits comfortably in 16GB of VRAM, and it may work with as little as 8GB (if you have under 16GB of VRAM, please test it and leave a comment about whether it works for you).
UPDATE: Per this comment, this quant will work on systems with 8GB of VRAM. (Thanks to @VolatileSupernova for testing and responding!)
Apart from being quantized, this is an unmodified version of Flux Dev that has not been finetuned in any way. It should get along just fine with any LoRAs that will work with the full size or FP8 versions of the model.
Comments (16)
Thank you! I've been waiting for a reliable source to provide a Q5_K_M quant. City's Q5_K_S was lacking something, and a few of the other Q5_K_M quants I've seen on Civitai made me question the legitimacy of the suppliers. More people should be thanking you for this.
This appears to download as a .zip file, which is a bit alarming. Is there a way to download the flat .gguf?
Edit: This appears to be a CivitAI limitation.
No, it doesn't let me upload a gguf on its own. Also, a zip is just a compressed archive. Unzipping it isn't going to execute any code on your machine, and inside it, you'll find just the gguf.
That being said, hopefully they'll update civit to allow direct gguf uploads soon. Once they do, I'll replace it, so if you don't want to download the zip file, just check this page regularly.
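If you want to sanity-check the extracted file yourself, the GGUF container has a fixed little-endian header: the four ASCII bytes "GGUF", a uint32 version, a uint64 tensor count, and a uint64 metadata key/value count. A minimal sketch (the filename in the example is hypothetical; use whatever the zip actually contains):

```python
import struct

def read_gguf_header(path):
    """Read the fixed-size GGUF header: 4-byte magic, uint32 version,
    uint64 tensor count, uint64 metadata key/value count (little-endian)."""
    with open(path, "rb") as f:
        magic, version, n_tensors, n_kv = struct.unpack("<4sIQQ", f.read(24))
    if magic != b"GGUF":
        raise ValueError(f"not a GGUF file (magic was {magic!r})")
    return {"version": version, "tensors": n_tensors, "metadata_kv": n_kv}

# Example (hypothetical filename after unzipping):
# read_gguf_header("flux1-dev-Q5_K_M.gguf")
```

If the magic check passes, the download really is a plain GGUF file, whatever the site's format label says.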
@_Envy_ Ah ok, I wasn't aware it was a CivitAI limitation. Thanks for the TIL.
@Greysion It's no problem. Better to be safe than sorry.
At any rate, given the popularity of the format, I wouldn't be surprised if they were working on supporting it directly.
Hello Envy, I hate to be a PITA. I've been trying to puzzle out what I'm doing wrong. I keep getting "RuntimeError: mat1 and mat2 shapes cannot be multiplied (4096x64 and 256x768)" I'm using webui Forge and have tried various combinations of encoders and vae which give the same error.
I ran into almost that same error yesterday when I was using the pro controlnet on ComfyUI. My fix was to make sure I had a controlnet mode selected. Is it possible that's your issue? I've never used Forge.
Just got home. I will poke around Forge and see what I can find. Thank you!
@skibidiskoobidi Note: Today I got a similar error because I was using the wrong CLIP version. So the general answer is that it may have something to do with a mismatched model somewhere.
@_Envy_ OK, I will download them again. I should use the official CLIPs, correct? Thank you.
Same thing on Forge:
RuntimeError: mat1 and mat2 shapes cannot be multiplied (4032x64 and 256x768)
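For what it's worth, that error is a plain matrix-multiplication shape mismatch: mat1 has 64 columns but mat2 has 256 rows, and those inner dimensions must be equal. That is consistent with the mismatched-model diagnosis above, since it means one component (often a text encoder) is producing embeddings of a different size than the next layer expects. A minimal reproduction of the same class of error, using NumPy with the shapes from the report:

```python
import numpy as np

# Shapes taken from the reported error message
mat1 = np.zeros((4096, 64))
mat2 = np.zeros((256, 768))

try:
    mat1 @ mat2  # inner dimensions 64 != 256, so this cannot multiply
except ValueError as e:
    print("shape mismatch:", e)
```

Swapping in a correctly matched encoder is the equivalent of making the inner dimensions line up.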
Tested and working in ComfyUI on my RTX 3050 with 8GB VRAM, using ViT-L-14-TEXT-detail-improved-hiT-GmP-TE-only-HF for CLIP-L and t5-v1_1-xxl-encoder-Q4_K_M for T5. I usually use the Q4_K_S model, which gives me 6.4 seconds per iteration at 896x1152 resolution; this model with the same settings (only the model changed) gives me 7.5 seconds, not a big change at all! It does mean that, unfortunately, I can't use any LoRAs with your K_M model, since it just barely fits in my VRAM, but I'd rather have the higher quality than use LoRAs!
EDIT: I can actually use the sub-20MB LoRAs without issue!
How do you use LoRAs with Flux in ComfyUI? I have been trying, but they don't do anything for me. The same LoRAs work in ForgeUI. I would like to use only ComfyUI for Flux; can you help? Why did you say you cannot use LoRAs?
@ReyArtAge I use rgthree's Power Lora Loader custom node, and they work as long as you have enough VRAM to fit the LoRA in with the model. Larger LoRAs just don't work at all, at least for me, because the model leaves only a little extra VRAM open to fit LoRAs in.
How do you use LoRAs with GGUF versions of Flux? I tried different LoRA loaders and they do nothing, even with the trigger word added in the CLIP-L text prompt. The same LoRAs work as intended in ForgeUI.
Why are there 2 files to download with slightly different sizes? I am a little confused, considering the note "NOTE: Ignore the model format listed! This is not an NF4 ONNX model, it is a Q5_K_M GGUF model". Are both files for Flux but in 2 different sizes? Which one should I download for Q5_K_M Flux?