CivArchive
    Flux Dev Q5_K_M GGUF quantization (a nice balance of speed and quality in under 9 gigabytes) - v1.0
    NSFW

    NOTE: Ignore the model format listed! This is not an NF4 ONNX model, it is a Q5_K_M GGUF model.

    This is a GGUF of flux_dev quantized to Q5_K_M, which should provide a significant quality boost over 4-bit quantizations while being a lot smaller than the 8-bit version (and since it's a relatively small GGUF, load times should be significantly better than FP8 as well). This model is ideal for mid-sized graphics cards: in my tests (without any memory optimizations such as offloading the T5 encoder onto the CPU) it fits comfortably in 16GB of VRAM, and it may work with as little as 8GB (if you have under 16GB of VRAM, please test it and leave a comment about whether it works for you).
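
    If you want to double-check that the file you downloaded really is a Q5_K_M GGUF (and not the NF4 ONNX format the listing claims), here's a minimal sketch using the gguf Python package from the llama.cpp project; the filename is simply this page's download, so adjust the path to wherever you extracted it:

        # pip install gguf
        from collections import Counter

        from gguf import GGUFReader

        # Assumes the .gguf extracted from this page's zip sits in the working directory.
        reader = GGUFReader("fluxDevQ5KMGGUFQuantizationA_v10.gguf")

        # Tally the quantization type of every tensor. A Q5_K_M file should be
        # mostly Q5_K, with a handful of tensors kept at higher precision.
        print(Counter(t.tensor_type.name for t in reader.tensors))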

    UPDATE: Per this comment, this quant will work on systems with 8GB of VRAM (thanks to @VolatileSupernova for testing and responding!).

    Apart from being quantized, this is an unmodified version of Flux Dev that has not been finetuned in any way. It should get along just fine with any LoRAs that will work with the full size or FP8 versions of the model.

    Comments (16)

    cosmogonaut · Sep 3, 2024
    CivitAI

    Thank you! I've been waiting for a reliable source to provide a Q5_K_M quant. City's Q5_K_S was lacking something, and a few of the other Q5_K_Ms I've seen on Civit made me question the legitimacy of the suppliers. More people should be thanking you for this.

    Greysion · Sep 3, 2024
    CivitAI

    Edit: This appears to be a CivitAI limitation.

    This appears to download as a .zip file, which is a bit alarming. Is there a way to download the flat .gguf?

    _Envy_
    Author
    Sep 3, 2024 · 2 reactions

    No, it doesn't let me upload a gguf on its own. Also, a zip is just a compressed archive. Unzipping it isn't going to execute any code on your machine, and inside it, you'll find just the gguf.

    That being said, hopefully they'll update civit to allow direct gguf uploads soon. Once they do, I'll replace it, so if you don't want to download the zip file, just check this page regularly.
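
    In the meantime, if you'd rather script the unpacking than click through it, a minimal Python sketch (the destination path assumes a stock ComfyUI layout, where ComfyUI-GGUF looks for UNet GGUFs under models/unet; adjust for your setup):

        import zipfile

        # Filename as listed under Files on this page.
        with zipfile.ZipFile("fluxDevQ5KMGGUFQuantizationA_v10.zip") as zf:
            print(zf.namelist())  # should list just the single .gguf file
            zf.extractall("ComfyUI/models/unet")  # put it where your GGUF loader looks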

    Greysion · Sep 5, 2024

    @_Envy_ Ah ok, I wasn't aware it was a CivitAI limitation. Thanks for the TIL.

    _Envy_
    Author
    Sep 5, 2024

    @Greysion It's no problem. Better to be safe than sorry.

    At any rate, given the popularity of the format, I wouldn't be surprised if they were working on supporting it directly.

    skibidiskoobidi · Sep 3, 2024 · 1 reaction
    CivitAI

    Hello Envy, I hate to be a PITA, but I've been trying to puzzle out what I'm doing wrong. I keep getting "RuntimeError: mat1 and mat2 shapes cannot be multiplied (4096x64 and 256x768)". I'm using webui Forge and have tried various combinations of encoders and VAEs, which all give the same error.

    _Envy_
    Author
    Sep 3, 2024 · 1 reaction

    I ran into almost that same error yesterday when I was using the Pro ControlNet in ComfyUI. My fix was to make sure I had a ControlNet mode selected. Is it possible that's your issue? I've never used Forge.

    skibidiskoobidi · Sep 4, 2024

    Just got home. I will poke around Forge and see what I can find. Thank you!

    _Envy_
    Author
    Sep 4, 2024

    @skibidiskoobidi Note: Today I got a similar error because I was using the wrong CLIP version. So the general answer is that it may have something to do with a mismatched model somewhere.
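
    For anyone hitting this, the error itself is just PyTorch refusing a matrix multiply whose inner dimensions don't line up, which is what happens when one model's activations get fed into a layer sized for a different model. A minimal reproduction of the exact message above:

        import torch

        a = torch.randn(4096, 64)   # activations shaped for one encoder
        b = torch.randn(256, 768)   # a weight sized for a different encoder
        a @ b  # RuntimeError: mat1 and mat2 shapes cannot be multiplied (4096x64 and 256x768)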

    skibidiskoobidi · Sep 4, 2024

    @_Envy_ OK, I will download them again. I should use the official CLIPs, correct? Thank you!

    kiryanton930 · Nov 14, 2024

    Same thing on Forge:

    RuntimeError: mat1 and mat2 shapes cannot be multiplied (4032x64 and 256x768)

    VolatileSupernova · Sep 4, 2024
    CivitAI

    Tested and working in ComfyUI on my RTX 3050 with 8GB VRAM, using ViT-L-14-TEXT-detail-improved-hiT-GmP-TE-only-HF for CLIP-L and t5-v1_1-xxl-encoder-Q4_K_M for T5. I usually use the Q4_K_S model, which gives me 6.4 seconds per iteration at 896x1152 resolution; this model with the same settings, and only the model changed, gives me 7.5 seconds, not a big change at all! It does mean that unfortunately I can't use any LoRAs with your K_M model, since it just barely fits in my VRAM, but I'd rather have the higher quality than use LoRAs!

    EDIT: I can actually use LoRAs under 20MB without issue!

    ReyArtAge · Sep 9, 2024

    How do you use LoRAs with Flux in ComfyUI? I have been trying, but they don't do anything for me. The same LoRAs work in ForgeUI. I would like to use only ComfyUI for Flux, can you help? Why did you say you cannot use LoRAs?

    @ReyArtAge I use RGThree's Power Lora Loader custom node, and LoRAs work as long as you have enough VRAM to fit the LoRA in alongside the model. Larger LoRAs just don't work at all, at least for me, because the model only leaves a little extra VRAM open to fit LoRAs in.
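
    If you're unsure how much headroom you have for a LoRA, a rough sketch for checking free VRAM once the model is loaded (assumes CUDA and PyTorch, which ComfyUI already uses):

        import torch

        # Free vs. total memory on the current CUDA device, in bytes.
        free, total = torch.cuda.mem_get_info()
        print(f"{free / 2**30:.2f} GiB free of {total / 2**30:.2f} GiB")
        # A LoRA needs roughly its file size in VRAM plus a little overhead,
        # so a <20MB LoRA fits in far less headroom than a ~300MB one.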

    ReyArtAge · Sep 9, 2024
    CivitAI

    How do you use LoRAs with GGUF versions of Flux? I tried different LoRA loaders and they do nothing, even when adding the trigger word in the CLIP-L text prompt. The same LoRAs work in ForgeUI as intended.

    Geralt28 · Nov 3, 2024 · 1 reaction
    CivitAI

    Why are there 2 files to download with slightly different sizes? I am a little confused considering the comment "NOTE: Ignore the model format listed! This is not an NF4 ONNX model, it is a Q5_K_M GGUF model" = both files are for Flux but with 2 different sizes? Which one should I download for Q5_K_M for Flux?

    Checkpoint
    Flux.1 D

    Details

    Downloads
    3,205
    Platform
    CivitAI
    Platform Status
    Available
    Created
    9/1/2024
    Updated
    5/13/2026
    Deleted
    -

    Files

    fluxDevQ5KMGGUFQuantizationA_v10.zip

    Mirrors

    fluxDevQ5KMGGUFQuantizationA_v10.gguf

    Available On (1 platform)

    Same model published on other platforms. May have additional downloads or version variants.