CivArchive
    Flux.1-Dev GGUF Q2.K Q3.KS Q4/Q4.1/Q4.KS Q5/Q5.1/Q5.KS Q6.K Q8 - Q2.K

    Source https://huggingface.co/city96/FLUX.1-dev-gguf/tree/main by city96

    This is a direct GGUF conversion of Flux.1-dev. As this is a quantized model, not a finetune, all the same restrictions and original license terms still apply. A basic overview of the quantization types is available on the source page.

    The model files can be used with the ComfyUI-GGUF custom node.
    Place model files in ComfyUI/models/unet - see the GitHub readme for further install instructions.
    Also working with Forge since the latest commit!
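
    If you want to verify which quantization a downloaded file actually uses, here is a minimal sketch using the gguf Python package from the llama.cpp project (the file path is just an example):

        # Minimal sketch: list the quantization types inside a .gguf file.
        # Assumes `pip install gguf` and an example path under ComfyUI/models/unet.
        from collections import Counter

        from gguf import GGUFReader

        reader = GGUFReader("ComfyUI/models/unet/flux1-dev-Q4_K_S.gguf")

        # Each tensor records its own type; mixed files are normal, since some
        # layers are kept at higher precision to preserve quality.
        counts = Counter(t.tensor_type.name for t in reader.tensors)
        for type_name, n in counts.most_common():
            print(f"{type_name}: {n} tensors")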

    ☕ Buy me a coffee: https://ko-fi.com/ralfingerai
    🍺 Join my discord: https://discord.com/invite/pAz4Bt3rqb

    Description

    FAQ

    Comments (90)

    RoscosmosAug 19, 2024· 4 reactions
    CivitAI

    lol

    mirek190Aug 19, 2024· 4 reactions
    CivitAI

    Where is Q4_K_M?

    That version is better than the old Q5.

    User37Aug 19, 2024· 6 reactions
    CivitAI

    lmfao, you've earned a follow for lowering the size of Flux. Gotta love this, lmao.

    yepeishengAug 19, 2024· 8 reactions
    CivitAI

    Surprising: Q4_K_S at 3.5s/it on a 2070S 8GB + 16GB RAM

    low_channel_1503Aug 20, 2024

    Can you share your workflow or Forge settings?

    yepeishengAug 20, 2024

    @low_channel_1503 Some example workflows have been provided

    DonMischoAug 20, 2024· 5 reactions
    CivitAI

    This one is pretty nice and even works with LoRAs. :)

    DarknoiceAug 20, 2024

    May I ask which version you are using? I couldn't get LoRAs to work with the Q4.

    DonMischoAug 20, 2024· 2 reactions

    @Darknoice Well, the latest, as in updated yesterday :) You could check out my new model; the workflow is in the images.

    Billy_Ray_ValentineAug 20, 2024· 1 reaction
    CivitAI

    What?

    theunkoneAug 20, 2024· 8 reactions
    CivitAI

    AssertionError: You do not have CLIP state dict!
    I'm getting this error while using it in Forge UI.

    EggbenaAug 21, 2024· 1 reaction

    Put the CLIP files in the text_encoder folder within models, then load them where the VAE goes.

    JayNLAug 21, 2024
    CivitAI

    Hmm, according to the mean Δp on the LLaMA 3 8B scoreboard, Q6_K could be better than Q8. I'll give it a try today; I love testing all of these 😂

    elyzionz1Aug 21, 2024
    CivitAI

    I tried all these models and the fastest for me is still the NF4 v2.

    JayNLAug 21, 2024

    But NF4 will be deprecated because of the better quality of GGUF
    https://github.com/comfyanonymous/ComfyUI_bitsandbytes_NF4

    rtveate011105194Aug 21, 2024· 6 reactions
    CivitAI

    Currently, I've found that to maintain high fidelity and sharpness at lower quantization you must increase steps by 2-5, for a total of 22-25 steps minimum; this brings results much closer to the fidelity and sharpness of higher-quant models and FP8 far more often. Also, lower-quant models look much better with different samplers and schedulers (condensed into the sketch after this list).

    For Q6 and up: DEIS/BETA or uni_pc_h2 + sgm_uniform.

    For Q5 I found uni_pc_h2 + sgm_uniform looks exceptional; DEIS/BETA was nice but more prone to blur occasionally.

    For ~Q4, uni_pc_h2 + sgm_uniform

    On Q3, uni_pc_h2 + sgm_uniform and/or euler/simple, the former being superior.

    On Q2, euler/simple offered the highest fidelity, whereas uni_pc_h2 + sgm_uniform looked bad and deis/beta looked horrible. You probably want to increase the steps to 30 minimum for Q2. I didn't test all of them, so maybe there are better options. The blurrier the output, the more denoising is needed, and the more steps are needed. In this test I was rendering at 1920x1088.
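
    Condensed into a lookup table, as a sketch one might use when driving ComfyUI from a script. The quant keys and sampler/scheduler pairs come from the list above; the per-quant step minimums within the 22-25 band are an interpolation, and "uni_pc_h2" presumably refers to ComfyUI's "uni_pc_bh2" sampler:

        # Sketch: sampler/scheduler/min-steps per quantization level, following
        # the recommendations above. Steps inside the 22-25 band are an
        # interpolation, not measured values.
        SETTINGS_BY_QUANT = {
            "Q8": ("deis", "beta", 22),  # Q6 and up: DEIS/BETA or uni_pc_bh2
            "Q6": ("uni_pc_bh2", "sgm_uniform", 22),
            "Q5": ("uni_pc_bh2", "sgm_uniform", 23),
            "Q4": ("uni_pc_bh2", "sgm_uniform", 24),
            "Q3": ("uni_pc_bh2", "sgm_uniform", 25),
            "Q2": ("euler", "simple", 30),  # Q2 wants 30+ steps
        }

        def settings_for(quant_name: str) -> tuple[str, str, int]:
            """Map a full quant name like 'Q4_K_S' to (sampler, scheduler, min_steps)."""
            return SETTINGS_BY_QUANT[quant_name[:2].upper()]

        print(settings_for("Q4_K_S"))  # ('uni_pc_bh2', 'sgm_uniform', 24)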

    MaqueAIAug 21, 2024· 2 reactions
    CivitAI

    The image generated with the example workflow is black

    handyman1986Aug 21, 2024

    Same, trying to generate with Q4_1

    MaqueAIAug 22, 2024

    @cosmoslayer26 Q6 version? Some say that Q6 is very close to Q8

    JayNLAug 21, 2024· 8 reactions
    CivitAI

    After some testing I can say Q6_K comes extremely close to Q8, but is about 3GB smaller! 💕

    wzwowzwAug 22, 2024· 3 reactions
    CivitAI

    everyone says Q6... ok, downloading

    lothariusAug 24, 2024· 1 reaction

    If you are interested in why, you could check out this plot:
    https://raw.githubusercontent.com/matt-c1/llama-3-quant-comparison/main/plots/MMLU-Correctness-vs-Model-Size.png

    It shows the quality of responses for the different quants with the 8B and 70B Llama 3.1 models.
    Only the blue dots are really relevant for now, and they show that Q6 barely dips below the line that marks fp16 quality, just like Q8. It's essentially the same quality as Q8/fp16 but with a huge decrease in size.

    Interestingly, for the 70B even Q2 quants give better responses than fp16 of the small 8B model. It would be really interesting to see whether huge image-generation models scale the same way with an increase in parameters, although they would be unbearably slow and huge in size (Llama 3.1 8B at Q6 is 6.6GB and Llama 3.1 70B at Q2 is 26.4GB).

    gurusarrasSep 5, 2024

    @lotharius Actually, going by that chart the sweet spot is Q5_K_S: it's smaller but sits at about the same height/correctness, so I'm going to download Q5_K_S. I don't know how accurate that chart is for comparing Flux, but the Q5_K_S here is perfect at just under 8GB, unlike Q6 at 9.1GB. So although you can use Q6 with an 8GB VRAM GPU (in Forge at least), it would need to use system RAM, which makes things a lot slower. The 1.5GB difference between Q5_K_S and Q6 doesn't seem like much until you realize it puts the model perfectly under 8GB of VRAM. And also I like the number 5 more than 6, lol
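
    For intuition on where these sizes come from: GGUF file size follows almost directly from bits per weight. A rough sketch (Flux.1-dev is a ~12B-parameter model; the bits-per-weight figures are llama.cpp's nominal values, and real files land slightly off because not every tensor gets the same type):

        # Rough size estimate: parameters x bits-per-weight / 8.
        # bpw values are llama.cpp's nominal figures for each quant type.
        PARAMS = 12e9  # Flux.1-dev has roughly 12 billion parameters

        BITS_PER_WEIGHT = {
            "Q2_K": 2.5625,
            "Q4_K_S": 4.5,
            "Q5_K_S": 5.5,
            "Q6_K": 6.5625,
            "Q8_0": 8.5,
        }

        for quant, bpw in BITS_PER_WEIGHT.items():
            gb = PARAMS * bpw / 8 / 1e9
            print(f"{quant}: ~{gb:.1f} GB")

        # Q6_K comes out near 9.8 GB (about 9.2 GiB, matching the 9.1 "GB"
        # quoted above) and Q5_K_S near 8.3 GB (about 7.7 GiB, the "just
        # under 8" figure), so the quoted sizes appear to be GiB.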

    tuefmaAug 22, 2024· 3 reactions
    CivitAI

    For me, LoRAs work with Q4_K_S and Q5_K_S, but not with Q8_0. Haven't tried the rest of the models.

    MaxiimusSep 25, 2024

    Doesn't work with Q6_K

    yuhuibear131Aug 22, 2024· 2 reactions
    CivitAI

    You are really a hero! If GGUF for ControlNet is on the way, you will be a god.
    By the way, Q4 is more than good enough; latent composite is working, and it's very strong on style composition.

    anduxAug 22, 2024
    CivitAI

    Has anyone found out what the different versions (K, S, etc.) are about?

    fronyaxAug 23, 2024· 2 reactions

    @JayNL what do those numbers tell us??

    The smaller the Q number, the more degradation in quality??

    anduxAug 23, 2024

    @fronyax short version, yes

    tkpAug 23, 2024
    CivitAI

    Does the Q6_K model work with LoRAs? It just stops dead on generating when I use them together, but I can't tell if I've missed something.

    tkpAug 23, 2024· 1 reaction

    Ah, okay. I moved the model to the diffusion_models folder and that solved it for me.

    william_leungAug 25, 2024

    "Diffusion in Low Bits" is set to "Automatic (fp16 LoRA)"

    DearLuckSep 8, 2024

    It definitely does work with LoRAs; I'm using it to fit more LoRAs in VRAM.

    ALFARANKOAug 23, 2024· 4 reactions
    CivitAI

    It's taking much more time than NF4!!

    eurotakuAug 24, 2024· 1 reaction

    it's a compression technology, so it trades longer inference times for a size reduction

    JayNLAug 25, 2024· 1 reaction

    I guess that depends on your video card generation; for the 40-series it's the same or even faster.

    luchetesAug 29, 2024· 4 reactions

    With a 3060 12GB: 136s on GGUF vs. 99.95s on NF4.

    CruzFleshAug 31, 2024· 3 reactions

    Slower, but much more compatible with LoRAs and with better prompt understanding.

    JayNLAug 26, 2024
    CivitAI

    It keeps skipping back to Q2_K. I wonder how many posts in Q2_K are accidentally posted there; the quality seems too high, but I do see some with a lot(!) of steps.

    DearLuckSep 15, 2024· 1 reaction

    You are right, people post Q8 results in Q2

    925_StudioSep 1, 2024· 5 reactions
    CivitAI

    1.39s/it (Q5S, Model+Clip)

    1.37s/it (Q6K, Model+Clip)

    1.06s/it (Q8, Model+Clip)

    1.10it/s (Original, Model+Clip, fp16)

    I tested this on my 4080S with ComfyUI. Does this mean the GGUF format only optimizes the size, not the speed?

    JayNLSep 1, 2024· 2 reactions

    Yeah, it's only about fitting in your VRAM, not about speed. I can run a batch of 2 in Q8 on a 4070, but a batch of 4 in Q4 (maybe more, I didn't test).
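
    A quick conversion makes 925_Studio's numbers directly comparable (the fp16 entry was reported in it/s, the rest in s/it); the 20-step total is just an example:

        # Normalize the reported rates to seconds per iteration and project
        # a 20-step generation time. The fp16 entry was given in it/s.
        reported = {
            "Q5S":  ("s/it", 1.39),
            "Q6K":  ("s/it", 1.37),
            "Q8":   ("s/it", 1.06),
            "fp16": ("it/s", 1.10),
        }

        STEPS = 20
        for name, (unit, value) in reported.items():
            s_per_it = value if unit == "s/it" else 1.0 / value
            print(f"{name}: {s_per_it:.2f} s/it -> {s_per_it * STEPS:.0f} s per image")

        # fp16 is actually the fastest (~0.91 s/it): dequantizing GGUF weights
        # on the fly costs extra compute, so the quants trade speed for VRAM.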

    925_StudioSep 1, 2024· 1 reaction

    @JayNL Thank you! That is very clear.

    443152Sep 2, 2024
    CivitAI

    Anyone know which is better, Q6_K or FP8?

    941052Sep 3, 2024· 1 reaction

    It depends on what you mean by better; everyone has different needs. If you want the fastest and most memory-efficient version, NF4 is the way to go. If you want the best output, you need to pick the original fp16 weights. FP8 is in the middle: it's faster than compressed GGUF, but not in every scenario; it depends on whether you have 8GB or 12GB of VRAM. Q8_0 should have better output... but I think Q6_K may be the better choice because of this problem in the early quantizations: https://github.com/city96/ComfyUI-GGUF/issues/79

    443152Sep 3, 2024

    @sambaspo thanks for your answer!

    DearLuckSep 6, 2024

    @Elysia_Saikou After quite a few attempts I would prefer Q6 over the default fp8, but this is mostly aesthetics.

    443152Sep 7, 2024

    @DearLuck After experimenting for a few days, I found that Q6_K suits me better as well. However, I'm currently using Q8_0 to aim for higher accuracy. That said, FP8 seems to be the fastest among them.

    SuzanneSep 5, 2024· 1 reaction
    CivitAI

    For me: RTX 3070, 8GB VRAM, 32GB RAM, image = 896 × 1152 px

    100%|██████████| 20/20 [01:13<00:00, 3.67s/it]

    with : flux1-dev-Q8_0.gguf

    Is that good performance? I don't know... 🤔

    infernahermit846Sep 5, 2024· 1 reaction

    Speeds are the same; it doesn't matter if you use Q2 or Q8, both generate in a bit over 1 minute.

    fhaifhaiSep 6, 2024· 1 reaction

    Try using a version that fits your VRAM; Q8 is 12GB but your GPU only has 8GB, so maybe use a lower quant.

    SuzanneSep 6, 2024· 1 reaction

    @fhaifhai Q8 is great with 8GB VRAM = 1 min

    fhaifhaiSep 6, 2024· 1 reaction

    @Suzanne yeah, I actually get the same speed with RTX 4060 | 16GB and Q8:
    100%|██████████████████| 31/31 [01:47<00:00, 3.47s/it]

    706784Sep 24, 2024

    Yes, same speeds with RTX 4060, 32GB RAM and Q8; it hovers around 1:10 to 1:45 depending on whether I use a LoRA or not.

    runebloodstoneOct 18, 2024

    Yes that's fine.

    Sanya_ArchitectNov 8, 2024

    Same

    Naks92Jan 28, 2025

    @Suzanne DDR5 RAM?

    Naks92Jan 28, 2025

    I've got a 3070 8GB too but I'm only getting 120s/it

    kapec512Sep 9, 2024
    CivitAI

    Is "FLUX GGUF Q8" the best and most accurate Flux model these days or not?

    RalFinger
    Author
    Sep 9, 2024· 2 reactions

    I would say so; if you can run it, it is the best choice.

    LokiPoki1Sep 13, 2024
    CivitAI

    Does this work with Forge, or just ComfyUI?

    RalFinger
    Author
    Sep 13, 2024· 3 reactions

    Works with Forge :)

    DlzezeSep 21, 2024

    @RalFinger Just place it in the models folder? Even with a different extension?

    Haircut66Oct 26, 2024· 2 reactions

    @Dlzeze All models go in the same models folder, including GGUF.

    papavicksSep 16, 2024
    CivitAI

    Does this need a VAE and CLIP?

    MaxiimusSep 25, 2024· 2 reactions
    CivitAI

    Really good! It'd be nice if this had LoRA support though, but still awesome regardless.

    adengroup2688Sep 28, 2024
    CivitAI

    I get this error on all models: CUDA error: an illegal memory access was encountered. CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

    RalFinger
    Author
    Sep 28, 2024· 1 reaction

    Found this topic on the git repo, hope that helps!

    adengroup2688Sep 28, 2024· 1 reaction

    @RalFinger The problem was solved by reducing the VRAM frequency

    adengroup2688Sep 28, 2024· 2 reactions

    @RalFinger I was able to create an image, and the results were excellent. Thank you!

    RalFinger
    Author
    Sep 28, 2024

    @adengroup2688 glad to hear that! :)

    kutoOct 5, 2024· 17 reactions
    CivitAI

    To get the LoRAs to work for Q6_K and Q8, you need to have the t5xxl_fp16.safetensors text encoder (loaded where the VAE goes), and to set your Diffusion in Low Bits option to Automatic (fp16 LoRA).

    HooChooOct 6, 2024· 1 reaction

    I don't see where to set Diffusion in Low Bits. Where is it located?

    kutoOct 7, 2024

    @HooChoo If you're using Forge UI or Automatic1111, it should be at the top.

    blhllNov 16, 2024· 1 reaction

    Needed for some LoRAs on Q5_K as well; luckily it's printed in the console.

    zz3870471May 6, 2025

    god

    uprz23736Oct 10, 2024
    CivitAI

    Is there a qint8 version? Not sure how to use this one in Comfy: https://huggingface.co/Disty0/FLUX.1-dev-qint8

    Sammy66Oct 25, 2024· 2 reactions
    CivitAI

    I love the images Q2_K gives; the so-called higher-quality quants just don't give the raw, poor/messy/bad image style I like.

    bydm44Oct 27, 2024· 3 reactions
    CivitAI

    I tried Q4_K_S and Q5 and both of them are slower than fp8. I don't understand this. GPU: 3060 12GB.

    konservatorNov 4, 2024

    Did you solve the problem?

    bydm44Nov 4, 2024· 3 reactions

    I changed to this model: flux1DevHyperNF4Flux1DevBNB, but I don't know which one. :) It's okay and faster. It's 8-step, but with 12 steps I get better results.

    alxenNov 3, 2024· 2 reactions
    CivitAI

    What about Q4_0 and Q4_1? Anyone know about these?

    blissiraMar 4, 2025

    Q4_0 is tailored for limited VRAM, for 4GB–7GB GPUs.

    CannotBeRealFeb 15, 2025· 4 reactions
    CivitAI

    Q2 is an absolutely useless, messy model... you'd be better off downloading a pic from the web and compressing it to hell; it would still look better than a Q2 output.

    rafaelldestiloFeb 18, 2025· 2 reactions
    CivitAI

    It's running on an RTX 3060 Ti (8GB VRAM, 32GB RAM) and the speed is good, a little more than NF4. I never imagined it could run Flux models with 8GB VRAM; better results than SDXL without a doubt.

    kintilian45May 9, 2025

    I run flux1-dev on my RTX 3060 6GB (no typo) without [serious] issues. Granted, I don't do any upscaling or Hires Fix with it. 😅

    Checkpoint
    Flux.1 D

    Details

    Downloads
    6,548
    Platform
    CivitAI
    Platform Status
    Available
    Created
    8/19/2024
    Updated
    5/12/2026
    Deleted
    -

    Files

    flux1DevGGUFQ2KQ3KSQ4Q41Q4KS_q2K.zip

    Mirrors

    Available On (1 platform)

    Same model published on other platforms. May have additional downloads or version variants.