Source https://huggingface.co/city96/FLUX.1-dev-gguf/tree/main by city96
This is a direct GGUF conversion of Flux.1-dev. As this is a quantized model, not a finetune, all the same restrictions and original license terms still apply. See the source page for a basic overview of quantization types.
The model files can be used with the ComfyUI-GGUF custom node.
Place model files in ComfyUI/models/unet - see the GitHub readme for further install instructions.
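For orientation, here is a minimal sketch of where the GGUF files plug into a ComfyUI "API format" prompt. The node class names (UnetLoaderGGUF, DualCLIPLoaderGGUF) come from the ComfyUI-GGUF project; the node IDs, filenames, and the rest of the graph (sampler, VAE, latent, decode) are assumptions and would follow the standard Flux workflow, so check the example workflows in the readme.

```python
# Sketch only: a fragment of a ComfyUI API-format prompt using the GGUF loader nodes.
import json

prompt_fragment = {
    "1": {
        "class_type": "UnetLoaderGGUF",  # provided by the ComfyUI-GGUF custom node
        "inputs": {"unet_name": "flux1-dev-Q6_K.gguf"},  # file placed in ComfyUI/models/unet
    },
    "2": {
        "class_type": "DualCLIPLoaderGGUF",  # optional GGUF loader for the text encoders
        "inputs": {
            "clip_name1": "t5xxl_fp16.safetensors",  # example filenames, adjust to your setup
            "clip_name2": "clip_l.safetensors",
            "type": "flux",
        },
    },
}

print(json.dumps(prompt_fragment, indent=2))
```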
Also working with Forge since the latest commit!
☕ Buy me a coffee: https://ko-fi.com/ralfingerai
🍺 Join my discord: https://discord.com/invite/pAz4Bt3rqb
Comments (90)
lol
Where is Q4_K_M?
That version is better than old Q5.
lmfao, you've earned a follow for lowering the size of Flux. Gotta love this, lmao.
Surprisingly, Q4_K_S gives 3.5 s/it on a 2070S 8GB + 16GB RAM
can you share workflow? or forge settings?
@low_channel_1503 Some example workflows have been provided
This one is pretty nice and even works with LoRAs. :)
What?
AssertionError: You do not have CLIP state dict!
getting this error while using it in forge ui
Put the CLIP files in the text_encoder folder within models, then load them in the VAE / Text Encoder slot
Hmm, according to the Mean Δp on the LLaMA 3 8b Scoreboard, the Q6_K could be better than Q8, I'll give it a try today, love testing all these shits 😂
I tried all these models and the fastest for me is still the nf4 v2
But NF4 will be deprecated because of the better quality of GGUF
https://github.com/comfyanonymous/ComfyUI_bitsandbytes_NF4
Currently, I found that to maintain high fidelity and sharpness at lower quantization you must increase steps by 2-5, for a total of 22-25 steps minimum; this brings it much closer to the fidelity and sharpness of the higher quants & FP8 far more often. Also, lower-quant models look much better with different samplers and schedulers (see the sketch after this list).
For Q6 and up: DEIS/BETA or uni_pc_h2 + sgm_uniform.
For Q5 I found uni_pc_h2 + sgm_uniform looks exceptional; DEIS/BETA was nice but was more prone to blur occasionally
For ~Q4, uni_pc_h2 + sgm_uniform
On Q3, uni_pc_h2 + sgm_uniform and/or euler/simple, the former being superior.
On Q2, euler/simple offered the highest fidelity, whereas uni_pc_h2 + sgm_uniform looked bad and deis/beta looked horrible. You probably want to increase the steps to 30 minimum for Q2. I didn't test all of them, so maybe there are better options. The blurrier the output, the more denoising and the more steps needed. In this test I was rendering at 1920x1088.
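One way to read the recommendations above at a glance is as a small lookup table. This is just a sketch of the commenter's empirical findings; the step counts are taken from the list, and the sampler/scheduler names are written as in the comments, so ComfyUI's exact identifiers may differ slightly (e.g. uni_pc_bh2 rather than uni_pc_h2).

```python
# Hypothetical lookup of the suggested settings, keyed by quant level.
RECOMMENDED = {
    "Q6+": {"sampler": "deis", "scheduler": "beta", "steps": 22},          # or uni_pc_h2 + sgm_uniform
    "Q5":  {"sampler": "uni_pc_h2", "scheduler": "sgm_uniform", "steps": 22},
    "Q4":  {"sampler": "uni_pc_h2", "scheduler": "sgm_uniform", "steps": 25},
    "Q3":  {"sampler": "uni_pc_h2", "scheduler": "sgm_uniform", "steps": 25},
    "Q2":  {"sampler": "euler", "scheduler": "simple", "steps": 30},
}

def settings_for(quant: str) -> dict:
    """Return the suggested sampler settings for a quant level (defaults to the Q4 row)."""
    return RECOMMENDED.get(quant, RECOMMENDED["Q4"])

print(settings_for("Q5"))  # {'sampler': 'uni_pc_h2', 'scheduler': 'sgm_uniform', 'steps': 22}
```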
The image generated by the workflow is black
Same, trying to generate with Q4_1
@cosmoslayer26 Q6 version? Some say that Q6 is very close to Q8
After some testing I can say Q6_K comes extremely close to Q8, but is about 3GB smaller! 💕
everyone says Q6... ok download
If you are interested in why, you could check out this plot:
https://raw.githubusercontent.com/matt-c1/llama-3-quant-comparison/main/plots/MMLU-Correctness-vs-Model-Size.png
It shows the quality of responses for the different quants with the 8B and 70B Llama3.1 models.
Only the blue dots are really relevant for now, and they show that q6 barely dips below the line that marks fp16 quality, just like q8. It's essentially the same quality as q8/fp16 but with a huge decrease in size.
What's interesting is that for the 70B model, even q2 quants give better responses than the fp16 weights of the small 8B model. It would be really interesting if huge image generation models scaled the same way with an increase in parameters, although they would be unbearably slow and huge in size (Llama3.1 8B at q6 is 6.6GB and Llama3.1 70B at q2 is 26.4GB).
@lotharius Actually, going by that chart, the golden point is Q5_K_S: it's smaller in size but at the same height/correctness, so I'm going to download Q5_K_S. I don't know about the accuracy of that chart for comparing Flux, but the Q5_K_S here is perfect at just under 8 GB, unlike Q6 which is 9.1 GB, so although you can use that with an 8GB VRAM GPU (in Forge at least), it would need to use system RAM, which makes things a lot slower. The 1.5 GB difference between Q5_K_S and Q6 doesn't seem like much until you realize it puts it perfectly under 8GB of VRAM. And also I like the number 5 more than 6 lol
For me, LoRAs work with Q4_K_S and Q5_K_S, but not with Q8_0. Haven't tried the rest of the models.
Doesn't work with Q6_K
You are really a hero! If GGUF for ControlNet is on the way, you will be a god.
By the way, Q4 is good enough; latent composite is working. Very strong on style composition.
Anyone found out what the different versions (K, S, etc.) are about?
@JayNL what do those numbers tell us??
The smaller the Q number, the more degradation in quality??
@fronyax short version, yes
Does the Q6_K model work with LoRAs? It's just stopping dead on generating when I use them together, but I can't tell if I've just missed something.
Ah, okay. I moved the model to the diffusion_models folder and that's solved it for me.
"Diffusion in Low Bits" is set to "Automatic (fp16 LoRA)"
It definitely does work with loras, I am using it to fit more loras in VRAM
it's taking much more time than NF4!!
it's a compression technology, so it trades longer inference times for a size reduction
I guess that depends on your video card generation; for the 40-series it's the same or even faster.
with a 3060 12GB, 136s on GGUF vs 99.95s on NF4
slower, but more compatible with LoRAs and better prompt understanding
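For context on the speed comments above: a GGUF quant stores compressed weight blocks that have to be dequantized back to floats at inference time, which is where the extra per-step cost can come from. A rough, illustrative numpy sketch of Q8_0 dequantization (one fp16 scale per block of 32 int8 values; this is not the actual ComfyUI-GGUF code path):

```python
import numpy as np

BLOCK_SIZE = 32  # Q8_0 groups weights into blocks of 32 values sharing one fp16 scale

def dequantize_q8_0(scales: np.ndarray, quants: np.ndarray) -> np.ndarray:
    """scales: (n_blocks,) float16, quants: (n_blocks, 32) int8 -> flat float32 weights."""
    return (scales.astype(np.float32)[:, None] * quants.astype(np.float32)).reshape(-1)

# Tiny demo with made-up numbers: 2 blocks -> 64 dequantized weights.
scales = np.array([0.02, 0.015], dtype=np.float16)
quants = np.random.randint(-127, 128, size=(2, BLOCK_SIZE), dtype=np.int8)
print(dequantize_q8_0(scales, quants).shape)  # (64,)
```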
It keeps skipping back to Q2_K. I wonder how many posts in Q2_K are accidentally posted there; the quality seems too high, but I do see some with a lot(!) of steps.
You are right, people post Q8 results in Q2
1.39s/it (Q5S, Model+Clip)
1.37s/it (Q6K, Model+Clip)
1.06s/it (Q8, Model+Clip)
1.10it/s (Original, Model+Clip, fp16)
I tested on my 4080S with ComfyUI. Does this mean the GGUF format only optimizes the size, not the speed?
Yeah, it's only about fitting in your VRAM, not about speed. I can run a batch of 2 in Q8 on a 4070, but a batch of 4 in Q4 (maybe more, didn't test).
@JayNL Thank you! That is very clear.
anyone knows which is better? Q6K or FP8?
It depends on what you mean by better; everyone has different needs. If you want the fastest and most memory-efficient version, NF4 is the way to go. If you want the best output, you need to pick the original fp16 weights. FP8 sits in the middle: it's faster than the compressed GGUF, but not in every scenario; it depends on whether you have 8GB or 12GB of VRAM. Q8_0 should have better output... but I think Q6_K could be the better choice because of this problem in early quantization: https://github.com/city96/ComfyUI-GGUF/issues/79
@sambaspo thanks for your answer!
@Elysia_Saikou Having done quite a few attempts I would prefer Q6 over the default fp8, but this is mostly aesthetics.
@DearLuck after experimenting for a few days, I found that q6k suits me better as well. However, I'm currently using q8_0 to aim for higher accuracy. That said, FP8 seems to be the fastest among them.
For me: RTX 3070, 8GB VRAM, 32GB RAM, image = 896 x 1152 px
100%|██████████| 20/20 [01:13<00:00, 3.67s/it]
with : flux1-dev-Q8_0.gguf
Is it good performance? I don't know... 🤔
Speeds are the same; it doesn't matter if you use Q2 or Q8, both generate in a bit over 1 minute.
Try using a version that fits your VRAM; Q8 is about 12 GB but your GPU only has 8 GB, so maybe use a lower number.
@fhaifhai Q8 is great with 8GB VRAM = 1 min
@Suzanne yeah, I actually get the same speed with RTX 4060 | 16GB and Q8:
100%|██████████████████| 31/31 [01:47<00:00, 3.47s/it]
Yes, same speeds with RTX 4060, 32 GB RAM and Q8, it hovers around 1:10 to 1:45 depending whether I use a LORA or not.
Yes that's fine.
Same
@Suzanne DDR5 RAM?
I've got a 3070 8GB too but only getting 120s/it
Is "FLUX GGUF Q8" the best and most accurate Flux model these days or not?
I would say so, if you can run it, it is the best choice
Does this work with Forge, or just ComfyUI?
does this need a vae, clip?
Search for GGUF and you find some workflows
https://civitai.com/search/models?modelType=Workflows&sortBy=models_v9&query=gguf
Really good! It'd be nice if this had LORA support tho, but still awesome regardless
I get this error on all models. CUDA error: an illegal memory access was encountered CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1 Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
@RalFinger The problem was solved by reducing the VRAM frequency
@RalFinger I was able to create an image; the results were excellent. Thank you.
@adengroup2688 glad to hear that! :)
To get LoRAs to work with Q6_K and Q8, you need to load the t5xxl_fp16.safetensors text encoder (in the VAE / Text Encoder slot) and set your Diffusion in Low Bits option to Automatic (fp16 LoRA).
Is there a qint8 version? Not sure how to use this in comfy https://huggingface.co/Disty0/FLUX.1-dev-qint8
I love the images Q2_K gives; the so-called higher-quality quants just don't give the more raw, poor/messy/bad image style I like.
I tried Q4_K_S and Q5 and both of them are slower than fp8. Don't understand this. GPU: 3060 12GB.
did you solve the problem?
I changed to this model: flux1DevHyperNF4Flux1DevBNB, but I don't know which one. :) It's okay and faster. It's 8 steps, but with 12 steps I get better results.
What about Q4_0 and Q4_1? Anyone know about these?
Q4_0 is tailored for limited VRAM, i.e. 4GB–7GB GPUs.
Q2 is an absolutely useless, messy model... you'd be better off downloading a pic from the web and compressing it to death; it will still look better than a Q2 output.
It's running on an RTX 3060 Ti, 8GB VRAM, 32GB RAM, and the speed is good, a little more than NF4. I never imagined it could run Flux models with 8GB VRAM; better results than SDXL without a doubt.
I run flux1-dev on my RTX3060 6GB (no typo) without [serious] issues. Granted, I don't do any upscaling or Hires Fix with it. 😅