CivArchive
    Flux.1-Dev GGUF Q2.K Q3.KS Q4/Q4.1/Q4.KS Q5/Q5.1/Q5.KS Q6.K Q8 - Q4.0

    Source https://huggingface.co/city96/FLUX.1-dev-gguf/tree/main by city96

    This is a direct GGUF conversion of Flux.1-dev. As this is a quantized model, not a finetune, all the same restrictions/original license terms still apply. Basic overview of quantization types.
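
    As a rough illustration of what such a quant does, here is a minimal Python sketch in the spirit of Q8_0-style block quantization (a simplification for intuition only, not the actual GGUF packing):

        import numpy as np

        # Minimal sketch: blocks of 32 weights stored as int8 plus one scale each.
        def q8_0_roundtrip(w, block=32):
            w = w.reshape(-1, block)
            scale = np.abs(w).max(axis=1, keepdims=True) / 127.0  # one scale per block
            q = np.round(w / scale).astype(np.int8)               # 8-bit storage
            return (q.astype(np.float32) * scale).reshape(-1)     # dequantized for compute

        w = np.random.randn(1024).astype(np.float32)
        print(np.abs(w - q8_0_roundtrip(w)).max())  # small error vs. unit-scale weights

    Lower-bit quants (Q4, Q5, ...) follow the same idea with fewer bits per weight, trading a little quality for a much smaller file.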

    The model files can be used with the ComfyUI-GGUF custom node.
    Place model files in ComfyUI/models/unet - see the GitHub readme for further install instructions.
    Also working with Forge since the latest commit!
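
    If you prefer to script the download, here is a hedged sketch using the huggingface_hub library (the filename follows the source repo's naming; pick whichever quant fits your VRAM):

        from huggingface_hub import hf_hub_download

        # Fetch one quant straight into ComfyUI's unet folder.
        hf_hub_download(
            repo_id="city96/FLUX.1-dev-gguf",
            filename="flux1-dev-Q4_0.gguf",   # assumed filename; check the repo listing
            local_dir="ComfyUI/models/unet",
        )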

    ☕ Buy me a coffee: https://ko-fi.com/ralfingerai
    🍺 Join my discord: https://discord.com/invite/pAz4Bt3rqb


    Comments (86)

    Lera123Aug 15, 2024

    They do apply on finetunes too (license restrictions)

    psspsspsspssspssAug 15, 2024· 1 reaction

    Does this support loras?

    MeltyGarnetAug 15, 2024

    No support for LoRAs or ControlNets.

    raidmachine132017712Aug 15, 2024

    What about Schnell versions, so that we can use it commercially?

    pigeliAug 15, 2024

    I want to know whether these improved and merged models will be able to work together with ControlNets and small models such as LoRAs in the future. If they can't, the ecosystem will become huge and fragmented.

    DD_Ai_artAug 15, 2024· 6 reactions

    So now we have a flood of biblical proportions of Flux "models", but really almost no explanation of why, how, or for what...

    Some comparisons, examples, etc. would be very much needed...

    munchkinAug 15, 2024· 3 reactions

    If you used LLMs, you would know what this is for. Flux is a big model, so quantization reduces the cost while preserving quality better than fp8, at least.
    And I found comparisons under this Reddit post: https://www.reddit.com/r/StableDiffusion/comments/1eslcg0/excuse_me_gguf_quants_are_possible_on_flux_now/

    DaemonratAug 15, 2024· 1 reaction

    Take a look at the file size.

    DD_Ai_artAug 15, 2024

    @munchkin Thank you... I guess this is not yet functional in Forge?

    munchkinAug 15, 2024

    @ddamir247931 They actually started adding support pretty quickly - you may use it in Forge soon, if not right now

    DD_Ai_artAug 15, 2024

    @munchkin I tried; everything is there, but the resulting image is just a black screen. I probably did something wrong.

    DarkAgentAug 15, 2024

    I 100% agree with this. Visual comparisons with actual pictures made on the same seed would be VERY useful to SEE the difference.

    GRM80Aug 18, 2024

    @ddamir247931 They work with ForgeUI, but the problem is that they eat VRAM like crazy. I have a 12 GB GPU, and I can't use LoRAs in Forge. However, I experienced no problems using LoRAs in Comfy with Flux models, but it is really slow when using LoRAs.

    EKKIVOKAug 15, 2024· 3 reactions

    I made an XYZ plot comparison with all those models on a simple prompt, seed locked, and... the result is the same on each image, exactly the same, no changes... I guess all those Flux models are the same, and I still can't understand why.

    GodAlMightyAug 15, 2024

    Image generation is super slow compared to the normal Flux dev model. Is that normal?

    s00shAug 15, 2024

    Were you using Q5? From https://new.reddit.com/r/StableDiffusion/comments/1eso216/comparison_all_quants_we_have_so_far/ and my own testing, it looks like Q5 is slower for some reason.

    GodAlMightyAug 15, 2024

    No, Q4.

    stduhpf893Aug 15, 2024· 1 reaction

    So, just to be sure, the ComfyUI-GGUF node is just using gguf as a "compression" format, and de-quantizing the weights before computing, instead of actually using ggml for computation like stable-diffusion.cpp does?

    plkAug 15, 2024· 2 reactions

    That seems to be the case. Basically, it's the closest you can get to the full dev model. The difference between this and NF4 is that this still processes in FP16, I think, while NF4 breaks the tensors down into varying levels of precision, from int4 to FP8, FP16, and even FP32, then uses processing tricks to fit multiple tensors into a single tensor (i.e. FP16 running two FP8 tensors at once, four int4 tensors in one FP16 tensor, etc.), which results in compression and a speed-up. The caveat: people should use the FP16 text encoder with this rather than FP8. It shouldn't change the inference speed much, if at all, and it will increase memory usage, but it also definitely increases coherency and quality.
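
    In that spirit, a minimal PyTorch sketch of a dequantize-on-the-fly linear layer (a hypothetical illustration of the storage-only approach described above, not the actual ComfyUI-GGUF code):

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class DequantLinear(nn.Module):
            """Stores int8 weights plus per-row scales; dequantizes on every forward."""
            def __init__(self, q_weight, scale, bias=None):
                super().__init__()
                self.register_buffer("q_weight", q_weight)  # int8, shape (out, in)
                self.register_buffer("scale", scale)        # fp16, shape (out, 1)
                self.bias = bias

            def forward(self, x):
                w = self.q_weight.to(torch.float16) * self.scale  # de-quantize, then
                return F.linear(x, w, self.bias)                  # compute in fp16

    Only the resident weight memory shrinks; the matmul itself still runs in fp16, which is why quality tracks the full model closely.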

    plkAug 15, 2024· 5 reactions

    From my limited experimentation, NF4 is still putting out better results than Q4 when using the FP8 encoder, and NF4 is about 25% faster. NF4 + Realism LORA also definitely surpasses Q4 in fidelity. Other things that seem to take a dive in Q4 are text coherency and skin textures. That doesn't mean Q4 is always bad, but it's almost superfluous compared to NF4. Q4 at times seems to have more detail, as if it's using a detailer LORA, but it's generally incoherent and unnecessary detail compared to NF4.

    Note: The inference speed between the FP8 and FP16 encoders should be virtually the same, so there's no reason to go with the FP8 version; the FP16 encoder should provide better results, and that kind of gives it a boost over NF4 by clearing up some of the issues present when using the FP8 text encoder. The size difference doesn't seem to be an issue in Forge with a 12GB GPU and 32GB system RAM.

    TLDR;

    Q4 + FP8 text encoder < NF4

    Q4 + FP16 text encoder ≥ NF4

    s00shAug 15, 2024· 3 reactions

    Nice, thanks for the mirror! City96 just uploaded Q4_1 and Q5_1, FYI!

    RalFinger
    Author
    Aug 15, 2024

    Thank you for the information!

    s00shAug 15, 2024· 12 reactions

    GGUF models + LORAs are supported in Forge now as of this commit. Read on for a quick guide...

    Run your update.bat or update Forge.


    Download clip_l.safetensors and t5xxl_fp8_e4m3fn.safetensors from https://huggingface.co/lllyasviel/flux_text_encoders/tree/main

    Download ae.safetensors from https://huggingface.co/black-forest-labs/FLUX.1-dev/tree/main

    Put ae.safetensors and clip_l.safetensors in models/VAE folder

    Put t5xxl_fp8_e4m3fn.safetensors in models/text_encoder.

    In Forge, select your .gguf model, and in the new VAE/Text Encoder list, pick the three .safetensors files. I left Diffusion in Low Bits as Automatic and it worked for me.
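
    (As a hedged sketch, the downloads above can also be scripted with huggingface_hub; paths match the guide, and the gated FLUX.1-dev repo requires accepting the licence on Hugging Face and possibly an access token.)

        from huggingface_hub import hf_hub_download

        # Text encoders into the folders named in the guide.
        hf_hub_download("lllyasviel/flux_text_encoders", "clip_l.safetensors",
                        local_dir="models/VAE")
        hf_hub_download("lllyasviel/flux_text_encoders", "t5xxl_fp8_e4m3fn.safetensors",
                        local_dir="models/text_encoder")

        # VAE from the gated FLUX.1-dev repo (accept the licence first).
        hf_hub_download("black-forest-labs/FLUX.1-dev", "ae.safetensors",
                        local_dir="models/VAE")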

    RalFinger
    Author
    Aug 15, 2024

    Thanks for the update, this is amazing!

    psspsspsspssspssAug 15, 2024

    I am getting errors on some loras:

    "'Parameter' object has no attribute 'gguf_cls'"

    Some loras work, but others get this error. Any idea?

    DD_Ai_artAug 15, 2024

    Tried it. It loads normally, begins generating, the preview window shows the picture up to 99%, and the finished result is a black image. Forge is updated. All files are present and in their directories.

    psspsspsspssspssAug 15, 2024

    @ddamir247931 Did you select all three: text encoder, clip and the ae in the VAE drop down?

    psspsspsspssspssAug 15, 2024

    Q8 works, but Q4_1 doesn't load in forge

    DD_Ai_artAug 15, 2024

    @psspsspsspssspss Yes, I did... and the image is always plain black.

    s00shAug 16, 2024

    @psspsspsspssspss Same for me. I stuck with Q4_0

    EbenezerDanglewoodOct 10, 2024

    I was getting black images with swap location set to Shared, but changing to CPU fixed it.

    NoobFromEgyptAug 15, 2024

    Can someone explain what I should download with my RTX 3070 (8 GB VRAM) and 32 GB RAM?

    DD_Ai_artAug 15, 2024· 1 reaction

    More VRAM...

    Kidding... really, no difference, at least I didn't notice any with my 4060. Of course, I didn't try every possible combination, but...

    Schnell is the fastest, lowest quality; the rest are all the same, give or take, and that is very slow.

    NoobFromEgyptAug 15, 2024

    @ddamir247931 Thank you... but all of them are dev: Q4, Q4_1, Q5.

    DD_Ai_artAug 15, 2024

    @NoobFromEgypt I have no luck with these models; they don't work for me for some reason... so I really don't know if they are faster or less VRAM-demanding.

    MirabilisAug 16, 2024

    I'm on a 3070 Ti 8GB + 32 GB system RAM, and honestly, having downloaded the Q8 model and tried it out (you need to load the ae, the fp8 T5, and the clip_l in the VAE dropdown with the Q models), I'd say that the NF4 V2 model still delivers a better result in terms of prompt adherence, in my view.

    GUS_C3Aug 15, 2024

    When using Q4_1 or Q5_1:

    Traceback (most recent call last):
      File "D:\AI\webui_forge_cu121_torch21\webui\modules_forge\main_thread.py", line 30, in work
        self.result = self.func(*self.args, **self.kwargs)
      File "D:\AI\webui_forge_cu121_torch21\webui\modules\txt2img.py", line 110, in txt2img_function
        processed = processing.process_images(p)
      File "D:\AI\webui_forge_cu121_torch21\webui\modules\processing.py", line 809, in process_images
        res = process_images_inner(p)
      File "D:\AI\webui_forge_cu121_torch21\webui\modules\processing.py", line 952, in process_images_inner
        samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
      File "D:\AI\webui_forge_cu121_torch21\webui\modules\processing.py", line 1323, in sample
        samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
      File "D:\AI\webui_forge_cu121_torch21\webui\modules\sd_samplers_kdiffusion.py", line 234, in sample
        samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
      File "D:\AI\webui_forge_cu121_torch21\webui\modules\sd_samplers_common.py", line 272, in launch_sampling
        return func()
      File "D:\AI\webui_forge_cu121_torch21\webui\modules\sd_samplers_kdiffusion.py", line 234, in <lambda>
        samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
      File "D:\AI\webui_forge_cu121_torch21\system\python\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "D:\AI\webui_forge_cu121_torch21\webui\k_diffusion\sampling.py", line 128, in sample_euler
        denoised = model(x, sigma_hat * s_in, **extra_args)
      File "D:\AI\webui_forge_cu121_torch21\system\python\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "D:\AI\webui_forge_cu121_torch21\system\python\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
        return forward_call(*args, **kwargs)
      File "D:\AI\webui_forge_cu121_torch21\webui\modules\sd_samplers_cfg_denoiser.py", line 186, in forward
        denoised, cond_pred, uncond_pred = sampling_function(self, denoiser_params=denoiser_params, cond_scale=cond_scale, cond_composition=cond_composition)
      File "D:\AI\webui_forge_cu121_torch21\webui\backend\sampling\sampling_function.py", line 339, in sampling_function
        denoised, cond_pred, uncond_pred = sampling_function_inner(model, x, timestep, uncond, cond, cond_scale, model_options, seed, return_full=True)
      File "D:\AI\webui_forge_cu121_torch21\webui\backend\sampling\sampling_function.py", line 284, in sampling_function_inner
        cond_pred, uncond_pred = calc_cond_uncond_batch(model, cond, uncond_, x, timestep, model_options)
      File "D:\AI\webui_forge_cu121_torch21\webui\backend\sampling\sampling_function.py", line 254, in calc_cond_uncond_batch
        output = model.apply_model(input_x, timestep_, **c).chunk(batch_chunks)
      File "D:\AI\webui_forge_cu121_torch21\webui\backend\modules\k_model.py", line 45, in apply_model
        model_output = self.diffusion_model(xc, t, context=context, control=control, transformer_options=transformer_options, **extra_conds).float()
      File "D:\AI\webui_forge_cu121_torch21\system\python\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "D:\AI\webui_forge_cu121_torch21\system\python\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
        return forward_call(*args, **kwargs)
      File "D:\AI\webui_forge_cu121_torch21\webui\backend\nn\flux.py", line 402, in forward
        out = self.inner_forward(img, img_ids, context, txt_ids, timestep, y, guidance)
      File "D:\AI\webui_forge_cu121_torch21\webui\backend\nn\flux.py", line 373, in inner_forward
        img, txt = block(img=img, txt=txt, vec=vec, pe=pe)
      File "D:\AI\webui_forge_cu121_torch21\system\python\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "D:\AI\webui_forge_cu121_torch21\system\python\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
        return forward_call(*args, **kwargs)
      File "D:\AI\webui_forge_cu121_torch21\webui\backend\nn\flux.py", line 191, in forward
        img_mod1_shift, img_mod1_scale, img_mod1_gate, img_mod2_shift, img_mod2_scale, img_mod2_gate = self.img_mod(vec)
      File "D:\AI\webui_forge_cu121_torch21\system\python\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "D:\AI\webui_forge_cu121_torch21\system\python\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
        return forward_call(*args, **kwargs)
      File "D:\AI\webui_forge_cu121_torch21\webui\backend\nn\flux.py", line 161, in forward
        out = self.lin(nn.functional.silu(vec))[:, None, :].chunk(self.multiplier, dim=-1)
      File "D:\AI\webui_forge_cu121_torch21\system\python\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "D:\AI\webui_forge_cu121_torch21\system\python\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
        return forward_call(*args, **kwargs)
      File "D:\AI\webui_forge_cu121_torch21\webui\backend\operations.py", line 371, in forward
        return functional_linear_gguf(x, self.weight, self.bias)
      File "D:\AI\webui_forge_cu121_torch21\webui\backend\operations_gguf.py", line 51, in functional_linear_gguf
        return torch.nn.functional.linear(x, weight, bias)
    RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x3072 and 2304x18432)

    Q8 is the only one that works

    fablegeniusAug 15, 2024

    Looking for a workflow for this in Comfy. Is this file all I need, or are there a bunch of other components?

    CreepybitAug 16, 2024

    Do you have to download all these different checkpoints?

    ArtificeAIAug 16, 2024

    No, just choose one. Look at the filesize and pick the one that doesn't exceed your GPU's VRAM.
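
    As a rough rule of thumb, a quant's file size is close to the Flux parameter count times its bits per weight (a sketch; the bits-per-weight figures are approximate and include the block scales):

        # ~12B Flux weights x bits-per-weight / 8 bytes, per quant type.
        params = 12e9
        for name, bpw in [("Q4_0", 4.5), ("Q5_0", 5.5), ("Q8_0", 8.5), ("fp16", 16)]:
            print(f"{name}: ~{params * bpw / 8 / 1e9:.1f} GB")

    That lines up with the downloads here: roughly 7 GB for Q4_0 and 13 GB for Q8_0, versus ~24 GB for the full fp16 checkpoint.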

    JayNLAug 16, 2024

    @ArtificeAI but is 8 better than 4? Well I will see in a moment...

    CreepybitAug 16, 2024

    @ArtificeAI Alright, well I just downloaded the full dev checkpoint (22GB) and I generate 5 s/it on it, so I guess I'll stick to that one.

    turkinoAug 16, 2024

    @JayNL 8 should be less compressed than 4, so should be somewhat better.

    SpaykeAug 16, 2024

    Which one is fastest? Are they faster than NF4 v2?

    jason_chan_12790Aug 16, 2024

    Slower, and it could exceed 8GB VRAM, just tested on my 4060, so it seems GGUF is kinda pointless for those with lower-spec machines.

    [UPDATE] It seems the GGUF node has been updated, so VRAM is no longer an issue. Speed-wise, it's around 15-20% slower than NF4.

    JayNLAug 16, 2024· 3 reactions

    I have a 4070 and it's extremely fast, I use Q4, no idea if 8 is better, I shared the workflow
    https://civitai.com/models/650990

    JayNLAug 16, 2024

    So I tried Q4, but why is it in front, isn't 8 better?

    WilddogsAug 17, 2024

    The higher the quant, the better the model; at least that is how it works with LLM GGUFs, so I will assume it is the same method here.

    JayNLAug 17, 2024

    @Wilddogs yeah, I already spammed Q8 full of images yesterday lol. I thought they were versions like v1, v2... but they're variations; that's why they're in this order. Q8 runs fine on 12GB and almost as fast as Q4!

    kenpmAug 18, 2024

    @JayNL I agree that Q8 should be better than Q4 but I've been doing some comparison testing between them and while there are some slight compositional differences with the same seed, there doesn't appear to be any difference in quality that I can tell.

    JayNLAug 18, 2024

    @kenpm true, Q4 is just really good already, I just use Q8 because I can, but isn't really needed, I still have both installed, sometimes Q4 comes out better even

    JayNLAug 16, 2024· 3 reactions

    I'm generating 2 images in like a minute with an RTX 4070 and t5xxl_fp16 as clip, delivering photo quality, how is nobody using this?

    OzzyOsmanAug 16, 2024

    Do the GGUF versions work with Flux LORAs?

    OzzyOsmanAug 16, 2024· 1 reaction

    Can you share the link to your t5-xxl encoder? Is it this? AUTOMATIC/stable-diffusion-3-medium-text-encoders at main (huggingface.co)

    JayNLAug 16, 2024

    @OzzyOsman no it's like NF4 (quantized, don't ask me what that means), but the info on Huggingface says so.

    ThatrandomjewAug 17, 2024

    @JayNL The comfyui node says that LoRA support is experimental, but there

    quintessentialfo4180Aug 16, 2024

    Tested. Identical system performance to Q8: 13-14 seconds per image at 23 steps of Euler/Beta on a 4090. However, Q4's quality is about 50% degraded, judging by the number of text-spelling hits/misses when comparing the two. :-/

    The file size is smaller though obviously. I would love to give this some positive conclusion.

    JayNLAug 16, 2024

    I'm doing like 30 secs per image on a 4070, is there an advantage to getting Q8?

    OzzyOsmanAug 16, 2024

    @JayNL GGUFs are like LLMs: the higher the quantization, the better the quality. Q8 would offer higher-quality renders compared to Q4 or Q6. But the larger quants require more VRAM.

    JayNLAug 16, 2024

    I made a quick workflow here
    https://civitai.com/models/650990

    It works, but I have no idea if this is the right way, though; on Hugging Face I also see other clips being used, vit.bin?

    ParamindAug 16, 2024

    Hey there. How can this Q8 version of the model be 11.34GB and therefore 1.5GB smaller than the original upload on huggingface, which is 12.7GB in size? Does it mean that it needs less vram? And is there a quality loss?

    JayNLAug 17, 2024

    The size estimate on CivitAI is always lower; if you download them, they are both 12,410,432 KB. I guess they calculate bits and bytes differently; who is right? 😏

    JayNLAug 17, 2024

    It has something to do with dividing by 1024; look it up, interesting stuff. Right-click on a folder in Windows and you'll see that 20GB is not 20,000,000,000 bytes.
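
    Concretely, decimal versus binary units account for most of the gap (a quick sketch; CivitAI's 11.34GB figure is lower still, so it may be measured or rounded differently):

        size_kib = 12_410_432        # what Windows reports as "KB" is really KiB
        size_bytes = size_kib * 1024
        print(size_bytes / 1e9)      # ~12.71 -> the ~12.7GB Hugging Face shows (decimal GB)
        print(size_bytes / 1024**3)  # ~11.83 -> what Windows shows as "GB" (binary GiB)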

    akshaydixit007Aug 16, 2024· 2 reactions

    Amazing high speed for my RTX 3060 6GB laptop... 20 steps of Euler takes ~106 sec with the Q4 model.

    JayNLAug 16, 2024· 1 reaction

    Yeah, I was just saying on the S model page: probably 99% can just go to Dev and start with Q4 and see how far they can go. I thought you'd need something like a 4080 16GB for this, but Q8 runs smooth and fast even on 12GB.

    akshaydixit007Aug 16, 2024

    @JayNL Thanks for Tip :) i will try that

    thermonuclear777Aug 17, 2024

    I am using an RTX 2060 12GB, paired with 64GB DDR4 system RAM and a Core i5 12400F processor.

    But for some reason I am getting 300~600 s/it on all GGUF quant models. What am I doing wrong? Please help...

    P.S.: NF4 V2 takes 110s per 832x1216 image.

    I am also using the t5 fp8 (e4m3) and clip_l with the GGUF UNet model.

    https://freeimage.host/i/dGgde6P

    sansmoraxzAug 17, 2024

    Maybe simply your workflow. You have too many things running and your GPU is overcommitted and leaking into shared DRAM space.

    JayNLAug 17, 2024

    I hear from a lot of people who can't get it running on 12GB, while I run it easily on a 4070 12GB. Maybe the 2060 has GDDR6 and not GDDR6X? No idea if that matters. Use the simplest workflow possible.

    JayNLAug 17, 2024

    I run 2 images on 960x1280 with 5.5s/it and I have 5 Twitch streams open, Discord, Steam, some other shits
    https://ibb.co/z7zyT9Q

    thermonuclear777Aug 17, 2024

    @JayNL I have disabled all my nodes except the GGUF loader and the problem still persists.

    bignut022793Aug 19, 2024

    did you find any solution?

    mirek190Aug 19, 2024

    Do not use Q5, as it is the slowest version.

    Try Q4 instead if you can't use Q8.

    thermonuclear777Aug 22, 2024

    @bignut022793 After updating the GGUF node, I am consistently getting 4~7 s/it on all GGUF quantization models; even Q8 is not causing any OOM now. I am even using a Q8-quantized version of the T5 fp16 clip, and it is working fine now.

    925_StudioAug 17, 2024· 3 reactions

    In my case, the only benefit of the Q8 version is the half size. The quality is very close compared to the original but the speed is slightly slower. 1024 x 1024, 4 pics in 1 batch. Q8 took 100 seconds and the original took 91 seconds. Both are 20 steps on 4080S. Anyway, thanks for sharing these.

    PirateGirlAug 18, 2024· 4 reactions

    I tested three FD models today and then I found out about this:

    FLUX.1 [dev]: This model falls under the FLUX.1 [dev] Non-Commercial License. This license restricts commercial use, allowing only non-commercial purposes, such as personal, scientific, and artistic uses.

    If a miracle happens and I can make a few yen by selling my images somehow, it can't be done with the FD model. It will probably never happen, but it is a matter of principle. So I am deleting all the FD models I have tried and will never touch any FD model again.

    HoliwayzAug 19, 2024· 2 reactions

    This licence is a mess for sure. From what I understand, YOU can use the outputs as long as it is not to train a commercial model. But two sections are contradictory, so...

    It says this: "Non-Commercial Purpose means any of the following uses, but only so far as you do not receive any direct or indirect payment arising from the use of the model or its output"

    "Outputs means any content generated by the operation of the FLUX.1 [dev] Models or the Derivatives from a prompt"

    So you would assume you don't have the right to sell the generated pictures... But it also says:

    "Outputs. We claim no ownership rights in and to the Outputs. You are solely responsible for the Outputs you generate and their subsequent uses in accordance with this License. You may use Output for any purpose (including for commercial purposes), except as expressly prohibited herein. You may not use the Output to train, fine-tune or distill a model that is competitive with the FLUX.1 [dev] Model."

    Where it clearly says: You may use Output for any purpose (including for commercial purposes)

    Black Forest needs to be clearer.

    JayNLAug 19, 2024

    @Kyra31 I have an occasional sale on DeviantArt, but I'm afraid to put some work made with the GGUF Dev version for sale there 💩

    homoludensAug 19, 2024

    @Kyra31 Just curious, what do you think about competitive models? What does "competitive model" mean? One with a similar number of parameters (12B)? Is SDXL/Pony not competitive? I wonder...

    4489695Aug 20, 2024· 1 reaction

    So just say it was made with SDXL and you're fine. Who can tell? They won't care.

    HoliwayzAug 20, 2024

    @JayNL From what I understood from lawyers, you may not use Dev outputs without acquiring a licence from Black Forest, period. :(

    JayNLAug 19, 2024

    I have this great node, Save Image With Metadata, but it doesn't work with GGUF, probably because it loads a UNet instead of a Checkpoint, and that's missing in the end? Does anyone have an alternative? 🙏

    homoludensAug 20, 2024

    I am sad too... I always have Filter by Metadata turned on, and now my images are posted without metadata... That sucks... :( I haven't found a node yet that adds metadata.

    JayNLAug 20, 2024

    @homoludens Same; I have been searching and trying for hours, but it doesn't work. Now it's all empty resources.

    antonioccostajr881Aug 19, 2024

    Can I use Python diffusers to run this model from code?
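
    Not when this was asked, but later diffusers releases added GGUF loading (roughly diffusers 0.32+, with the gguf package installed). A hedged sketch of that API:

        import torch
        from diffusers import FluxPipeline, FluxTransformer2DModel, GGUFQuantizationConfig

        # Load one quant as the transformer, then drop it into the normal Flux pipeline.
        transformer = FluxTransformer2DModel.from_single_file(
            "https://huggingface.co/city96/FLUX.1-dev-gguf/blob/main/flux1-dev-Q4_0.gguf",
            quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
            torch_dtype=torch.bfloat16,
        )
        pipe = FluxPipeline.from_pretrained(
            "black-forest-labs/FLUX.1-dev",
            transformer=transformer,
            torch_dtype=torch.bfloat16,
        ).to("cuda")
        pipe("a cat holding a sign", num_inference_steps=20).images[0].save("out.png")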

    Checkpoint
    Flux.1 D

    Details

    Downloads
    3,043
    Platform
    CivitAI
    Platform Status
    Available
    Created
    8/15/2024
    Updated
    5/12/2026
    Deleted
    -

    Files

    flux1DevGGUFQ2KQ3KSQ4Q41Q4KS_q40.zip

    Mirrors

    Available On (1 platform)

    Same model published on other platforms. May have additional downloads or version variants.