Source https://huggingface.co/city96/FLUX.1-dev-gguf/tree/main by city96
This is a direct GGUF conversion of Flux.1-dev. As this is a quantized model, not a finetune, all the same restrictions / original license terms still apply. Basic overview of quantization types.
The model files can be used with the ComfyUI-GGUF custom node.
Place model files in ComfyUI/models/unet - see the GitHub readme for further install instructions.
Also working with Forge since the latest commit!
☕ Buy me a coffee: https://ko-fi.com/ralfingerai
🍺 Join my discord: https://discord.com/invite/pAz4Bt3rqb
Comments (86)
They do apply to finetunes too (the license restrictions)
Does this support loras?
No support for LoRAs or ControlNets.
What about Schnell versions, so that we can use it commercially?
I want to know whether these improved and merged models will be able to work together with ControlNets and small models such as LoRAs in the future. If they can't, it will become a huge and fragmented ecosystem.
So now we have a flood of Flux "models" of biblical proportions, but really almost no explanation of why, how, or for what...
Some comparisons, examples, etc. would be very much needed...
If you've used LLMs, you'd know what this is for. Flux is a big model, so quantization reduces the memory cost while preserving quality better than FP8, at least.
And I found comparisons under this Reddit post: https://www.reddit.com/r/StableDiffusion/comments/1eslcg0/excuse_me_gguf_quants_are_possible_on_flux_now/
Take a look at the file size.
@munchkin Thank you.... i guess this is not yet functional in Forge?
@ddamir247931 They actually started adding support pretty quickly - you may use it in Forge soon, if not right now
@munchkin I tried, everything is there, but the resulting image is just a black screen. I probably did something wrong.
I 100% agree with this: visual comparisons with actual pictures made on the same seed would be VERY useful to SEE the difference.
@ddamir247931 They work with ForgeUI, but the problem is that they eat VRAM like crazy. I have a 12GB GPU, and I can't use LoRAs in Forge. However, I experienced no problems using LoRAs in Comfy with Flux models, though it is really slow when using LoRAs.
I made an XYZ plot comparison with all those models on a simple prompt with the seed locked, and... the result is the same on each image, exactly the same, no changes... I guess all those Flux models are the same, and I still can't understand why...
Image generation is super slow compared to the normal Flux dev model. Is that normal?
Were you using Q5? From https://new.reddit.com/r/StableDiffusion/comments/1eso216/comparison_all_quants_we_have_so_far/ and my own testing, it looks like Q5 is slower for some reason.
no Q4
So, just to be sure, the ComfyUI-GGUF node is just using gguf as a "compression" format, and de-quantizing the weights before computing, instead of actually using ggml for computation like stable-diffusion.cpp does?
That seems to be the case. Basically, it's the closest you can get to the full dev model. The difference between this and NF4 is that this still processes in FP16, I think, while NF4 breaks the tensors down into varying levels of precision, from int4 to FP8, FP16, and even FP32, then uses processing tricks to fit multiple tensors into a single tensor (e.g. an FP16 tensor carrying two FP8 tensors at once, four int4 tensors in an FP16 tensor, etc.), which results in compression and a speedup. People should use the FP16 text encoder with this rather than FP8; that's the caveat. It shouldn't change the inference speed much, if at all, and it will increase memory usage, but it also definitely increases coherency and quality.
From my limited experimentation, NF4 is still putting out better results than Q4 when using the FP8 encoder, and NF4 is about 25% faster. NF4 + Realism LORA also definitely surpasses Q4 in fidelity. Another thing that seems to take a dive in Q4 is text coherency and skin textures. That doesn't mean Q4 is always bad, but it's almost superfluous compared to NF4. Q4 seems to at times have more detail like it's using a detailer LORA, but generally incoherent and unnecessary detail compared to NF4.
Note: The inference speed between FP8 and FP16 encoders should be virtually the same, so there's no reason to go with the FP8 version, as the FP16 encoder should provide better results, and that kind of gives it a boost over NF4 by clearing up some of the issues present when using the FP8 text encoder. The size difference doesn't seem to be an issue in Forge with a 12GB GPU and 32GB sysram.
TLDR;
Q4 + FP8 text encoder < NF4
Q4 + FP16 text encoder ≥ NF4
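To make the "dequantize before compute" point above concrete, here is a minimal PyTorch-style sketch. It is purely illustrative: the class name and the naive per-tensor int8 scheme are made up for the example and are not the actual ComfyUI-GGUF code. The weight is stored compressed and only expanded to FP16 right before the matmul, so the math itself still runs at full precision.

import torch
import torch.nn as nn

class DequantLinear(nn.Module):
    # Illustrative only: store a quantized weight and dequantize it to FP16
    # just before the matmul, instead of running quantized kernels.
    def __init__(self, weight_fp16, bias=None):
        super().__init__()
        # Naive per-tensor symmetric int8 quantization for demonstration.
        scale = weight_fp16.abs().max() / 127.0
        self.register_buffer("q_weight", torch.round(weight_fp16 / scale).to(torch.int8))
        self.register_buffer("scale", scale)
        self.bias = bias

    def forward(self, x):
        # Dequantize on the fly, then do a normal FP16 linear.
        weight = self.q_weight.to(torch.float16) * self.scale.to(torch.float16)
        return nn.functional.linear(x.to(torch.float16), weight, self.bias)

The real GGUF quant types (Q4_0, Q5_1, Q8_0, ...) use per-block scales rather than one scale for the whole tensor, but the flow is the same: memory is saved on the stored weights while the compute path stays FP16, which is why quality tracks the stored precision and the main runtime cost is the extra dequantization step.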
Nice, thanks for the mirror! City96 just uploaded Q4_1 and Q5_1, FYI!
Thank you for the information!
GGUF models + LORAs are supported in Forge now as of this commit. Read on for a quick guide...
Run your update.bat or update Forge.
Download clip_l.safetensors and t5xxl_fp8_e4m3fn.safetensors from https://huggingface.co/lllyasviel/flux_text_encoders/tree/main
Download ae.safetensors from https://huggingface.co/black-forest-labs/FLUX.1-dev/tree/main
Put ae.safetensors and clip_l.safetensors in models/VAE folder
Put t5xxl_fp8_e4m3fn.safetensors in models/text_encoder.
In Forge, select your .gguf model, and in the new VAE/Text Encoder list, pick the three .safetensors files. I left Diffusion in Low Bits as Automatic and it worked for me.
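If you prefer to script those downloads instead of clicking through the browser, something like the huggingface_hub sketch below should work. The folder layout mirrors the steps above and forge_root is a placeholder for your own install path; note that the black-forest-labs/FLUX.1-dev repo is gated, so you may need to log in (huggingface-cli login) before ae.safetensors will download.

from pathlib import Path
from huggingface_hub import hf_hub_download

forge_root = Path("C:/path/to/webui_forge")  # placeholder: point this at your Forge install

# (repo_id, filename, destination folder) per the manual steps above
downloads = [
    ("lllyasviel/flux_text_encoders", "clip_l.safetensors", forge_root / "models" / "VAE"),
    ("lllyasviel/flux_text_encoders", "t5xxl_fp8_e4m3fn.safetensors", forge_root / "models" / "text_encoder"),
    ("black-forest-labs/FLUX.1-dev", "ae.safetensors", forge_root / "models" / "VAE"),
]

for repo_id, filename, target_dir in downloads:
    target_dir.mkdir(parents=True, exist_ok=True)
    hf_hub_download(repo_id=repo_id, filename=filename, local_dir=target_dir)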
Thanks for the update, this is amazing!
I am getting errors on some loras:
"'Parameter' object has no attribute 'gguf_cls'"
Some loras work, but others get this error. Any idea?
Tried it. It loads normally, begins generation, the preview window shows the picture up to 99%, and the finished result is a black image. Forge is updated. All files are present and in their directories.
@ddamir247931 Did you select all three: text encoder, clip and the ae in the VAE drop down?
Q8 works, but Q4_1 doesn't load in forge
@psspsspsspssspss Yes, I did... and the image is always plain black.
@psspsspsspssspss Same for me. I'm sticking with Q4_0.
I was getting black images with swap location set to Shared, but changing to CPU fixed it.
Can someone explain what I should download with my RTX 3070 (8GB VRAM) and 32GB RAM?
More Vram...
Kidding... really, no difference, at least I didn't notice any with my 4060. Of course, I didn't try every possible combination, but...
Schnell is the fastest and lowest quality; the rest are all the same, give or take, and that is very slow.
@ddamir247931 Thank you... but all of them are dev: Q4, Q4_1, Q5.
@NoobFromEgypt I have no luck with these models, they don't work for me for some reason... so I really don't know if any are faster or less VRAM-demanding.
I'm on a 3070 Ti 8GB + 32GB system RAM, and honestly, having downloaded the Q8 model and tried it out (you need to load the ae, the FP8 T5, and clip_l in the VAE slot with the Q models), I'd say that the NF4 V2 model still delivers a better result in terms of prompt adherence, in my view.
When using Q4_1 or Q5_1:
Traceback (most recent call last):
  File "D:\AI\webui_forge_cu121_torch21\webui\modules_forge\main_thread.py", line 30, in work
    self.result = self.func(*self.args, **self.kwargs)
  File "D:\AI\webui_forge_cu121_torch21\webui\modules\txt2img.py", line 110, in txt2img_function
    processed = processing.process_images(p)
  File "D:\AI\webui_forge_cu121_torch21\webui\modules\processing.py", line 809, in process_images
    res = process_images_inner(p)
  File "D:\AI\webui_forge_cu121_torch21\webui\modules\processing.py", line 952, in process_images_inner
    samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
  File "D:\AI\webui_forge_cu121_torch21\webui\modules\processing.py", line 1323, in sample
    samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
  File "D:\AI\webui_forge_cu121_torch21\webui\modules\sd_samplers_kdiffusion.py", line 234, in sample
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
  File "D:\AI\webui_forge_cu121_torch21\webui\modules\sd_samplers_common.py", line 272, in launch_sampling
    return func()
  File "D:\AI\webui_forge_cu121_torch21\webui\modules\sd_samplers_kdiffusion.py", line 234, in <lambda>
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
  File "D:\AI\webui_forge_cu121_torch21\system\python\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\AI\webui_forge_cu121_torch21\webui\k_diffusion\sampling.py", line 128, in sample_euler
    denoised = model(x, sigma_hat * s_in, **extra_args)
  File "D:\AI\webui_forge_cu121_torch21\system\python\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\AI\webui_forge_cu121_torch21\system\python\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\AI\webui_forge_cu121_torch21\webui\modules\sd_samplers_cfg_denoiser.py", line 186, in forward
    denoised, cond_pred, uncond_pred = sampling_function(self, denoiser_params=denoiser_params, cond_scale=cond_scale, cond_composition=cond_composition)
  File "D:\AI\webui_forge_cu121_torch21\webui\backend\sampling\sampling_function.py", line 339, in sampling_function
    denoised, cond_pred, uncond_pred = sampling_function_inner(model, x, timestep, uncond, cond, cond_scale, model_options, seed, return_full=True)
  File "D:\AI\webui_forge_cu121_torch21\webui\backend\sampling\sampling_function.py", line 284, in sampling_function_inner
    cond_pred, uncond_pred = calc_cond_uncond_batch(model, cond, uncond_, x, timestep, model_options)
  File "D:\AI\webui_forge_cu121_torch21\webui\backend\sampling\sampling_function.py", line 254, in calc_cond_uncond_batch
    output = model.apply_model(input_x, timestep_, **c).chunk(batch_chunks)
  File "D:\AI\webui_forge_cu121_torch21\webui\backend\modules\k_model.py", line 45, in apply_model
    model_output = self.diffusion_model(xc, t, context=context, control=control, transformer_options=transformer_options, **extra_conds).float()
  File "D:\AI\webui_forge_cu121_torch21\system\python\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\AI\webui_forge_cu121_torch21\system\python\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\AI\webui_forge_cu121_torch21\webui\backend\nn\flux.py", line 402, in forward
    out = self.inner_forward(img, img_ids, context, txt_ids, timestep, y, guidance)
  File "D:\AI\webui_forge_cu121_torch21\webui\backend\nn\flux.py", line 373, in inner_forward
    img, txt = block(img=img, txt=txt, vec=vec, pe=pe)
  File "D:\AI\webui_forge_cu121_torch21\system\python\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\AI\webui_forge_cu121_torch21\system\python\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\AI\webui_forge_cu121_torch21\webui\backend\nn\flux.py", line 191, in forward
    img_mod1_shift, img_mod1_scale, img_mod1_gate, img_mod2_shift, img_mod2_scale, img_mod2_gate = self.img_mod(vec)
  File "D:\AI\webui_forge_cu121_torch21\system\python\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\AI\webui_forge_cu121_torch21\system\python\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\AI\webui_forge_cu121_torch21\webui\backend\nn\flux.py", line 161, in forward
    out = self.lin(nn.functional.silu(vec))[:, None, :].chunk(self.multiplier, dim=-1)
  File "D:\AI\webui_forge_cu121_torch21\system\python\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\AI\webui_forge_cu121_torch21\system\python\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\AI\webui_forge_cu121_torch21\webui\backend\operations.py", line 371, in forward
    return functional_linear_gguf(x, self.weight, self.bias)
  File "D:\AI\webui_forge_cu121_torch21\webui\backend\operations_gguf.py", line 51, in functional_linear_gguf
    return torch.nn.functional.linear(x, weight, bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x3072 and 2304x18432)
Q8 is the only one that works
Looking for a workflow for this in Comfy. Is this file all I need, or are there a bunch of other components?
Do you have to download all these different checkpoints?
No, just choose one. Look at the filesize and pick the one that doesn't exceed your GPU's VRAM.
@ArtificeAI but is 8 better than 4? Well I will see in a moment...
@ArtificeAI Alright, well I just downloaded the full dev checkpoint (22GB) and I generate 5 s/it on it, so I guess I'll stick to that one.
@JayNL 8 should be less compressed than 4, so should be somewhat better.
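A rough back-of-the-envelope way to see what the quant suffix means for file size (and hence VRAM) is to multiply the parameter count by the bits per weight. The ~12B figure for Flux dev and the nominal bits-per-weight values below are approximations, so treat the output as an estimate rather than an exact download size.

# Rough size estimate: parameters * bits-per-weight / 8 (approximate values).
params = 12e9  # Flux dev is roughly a 12B-parameter model
bits_per_weight = {"Q4_0": 4.5, "Q5_0": 5.5, "Q8_0": 8.5, "FP16": 16.0}

for name, bpw in bits_per_weight.items():
    print(f"{name}: ~{params * bpw / 8 / 1e9:.1f} GB")
# Q4_0: ~6.8 GB, Q5_0: ~8.2 GB, Q8_0: ~12.8 GB, FP16: ~24.0 GB

More bits per weight means a bigger file and more VRAM, but less quantization loss.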
Which one is fastest? are they faster than NF4v2?
Slower, and it could exceed 8GB VRAM. Just tested on my 4060, so it seems GGUF is kind of pointless for those with lower-spec machines.
[UPDATE] It seems the GGUF node has been updated, so VRAM is no longer an issue. Speed-wise, it's around 15-20% slower than NF4.
I have a 4070 and it's extremely fast. I use Q4, no idea if Q8 is better. I shared the workflow:
https://civitai.com/models/650990
So I tried Q4, but why is it listed first, isn't Q8 better?
The higher the quant, the better the model, at least that is how it works with LLM GGUFs, so I will assume it is the same method here.
@Wilddogs Yeah, I already spammed Q8 full of images yesterday lol. I thought they were versions like v1, v2... but they're variations, that's why they're in this order. Q8 runs fine on 12GB and almost as fast as Q4!
@JayNL I agree that Q8 should be better than Q4 but I've been doing some comparison testing between them and while there are some slight compositional differences with the same seed, there doesn't appear to be any difference in quality that I can tell.
@kenpm true, Q4 is just really good already, I just use Q8 because I can, but isn't really needed, I still have both installed, sometimes Q4 comes out better even
I'm generating 2 images in about a minute with an RTX 4070 and t5xxl_fp16 as the clip, delivering photo quality. How is nobody using this?
Do the GGUF versions work with Flux LORAs?
Can you share the link to your T5-XXL encoder? Is it this? AUTOMATIC/stable-diffusion-3-medium-text-encoders at main (huggingface.co)
@OzzyOsman no it's like NF4 (quantized, don't ask me what that means), but the info on Huggingface says so.
@JayNL The comfyui node says that LoRA support is experimental, but there
Tested. Identical system performance to q8: 13-14 seconds per image at 23 steps of Euler/Beta on a 4090. However q4's quality is about 50% degraded judging by the number of text-spelling hits/misses when comparing the two. :-/
The file size is smaller though obviously. I would love to give this some positive conclusion.
I'm doing like 30 secs per image on a 4070, is there an advantage to getting Q8?
@JayNL GGUFs are like LLMs: the higher the quant, the better the quality. Q8 would offer higher-quality renders compared to Q4 or Q6. But the larger quants require more VRAM.
I made a quick workflow here
https://civitai.com/models/650990
It works, but I have no idea if this is the right way, though. On Hugging Face I also see other clips being used, vit.bin?
Hey there. How can this Q8 version of the model be 11.34GB and therefore 1.5GB smaller than the original upload on huggingface, which is 12.7GB in size? Does it mean that it needs less vram? And is there a quality loss?
The size estimation on Civitai is always lower; if you download them, they are both 12,410,432 KB. I guess they calculate bits and bytes differently, who is right? 😏
It has something to do with dividing by 1024, look it up, interesting stuff. Right-click on a folder in Windows and you'll see that 20GB is not 20,000,000,000 bytes.
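To put numbers on that: download pages usually count in powers of 1000 (GB), while Windows Explorer divides by powers of 1024 (GiB) but still labels the result "GB". Taking the 12,410,432 KB figure quoted above and treating it as kibibytes:

kib = 12_410_432               # size quoted above, taken as KiB
size_bytes = kib * 1024        # 12,708,282,368 bytes

print(f"{size_bytes / 1e9:.2f} GB (decimal, powers of 1000)")   # ~12.71
print(f"{size_bytes / 2**30:.2f} GiB (binary, powers of 1024)") # ~11.84

So the same file can legitimately show up with two different "GB" numbers depending on the convention; neither number implies it needs less VRAM or that there is any extra quality loss.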
Amazing high speed on my RTX 3060 6GB laptop... 20 steps of Euler takes ~106 sec with the Q4 model.
Yeah, I was just saying on the Schnell model page: probably 99% of people can just go to Dev, start with Q4, and see how far they can go. I thought you needed something like a 4080 16GB for this, but Q8 runs smooth and fast even on 12GB.
@JayNL Thanks for Tip :) i will try that
I am using an RTX 2060 12GB, paired with 64GB DDR4 system RAM and a Core i5 12400F processor.
But for some reason I am getting 300~600s/it on all GGUF quant models. What am I doing wrong? Please help...
P.S.: NF4 V2 takes 110s per 832x1216 image.
I am also using t5xxl_fp8_e4m3fn and clip_l with the GGUF UNet model.
Maybe it's simply your workflow. You have too many things running, your GPU is overcommitted, and it's spilling over into shared system memory.
I hear from a lot of people who can't get it running on 12GB, while I run it easily on a 4070 12GB. Maybe the 2060 has GDDR6 and not GDDR6X? No idea if that matters. Use the simplest workflow possible.
I run 2 images at 960x1280 with 5.5s/it, and I have 5 Twitch streams open, Discord, Steam, and some other stuff.
https://ibb.co/z7zyT9Q
@JayNL I have disabled all my nodes except the GGUF loader and the problem still persists.
Did you find any solution?
Do not use Q5, as it is the slowest version.
Try Q4 instead if you can't use Q8.
@bignut022793 After updating the GGUF node, I am consistently getting between 4~7s/it on all GGUF quantized models, and even Q8 is not causing any OOM now. I am even using the Q8-quantized version of the T5 FP16 clip and it is working fine now.
In my case, the only benefit of the Q8 version is the half size. The quality is very close compared to the original but the speed is slightly slower. 1024 x 1024, 4 pics in 1 batch. Q8 took 100 seconds and the original took 91 seconds. Both are 20 steps on 4080S. Anyway, thanks for sharing these.
I tested three FD models today and then I found out about this:
FLUX.1 [dev]: This model falls under the FLUX.1 [dev] Non-Commercial License. This license restricts commercial use, allowing only non-commercial purposes, such as personal, scientific, and artistic uses.
If a miracle happens and I can make a few yen by selling my images somehow, it can't be done with the FD model. It will probably never happen, but it is a matter of principles. So I delete all the FD models I have tried and never touch any FD model again.
This licence is a mess for sure. From what I understand, YOU can use the outputs as long as it is not to train a commercial model. But two sections are contradictory, so...
It says this: "Non-Commercial Purpose means any of the following uses, but only so far as you do not receive any direct or indirect payment arising from the use of the model or its output."
“Outputs” means any content generated by the operation of the FLUX.1 [dev] Models or the Derivatives from a prompt
So you'll assume you don't have the right to sell the generated pictures ... But it also says :
Outputs. We claim no ownership rights in and to the Outputs. You are solely responsible for the Outputs you generate and their subsequent uses in accordance with this License. You may use Output for any purpose (including for commercial purposes), except as expressly prohibited herein. You may not use the Output to train, fine-tune or distill a model that is competitive with the FLUX.1 [dev] Model.
Where it clearly says: You may use Output for any purpose (including for commercial purposes)
Blackforest needs to be more clear.
@Kyra31 I have an occasional sale on DeviantArt, but I'm afraid to put some work made with the GGUF Dev version for sale there 💩
@Kyra31 Just curious what you think about competitive models. What does "competitive model" mean? One with a similar number of parameters (12B)? Is SDXL/Pony not competitive? I wonder...
So just say it was made with SDXL and you're fine. Who can tell? They won't care.
@JayNL From what I understood from lawyers you shall not use Dev outputs without acquiring a licence from Black Forest period. :(
I have this great node, Save Image With Metadata, but it doesn't work with GGUF, probably because it loads a UNet instead of a checkpoint, so that info ends up missing? Does anyone have an alternative? 🙏
I am sad too... I always have Filter by Metadata turned on, and now my images get posted without metadata... That sucks... :( I haven't found a node that adds the metadata yet.
@homoludens Same, I have been searching and trying for hours, but it doesn't work; now it's all empty resources.
Can I use the Python diffusers library to run this model from code?
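For reference, recent diffusers releases can load GGUF checkpoints directly into the Flux transformer (you need the gguf Python package installed and a diffusers version with GGUF support). A hedged sketch along the lines of the diffusers GGUF documentation; the quant filename below is just one of the files in this repo, so swap in whichever one you downloaded, and note that the base black-forest-labs/FLUX.1-dev repo is gated, so you need to be logged in to Hugging Face.

import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, GGUFQuantizationConfig

# Quantized transformer from this repo (swap in the quant you actually downloaded).
ckpt_url = "https://huggingface.co/city96/FLUX.1-dev-gguf/blob/main/flux1-dev-Q4_0.gguf"
transformer = FluxTransformer2DModel.from_single_file(
    ckpt_url,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)

# Text encoders and VAE come from the original (gated) base repo.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # offload idle components to CPU to fit smaller GPUs

image = pipe(
    "a photo of a cat holding a sign that says hello world",
    num_inference_steps=20,
    guidance_scale=3.5,
).images[0]
image.save("flux-dev-gguf.png")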
