Blockwise NF4 (Full Checkpoint - 22GB)
In Forge, use Automatic FP16 LoRA, not NF4 or NF4 Automatic.
Recommended for Forge: set COMMANDLINE_ARGS= --unet-in-bf16 --vae-in-fp32 (example below).
This is a full checkpoint: do NOT load an additional TE, VAE, or CLIP.
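For reference, a minimal webui-user.bat sketch with those flags (assuming a standard Windows Forge install; adjust for your setup):

@echo off
set COMMANDLINE_ARGS=--unet-in-bf16 --vae-in-fp32
call webui.bat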
NO changes have been made to the Black Forest Labs base diffusion model other than mixed-precision quantization.
This model is likely the first of its kind, combining NF4 quantization with Black Forest Labs' recommendation not to quantize the TE blocks.
High accuracy and speed while still fitting under 24GB (it also works well on 16GB and 8GB cards).
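To illustrate the idea, here is a minimal sketch of blockwise NF4 quantization with bitsandbytes that leaves the TE blocks in higher precision. The skip_substrings filter and blocksize are assumptions for illustration, not the script used to build this checkpoint:

import torch
import bitsandbytes.functional as bnbF

def quantize_mixed(state_dict, skip_substrings=("txt",)):
    quantized = {}
    for name, w in state_dict.items():
        # Hypothetical filter: leave TE blocks (and non-matrix tensors) unquantized.
        if w.ndim != 2 or any(s in name for s in skip_substrings):
            quantized[name] = w.to(torch.bfloat16)
        else:
            # Blockwise NF4: every 64-value block gets its own absmax scale.
            q, state = bnbF.quantize_4bit(w.to("cuda", torch.float16),
                                          blocksize=64, quant_type="nf4")
            quantized[name] = (q, state)  # packed 4-bit data plus per-block stats
    return quantized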
Comments (19)
Does your approach change anything about the unfortunate LoRA and ControlNet issues of NF4 versions?
LoRAs work fully in my testing. As for ControlNet, the issue is with the base model being distilled, is it not?
@Felldude LoRAs threw errors immediately in Comfy, ControlNet too, and IPAdapter just didn't affect the output at all. That's how all NF4 models I tested behaved, but I haven't used one in a while, as I have not heard/read about any changes there. That's why I asked, hoping this might be a change that makes a difference by keeping some blocks untouched.
As far as I remember, NF4 had a somewhat flexible self-rescaling mechanic that allowed it to be as precise as necessary while being as efficient as possible, which made it perform faster and better than a quantized counterpart. Unfortunately, that made it structurally incompatible with LoRAs and ControlNets, which are built with a set scale.
@redpinkretro To my knowledge, NF4 support in Comfy was dropped, so you can only load base models with no LoRA support. Forge works fine with NF4 and LoRAs, as it up-casts the model to FP32/FP16 to allow for them (sketched below). For FLUX this really should be BF16, like Comfy uses, but I would rather have FP16 and working LoRAs than none.
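To illustrate that up-cast approach, a minimal sketch (assumed tensor names, not Forge's actual internals) of merging a LoRA into an NF4 layer by dequantizing first:

import torch
import bitsandbytes.functional as bnbF

def apply_lora_to_nf4(packed, quant_state, lora_down, lora_up, alpha):
    # Up-cast: rebuild the FP16 weight from the packed NF4 data.
    w = bnbF.dequantize_4bit(packed, quant_state).to(torch.float16)
    rank = lora_down.shape[0]
    # The LoRA delta is merged at full precision; the NF4 blocks cannot
    # absorb it directly because each block's scale is fixed by its absmax.
    return w + (alpha / rank) * (lora_up @ lora_down).to(torch.float16)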
@Felldude Yes, exactly. That's why I was not using any NF4 models, due to those limitations.
This is an interesting checkpoint. It is very fast, has a basic workflow, and it's reasonably accurate and detailed. All those qualities make this model very accessible.
Thanks
I don't know if it makes a difference, but I used these switches when starting ComfyUI for this model:
--bf16-unet --fp32-vae
Those instructions in the description are for Forge; for Comfy I think you also need the GPU-only flag.
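For reference, a full ComfyUI launch with those switches plus the GPU-only flag might look like this (assuming a standard install; paths vary):

python main.py --bf16-unet --fp32-vae --gpu-only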
How much faster is it than base NF4, per step?
That would vary per machine, as the CPU load is higher with this model unless you force it onto the GPU (assuming a 3090 or 4090). The accuracy is higher, and for me the speed is similar.
So about the same speed, but 4 steps. I have 8GB VRAM and tried NF4 at 20 steps before, but the quality was shit, and I had no time to try what worked best.
Can you give me an example prompt? What should I use for best realism, just quality tags, common tags?
@mahkidale496 Describe the scene in full natural language. Some people add something like "A raw photo of a snowman during Christmas on a winter's night."
@Felldude Thanks, does it support LoRAs in Forge?
@mahkidale496 Forge does, yes; just use Automatic FP16 LoRA.
@Felldude Still Euler with Beta? Or is there something better now?
@mahkidale496 Forge has its own, but Euler still works well.
Hm, so other NF4 checkpoints quantized the TE blocks (meaning the TE blocks inside the diffusion model?).
I don't mean CLIP/T5, which obviously shouldn't be quantized in NF4 (well, it works if it's HQQ).