NF4 is significantly faster and more memory-efficient than FP8 because it uses the native bnb.matmul_4bit kernel, which avoids casting and takes advantage of low-bit CUDA optimizations. It can also achieve better numerical precision and dynamic range, since weights are stored across multiple tensors of varying precision (4-bit values plus higher-precision per-block scales), unlike FP8's single-tensor approach.
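To make the "multiple tensors of varying precision" idea concrete, here is a minimal, hypothetical Python sketch of NF4-style blockwise quantization (not the real bitsandbytes CUDA kernels): each block of weights is normalized by its absmax, and every value is snapped to the nearest of 16 fixed "NormalFloat" levels, so storage is 4-bit indices plus one higher-precision scale per block. The 16 level values are the published NF4 code values from the QLoRA paper.

```python
# Hypothetical sketch of NF4-style quantization, NOT the bitsandbytes implementation.
# Published NF4 quantile levels (QLoRA paper):
NF4_LEVELS = [
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
]

def quantize_block(block):
    """Quantize one block of floats to (4-bit indices, per-block absmax scale)."""
    absmax = max(abs(x) for x in block) or 1.0
    idxs = [min(range(16), key=lambda i: abs(x / absmax - NF4_LEVELS[i]))
            for x in block]
    return idxs, absmax

def dequantize_block(idxs, absmax):
    """Reconstruct approximate floats from indices and the block scale."""
    return [NF4_LEVELS[i] * absmax for i in idxs]

weights = [0.12, -0.5, 0.33, 0.9, -0.07, 0.0, 0.61, -0.25]
idxs, scale = quantize_block(weights)
restored = dequantize_block(idxs, scale)
```

The 4-bit indices are the low-precision tensor; the per-block absmax scales form the separate higher-precision tensor that preserves dynamic range.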
The list of all NF4 models (LoRA is not supported yet):
https://huggingface.co/silveroxides/flux1-nf4-weights/tree/main
What you have to do:
go to ComfyUI/custom_nodes/
then clone the custom node — for the full checkpoint loader:
git clone https://github.com/comfyanonymous/ComfyUI_bitsandbytes_NF4
or, for UNet-only loading:
git clone https://github.com/DenkingOfficial/ComfyUI_UNet_bitsandbytes_NF4.git
and then run this workflow in ComfyUI
"more advance" workflow "just disable checkpoint and activate NF4 checkpoint loader"
Flux resolutions are set in a custom node in the workflow — keep it, or bypass it if you don't need it.
If you get errors from the NF4 node, make sure your ComfyUI is updated and that your bitsandbytes library is at least version 0.43.
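A quick way to sanity-check that requirement is a small version comparison; the helper below is a hypothetical sketch (it only parses the numeric dotted prefix, not full PEP 440 versions), with the installed version passed in as a string.

```python
# Hypothetical helper: check an installed bitsandbytes version string against
# the 0.43 minimum mentioned above. Simple numeric-prefix parsing only.
def meets_minimum(installed: str, minimum: str = "0.43") -> bool:
    def parse(v: str):
        parts = []
        for piece in v.split("."):
            if piece.isdigit():
                parts.append(int(piece))
            else:
                break  # stop at suffixes like "post1" or "dev0"
        return tuple(parts)
    return parse(installed) >= parse(minimum)

print(meets_minimum("0.43.1"))  # True
print(meets_minimum("0.42.0"))  # False
```

In practice you would feed it `bitsandbytes.__version__` (or the output of `pip show bitsandbytes`) and upgrade with `pip install -U bitsandbytes` if it fails.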
It works in Forge now, too — Forge already uses ComfyUI's script.

