NF4 and FP8 Checkpoints:
NF4 Checkpoint (flux1-dev-bnb-nf4.safetensors): Optimized for performance with speed improvements ranging from 1.3x to 4x compared to FP8, depending on the GPU and software setup. NF4 is now the recommended format for most users with compatible GPUs (RTX 3XXX/4XXX series).
FP8 Checkpoint (flux1-dev-fp8.safetensors): Provided as an alternative for older GPUs that do not support NF4.
Performance Improvements:
NF4 demonstrates faster inference speeds and reduced memory usage compared to FP8, making it highly efficient for image diffusion tasks.
NF4 uses multiple tensor precisions (float32, float16, uint8, int4) to achieve higher numerical precision and dynamic range, outperforming FP8 in most scenarios.
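To make the mixed-precision idea concrete, here is a minimal sketch (an illustration only, not the checkpoint's actual loading code) that round-trips a stand-in weight tensor through bitsandbytes NF4 quantization; it assumes torch and a CUDA build of bitsandbytes are installed:

import torch
import bitsandbytes.functional as bnbF

w = torch.randn(4096, 4096, dtype=torch.float16, device="cuda")  # stand-in weight

# Block-wise NF4: int4 codes packed two per uint8 byte, plus per-block absmax stats.
q, state = bnbF.quantize_4bit(w, blocksize=64, quant_type="nf4")
w_hat = bnbF.dequantize_4bit(q, quant_state=state)  # expanded back to fp16 at compute time

print(q.dtype, q.numel())               # uint8 storage, about half the element count
print((w - w_hat).abs().mean().item())  # mean quantization error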
Distilled CFG Guidance:
Flux-dev now includes distilled model guidance: CFG should be set to 1, with the new "Distilled CFG Guidance" value set to 3.5 for optimal results. Negative prompts are discouraged in this setup.
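For reference, a hedged sketch of those settings in diffusers (assuming the base FLUX.1-dev weights; in the Flux pipeline, guidance_scale plays the role of the distilled guidance and no negative prompt is passed):

import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # helps on smaller GPUs

# guidance_scale ~3.5 is the distilled guidance; classical CFG is effectively 1.
image = pipe("a photo of a forest at dawn", guidance_scale=3.5, num_inference_steps=20).images[0]
image.save("out.png")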
Installation and Usage Instructions (ComfyUI):
The NF4-BnB node is now available in ComfyUI Manager. You need to select 'Channel: dev' to find it.
1. Clone the Official Node:
Clone the official ComfyUI NF4 node repository into the custom_nodes folder using the following command:
git clone https://github.com/comfyanonymous/ComfyUI_bitsandbytes_NF4.git
2. Install Dependency:
Activate the virtual environment (venv).
Install or update the bitsandbytes package with the following command:
pip install -U bitsandbytes
3. Update Requirement: Ensure your ComfyUI is up to date.
4. Node Setup: Use the CheckpointLoaderNF4 node in your Flux workflow, replacing the regular checkpoint loader node. A workflow containing the NF4 node is included in the post files (.zip json - training data).
For ComfyUI Portable Version:
1. Ensure your ComfyUI is up to date and that the NF4 node is placed inside the custom_nodes directory.
2. From the [comfy install dir] directory, run the following command to install the required packages:
python_embeded\python.exe -m pip install -r ComfyUI\custom_nodes\ComfyUI_bitsandbytes_NF4\requirements.txt
For the portable version, this command should be executed at the root of the ComfyUI directory.
Compatibility: Users with older GPUs or specific setups (like GTX 10XX/20XX) are cautioned to use the FP8 checkpoint, as their devices may not support NF4. Additionally, loading FP8 checkpoints with NF4 options can lead to unnecessary delays and reduced quality.
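If you script your setup, a hypothetical helper along these lines (the function name is mine, not from this repo) can pick the checkpoint by CUDA compute capability, since NF4 favors Ampere (RTX 3XXX) and newer:

import torch

def pick_checkpoint() -> str:
    # Hypothetical helper: NF4 on Ampere (sm_80) or newer, FP8 otherwise.
    if not torch.cuda.is_available():
        return "flux1-dev-fp8.safetensors"
    major, _minor = torch.cuda.get_device_capability()
    if major >= 8:  # RTX 3XXX/4XXX
        return "flux1-dev-bnb-nf4.safetensors"
    return "flux1-dev-fp8.safetensors"  # GTX 10XX/20XX and older

print(pick_checkpoint())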
Credits to: lllyasviel
Description
Main model in bnb-nf4
T5xxl in fp8e4m3fn
CLIP-L in fp16
VAE in bf16
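You can verify this precision mix yourself; a small hedged sketch using the safetensors API (point the filename at the file you downloaded):

from collections import Counter
from safetensors import safe_open

dtypes = Counter()
with safe_open("flux1-dev-bnb-nf4.safetensors", framework="pt") as f:
    for name in f.keys():
        dtypes[f.get_slice(name).get_dtype()] += 1

# Expect a mix: U8 for the packed NF4 weights, F32 for quantization stats,
# F8_E4M3 for T5xxl, F16 for CLIP-L, and BF16 for the VAE.
print(dtypes)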
Comments (129)
please convert FLUX.1-schnell to NF4 Checkpoint
Just uploaded, give it a check.
When will there be an easy NF4 workflow?
I installed the custom node, but have no idea where to place it, how to find it in the node manager, etc.
CheckpointLoaderNF4
@capitalcon I realize that's the name of the node. I do not have a search function when adding nodes. Just a complicated tree with not a single clue where in that tree it's located. Thanks for the response.
@Gradasho Add Node > loaders > CheckpointLoaderNF4
@capitalcon Thank you, now just don't know what folder to place the node file in.
@Gradasho If you're mentioning the NF4 node, it should be placed in the custom_nodes folder, inside the ComfyUI root. But if you're trying to load a workflow, just use the Load button and select the .json file.
@capitalcon that's the confusing part. I have it placed correctly in the custom_nodes folder. I know how to load workflows, which btw I noticed you mentioned uploading one on another comment, but can't seem to find that workflow on your account here on civit. Also got Cannot import A:\SDXL3\stable-diffusion-webui\extensions\sd-webui-comfyui\ComfyUI\custom_nodes\ComfyUI_bitsandbytes_NF4-master module for custom nodes: No module named 'bitsandbytes'
I'm going to throw some civitai points your way or whatever they're called for the help, but dang feels like beginner troubleshooting sorry about this
@Gradasho This error indicates that your ComfyUI doesn't have the bitsandbytes package installed. I believe you're using the portable version, and I'm not sure how to run command-line commands on it. The steps are: activate the virtual environment (venv) and install bitsandbytes (pip install -U bitsandbytes).
All Python commands in ComfyUI portable should be run from the Scripts folder if Python isn't installed system-wide (D:\ComfyUI_windows_portable\python_embeded\Scripts, for example). First install pip, then accelerate, then bitsandbytes. Lastly, install the custom node per its requirements file, also run from the Scripts folder, with the path pointing two folders up (pip install -r ..\..\ComfyUI\custom_nodes\ComfyUI_bitsandbytes_NF4\requirements.txt)
@capitalcon @RonnyMackaronny I'm running through A1111
Also tried running the command within ComfyUI; it says:
[ComfyUI] ERROR: To use this feature, you must either set '--listen' to a local IP and set the security level to 'normal-' or lower, or set the security level to 'middle' or 'weak'. Please contact the administrator.
@Gradasho There should be a file in custom_nodes > manager called Config. The security settings you need to change are in there. There was a reddit thread where I learned that. There's also a line towards the top that is set to true; change that to false.
@TenaciousDean thank you! Now to figure out what to change the security level to
@TenaciousDean it tells me to use 'middle or below' but doesn't seem to be working
https://huggingface.co/lllyasviel/flux1-dev-bnb-nf4
🤝
so "T5xxl in fp8e4m3fn" -> for a 24GB Vram users a fp16 t5xxl would still be better than this? May even fit into 16GB vram?
I have 16GB VRAM and I can run the main model in fp8e4m3fn, T5xxl in fp16, CLIP-L in fp16 and VAE in bf16 without problems.
@QuantDweller huh? You can't fit all that into your VRAM at once... t5xxl is 9GB + flux fp8 is 11GB = 20GB
@tazztone If you're using an NVIDIA GPU, set the system memory fallback policy so that when VRAM is full, the system will also use RAM to run the checkpoint.
@capitalcon ye but i imagine at big cost of speed? i guess it's a lot faster when both flux and t5 are in VRAM ye?
I've installed everything correctly and it's completely impossible to find the checkpointloadernf4.... node.
If installed correctly it should be at: Add Node > loaders > CheckpointLoaderNF4. Or use ComfyUI Search and look for "CheckpointLoaderNF4"
@capitalcon No, I've got nothing!
Everything is installed, including the bitsandbytes dependencies, the loader is completely missing, I have no idea why :(
@Yume_ Same thing for me.
@mattrizzi Thank you, it makes me feel a bit better not to be alone
@Yume_ I got it working. All I did was update comfy, restart, do a 'pip install -U bitsandbytes' and 'git clone https://github.com/comfyanonymous/ComfyUI_bitsandbytes_NF4.git' in the 'custom_nodes' folder and then another comfy restart. After that, the CheckpointLoaderNF4 shows up for me.
@kenpm Thank you, that's what I did. The node is not found; I have no idea why.
I've just uploaded a workflow containing NF4 node, check post files. (zipped file - training data)
@capitalcon thanks for your help bro. I downloaded it, but I get a red node. I'll say it again: everything is installed correctly, I really don't understand. And I don't have any missing nodes in the manager.
@Yume_ Same thing for me as well.
@Yume_ I checked the console and I saw this: (IMPORT FAILED): C:\ComfyUI\ComfyUI\custom_nodes\ComfyUI_bitsandbytes_NF4, so that's not good. Anyone got any ideas?
@Yume_ comfyanonymous hasn't uploaded this node to the Manager yet. I believe in your case the bitsandbytes package couldn't be installed properly, and without it the node won't work. Make sure the virtual env is activated before running "pip install -U bitsandbytes"
@capitalcon didn't help. please just link to the actual custom node github pages that are needed to get this to work. https://github.com/comfyanonymous/ComfyUI_bitsandbytes_NF4.git is not enough. it clearly says "Requires installing bitsandbytes" HOW DO YOU DO THAT? I've been searching for 10 minutes.
!!! GUYS !!!
try this:
python_embeded\python.exe -m pip install -r ComfyUI\custom_nodes\ComfyUI_bitsandbytes_NF4\requirements.txt
(at the root of your comfy folder)
it's ok for me now !
@Yume_ Working! Thanks! @capitalcon Thank you for the upload and help aswell!
@Yume_ what does this even mean. I tried running both lines in command prompt, python shell, doesn't do anything
For me it was the problem described and solved in this thread:
https://www.reddit.com/r/comfyui/comments/14vjuvi/custom_nodes_import_failed/
When running from the command prompt, the bitsandbytes package for Python was installed into the "app data" folder instead of Comfy's python_embeded folder; you need to copy the files over manually (they might be in AppData\Roaming instead of AppData\Local).
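A quick hedged check to confirm which environment actually received the package: run the lines below with the same interpreter ComfyUI uses (e.g. python_embeded\python.exe) and compare the printed path with your Comfy folder:

import os, sys
import bitsandbytes

print(sys.executable)                          # the interpreter you are actually on
print(os.path.dirname(bitsandbytes.__file__))  # where pip put the package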
i followed your instruction, but the LoadCheckPointNF4 never loads in my ComfyUI??? The first step mentions the site, and i cant install via URL (security issue???)
Navigate to custom_nodes inside your ComfyUI root folder. Use cmd or powershell and git clone the node repo. Don't forget to install bitsandbytes too.
BnB is installed; how exactly do you type the git clone? I really don't do it this way.
@StarW It seems you're running the ComfyUI portable version; I'm not used to that version or to activating the virtual env on it. Have a look at the other comments, they could be useful.
figured it out. if you have the portable, sometimes it will save to default (User/AppData/etc...) and not to the location where Comfy is installed. All i had to do was copy over the files to the site-packages in my comfy location from the windows default.
I am getting an "AttributeError: 'ForgeParams4bit' object has no attribute 'quant_storage'" error :( Everything is up to date
Yeah, it is fixed. Thank you! PS. I use Comfy, not A1111
Really impressed by the speed increase on my RTX 4070!
How much did it increase? It takes me about 24 sec to generate a 1024x1024 image on my RTX 4070 with the fp8 schnell model.
@Thr_u It's interesting; it works a bit differently. 30 steps (768x1344) takes 52s, but each time you change the prompt it takes about 20-30s to analyze it again before generating the image. I'm using ComfyUI.
edit: 20 steps takes 35s.
Update: The whole prompt issue I had (read my comment above) disappeared when I updated CUDA to the latest version.
Is there a VAE I need to use with this?
It's already baked into the model, including the CLIP (CLIP-L in fp16, VAE in bf16).
Amazingly cool how well it follows prompts when compared to SDXL.
Can you make a checkpoint with Main model in bnb-nf4, T5xxl in fp16, CLIP-L in fp16 and VAE in bf16?
You can use ComfyUI. Just use the DualCLIPLoader node with t5xxl_fp16 and don't use the CLIP from the NF4 node. The same with the VAE. Works perfectly.
@SDX_Vision Thx for the advice; as a "crutch" option it is more than good, but I would still like it in a single .safetensors file.
I can't load any LoRA with this checkpoint. With the base model it works, but not with this one.
Same here. I hope it's not a fundamental incompatibility.
It doesn't work with Loras :(
https://github.com/comfyanonymous/ComfyUI_bitsandbytes_NF4/issues/10
Loras are not compatible with nf4 (yet, at least)
I don't know what I broke but I've had to reinstall A1111 4 times in the last 8 hours trying to get NF4 to work and consequently breaking my installs each time. I even tried getting the workflow, which shows a missing node, but then it won't find the missing node or update it. I can't install the custom node manually using any of the ComfyUI Manager options (from .git URL, pip package, etc) because of some claimed security issue which attempted fixes caused 2 of my 4 broken A1111 installations, and the NF4 node doesn't appear in the standard list of custom nodes. This seems like it would be really cool, but 5 hours of my day off is now gone to no end.
NF4-BnB Node is now available at ComfyUI Manager, but you need to select 'Channel: dev' in order to find it.
@capitalcon ty. Any idea how to fix this?
ComfyUI_bitsandbytes_NF4 [EXPERIMENTAL] install failed: With the current security level configuration, only custom nodes from the "default channel" can be installed
@Gradasho Open the ComfyUI Manager, in the left menu, 2nd from the top says "channel:default", click on it and select 'dev'.
I was trying the fp8 version on my 4GB VRAM and it did work at 512x512 resolution: 15 steps, 430 seconds with the unet loader workflow.
No idea why, but with the nf4 model this workflow takes way, way longer. 30 minutes?
Will NF4 work faster with old CPUs?
old gpu yes, cpu no idea lul
cpus? you got balls for even thinking about running this on a cpu let alone an old cpu
An older version of ComfyUI runs the regular dev model at the same speed this one runs now. The newer, updated ComfyUI required to run this is slower with the normal model than the previous speed (4 to 5 seconds on a 4090 with 20 steps), so the difference is there, but it's measured against an apparently less efficient ComfyUI version (after the update).
I don't see a reason for this to exist.
You aren't entirely wrong, but it's also a different model format altogether. It has multiple levels of precision, including FP32 and FP16, all in one, plus it has the CLIP and VAE merged in and is still smaller than the other model.
lllyasviel pushes the world forward again.
My measurement for 30 steps at 1024 x 1024 is 2 minutes, on a 3060 12GB with 32GB of RAM.
NF4+t5xxl_fp8 will not give you any speed boost compared to FP8+t5xxl_fp16 if your video card has 16+GB of VRAM. Tested in ComfyUI.
upd. If you don't have enough RAM or the checkpoint is on the HDD, you will get a speed boost, but this has nothing to do with the performance of your GPU!
upd. I tested pytorch 2.4.0+cu121 and 2.4.0+cu124; results below. Hardware: RTX 4070 Ti Super (16GB), i7 12700, 64GB RAM CL16 3600 dual-rank, NVMe at 7000 MB/s.
cu124, fp8 and fp16 t5xxl.
6 seconds first launch.
26 sec - 1024x1024, 20 steps, euler simple.
cu124, nf4 and fp8 t5xxl.
3 seconds first launch.
26 sec - 1024x1024, 20 steps, euler simple.
cu121, fp8 and fp16 t5xxl.
27 sec first launch.
28 sec - 1024x1024, 20 steps, euler simple.
cu121, nf4 and fp8 t5xxl.
9 sec first launch.
28 sec - 1024x1024, 20 steps, euler simple.
Dramatic improvements for me on my 4080
@Tophness 4070 Ti Super. 47 seconds - 960x1440, 25 steps. No difference at all, identical results. Why you see a difference and I don't is a good question.
@QuantDweller might be the low bandwidth vram on the 4070ti
hmmm NF4 was 3 seconds faster (1024x1024, 20 steps, euler simple), on a 3090
Not that much gain... not sure about higher resolutions
@rupigo786 Thank you for sharing the testing; it is interesting that there is a difference with the 3090. But in general NF4 doesn't support LoRA yet and its quality is worse, so I'll prefer FP8 for now, especially since my results are completely identical, unlike yours.
Drastic improvements for me on my 4080. Took minutes to load everything on the 1st run before, now it's seconds.
Where did you test? In ComfyUI? How much RAM do you have? Is the HighVRAM parameter set when loading? For me everything loads quickly on first boot; slow loading can happen if your checkpoint is on an HDD or a low-speed SSD.
@QuantDweller Comfy, DDR5 5200mhz 32gb ram, highvram=false, checkpoint is on my gen3 x4 NVMe m.2 drive. I'm using torch 2.4 / cuda 12.4 and I literally just switch to this model and it loads and generates much faster. Lots of other people saying the same thing
@Tophness My pytorch version: 2.4.0+cu121. Can you tell me the resolution, sampler, number of steps, and time or generation speed? I will test it myself. fp8 with fp16 t5xxl loads in 27 seconds at first launch and NF4 in 9 seconds for me. I have 64GB of dual-rank memory at CL16 3600 MHz and an NVMe at 7000 MB/s. The first run of NF4 is faster than FP8 only due to its size, 11GB vs 21GB.
I'm using ComfyUI through Stability Matrix, I have no idea how and where to install bitsandbytes. Can anyone help me?
You need to go to the dev channel in the Comfy Manager (left-side options). After installing you can switch back to normal.
Can it work with lora?
The main developer for ComfyUI says no. Apparently NF4 is not compatible with Lora.
no
This is now my go to model for Flux. Astonishingly good.
The little engine that could right here boys and girls
Worked fine last night... tried today and :
Error occurred when executing CheckpointLoaderNF4: Expecting value: line 1 column 1 (char 0)
The instructions make it seem like you can install/upgrade bitsandbytes from the OS layer, and only the next step goes through the python_embeded directory. This is incorrect: if you need to use the embedded Python for one, you need to use it for both. C:\AI\Comfyui\python_embeded\python.exe -s -m pip install -U bitsandbytes
Some users have ComfyUI manually installed instead of the portable version. That's why it's necessary to install some packages into the venv via the command line. I'm not sure how it works on the portable version.
People say you have to go NF4; I download, click run, a million errors. I've been a programmer, and those instructions are total gibberish to me lol 😂
If you're getting a million errors, maybe it's time to sharpen those debugging skills—especially since you're a programmer. Figuring it out is part of the job!
I got it working lol
well that took 20 mins.... nice and fast results! 👌💕
@capitalcon and I was(!) a programmer, in PHP even before frameworks existed, but it works, so great! 😁
@JayNL how did you fix it? I'm still struggling. And are you on ComfyUI?
@albihany yes comfy local, you actually only have to do the git thing (1.) and the last phrase (2. at the end) in the main directory
After spending the last of my internet quota to download this model, I ended up with the error that EVERYONE HAS. Don't tell me to use the -s -m pip install -U bitsandbytes command, because IT'S NOT WORKING even after 1000 tries. It's really frustrating to release something so full of BUGS https://ibb.co/QPB9jJP
That's a problem with your BnB installation. The ComfyUI portable version uses an embedded Python environment, so you'll need to run pip through there. Use the following command:
[comfy install dir]\python_embeded\python -m pip install -U bitsandbytes
After that, restart ComfyUI.
Make sure you have ComfyUI-Manager installed (go to the ComfyUI/custom_nodes dir in a terminal (cmd)):
git clone https://github.com/ltdrdata/ComfyUI-Manager.git
Go to the drive where Comfy is installed, e.g. C:\ComfyUI\ComfyUI\custom_nodes\ComfyUI-Manager, open the file config.ini in a text editor, and add security_level = weak after the other text. Start ComfyUI, open the Manager, click "Install via Git URL" and put in this address: https://github.com/comfyanonymous/ComfyUI_bitsandbytes_NF4.git
Once installed, update all. Restart, and everything should work.
@capitalcon Thank you for your comment. As a matter of fact, I did run the update command in the right directory you mentioned (python_embeded). I'm also a Python programmer, so something like that isn't hard for me to figure out. I ran the update the first time, and whenever I try again I get messages that the packages are already installed, so that's not the issue here.
@budhuw Thank you for your response. I do have ComfyUI-Manager indeed; that's how I update and download missing nodes all the time. I also updated my ComfyUI and everything else before installing. As for the custom node, I git cloned it into the custom_nodes directory instead of downloading it through the Manager. Do you think that could cause the issue? I assume either way is equivalent? https://ibb.co/PwXBz3N
@budhuw I just tested the other way of installing, through the Manager: I deleted the custom node and installed it again through the Manager, and the same annoying error message still pops up. The model works fine in Forge webui; the issue is only with this custom node.
@budhuw i love you
what model are you referring to with this:
flux1-dev-fp8 ?
this here?
https://huggingface.co/Kijai/flux-fp8
The repo is clear I think.
NOTE: This is very likely Deprecated in favor of GGUF which seems to give better results: https://github.com/city96/ComfyUI-GGUF
not seeing any performance improvements in gguf
Did you find it faster?
@angelino00786 Didn't want to mess up my Comfy. But on Forge I did not notice a big difference.
On my 1080 Ti, GGUF is the worst thing that happened with SD, so everything is subjective...
I like it; it runs very well and fast, but I found v2 to be unstable, locking up after 5 gens. I went back to v1, which works fine.
Can this model be used with Forge?
Yes, you need to update Forge.
Can I use it for free to generate images for a commercial AI influencer's Instagram?
Great model! Just a quick note that it doesn't seem to work with LORAs in ComfyUI right now
I recommend GGUF Q4_0: you can use multiple LoRAs on it without breaking your VRAM, and by default it uses the same amount of VRAM as NF4 v1, which is nice. Aesthetic- and composition-wise it's much closer to fp16. The only downside is that its inference speed is around 15-20% slower than NF4, at least on my 4060.
How to use with diffusers? Thanks
from diffusers import FluxPipeline  # missing import in the original snippet
from diffusers.models.transformers.transformer_flux import FluxTransformer2DModel

pipe = FluxPipeline(...)  # construct the pipeline as usual
pipe.transformer = FluxTransformer2DModel.from_single_file('flux1SchnellBnbNf4.HcL1.safetensors')
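A slightly fuller hedged sketch of the same approach, wiring the single-file transformer into a pipeline (assumes a recent diffusers with Flux single-file support and bitsandbytes installed for the pre-quantized weights; the model ID and local filename are examples):

import torch
from diffusers import FluxPipeline
from diffusers.models.transformers.transformer_flux import FluxTransformer2DModel

transformer = FluxTransformer2DModel.from_single_file(
    "flux1-dev-bnb-nf4.safetensors", torch_dtype=torch.bfloat16
)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()
image = pipe("a lighthouse at dusk", guidance_scale=3.5, num_inference_steps=20).images[0]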
Which Sampling Method and Schedule Type are recommended?
This model is fantastic! It performs smoothly even with my 6GB VRAM setup. I’m using it with the latest Forge version and these settings: UI: Flux, VAE/Text Encoder: None, Diffusion in Low Bits: bnb-nf4, Swap Method: Queue, Swap Location: CPU, GPU Weights: 5119 (around 90%), Sampler: Euler, Schedule Type: Simple, Sampling Steps: 4. Everything works great!
It's a lot faster, about 4x faster, but any tips on getting faces that do not look airbrushed? I am struggling to get skin to look like skin and not rubber. [Edit: My bad, I had downloaded Schnell and not Dev. Corrected that. My goodness, the difference in image quality between schnell and dev in this version is night and day]
Changing to DEIS + DDIM (Sampler + Schedule Type) should give a much better skin look (some people say you might also need to lower the CFG to 2.5, but I didn't have to).
Quick question please, guys: I have an RTX 2050 with 4GB VRAM and 32GB RAM. What Flux model should I download and use? I'm kinda lost in all these models; I've tried a bunch of them and got long generation times. Thank you to anyone who answers.
Everything works perfectly! On par with the original model - generation time on my ancient system is 1 min 40 sec.
I searched the hash "e6cba6afca" on civitai and it gave me two models:
one from Ralfinger and one from Capitalcon.
So which one should I download?
I get an error message "mat1 and mat2 shapes cannot be multiplied". I use the original workflows from the pictures, pity
me too...
Same. I installed the official git repo as the model uploader says, and the model doesn't work; I get this error. I asked the AI for a solution and it has none; it says the CLIP in the model is bad. Good luck.
Details
Files
nf4Flux1_nf4Bnb.safetensors
Mirrors
flux1DevHyperNF4Flux1DevBNB_flux1DevBNBNF4V1.safetensors
nf4Flux1_nf4Bnb.safetensors
pixelforge.safetensors
flux1-dev-bnb-nf4.safetensors
flux1-dev-bnb-nf4.safetensors
flux1DevSchnellBNB_flux1DevBNBNF4.safetensors
flux1-dev-bnb-nf4.safetensors
flux1-dev-bnb-nf4.safetensors
flux1-dev-bnb-nf4.safetensors
flux1-dev-bnb-nf4.safetensors