Update: I'm in the process of doing a test render with my comfy workflow, which I'll post shortly. It may be able to handle 81 frames at 720P on a 4090. Use this ComfyUI module to load the checkpoint: https://github.com/silveroxides/ComfyUI_bnb_nf4_fp4_Loaders
These are NF4 Quantizations of the Wan video generation AI. They work really well.
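For anyone wondering what "NF4" means here: it's the 4-bit NormalFloat format from bitsandbytes, which stores weights in 4 bits plus per-block scales. A minimal sketch of the idea in Python (illustrative only; the 5120-wide shape is an assumption sized like Wan's layers, and this is not the exact conversion script used for these checkpoints):

import torch
import bitsandbytes.functional as bnbF

# A single fp16 weight matrix, sized like one of Wan's 5120-wide layers.
weight = torch.randn(5120, 5120, dtype=torch.float16, device="cuda")

# Quantize to 4-bit NormalFloat; returns packed bytes plus the quant state
# (per-block absmax scales etc.) needed to dequantize later.
q_weight, quant_state = bnbF.quantize_4bit(weight, quant_type="nf4")

# At inference time the weights are dequantized back to fp16 on the fly.
restored = bnbF.dequantize_4bit(q_weight, quant_state)

print(weight.nelement() * weight.element_size())      # ~50 MB in fp16
print(q_weight.nelement() * q_weight.element_size())  # ~12.5 MB packed (4 bits/weight)
print((weight - restored).abs().mean())               # small quantization error

The quality holds up well because NF4's quantization levels are spaced for normally distributed weights rather than uniformly.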
Comments (55)
Thank you!
Thanks. Will these work in ComfyUI and if not, any plans to submit a PR?
It seems to work with this node: https://github.com/comfyanonymous/ComfyUI_bitsandbytes_NF4
@hiben40387 not working - ERROR: Could not detect model type of: C:\models\checkpoints\wan21NF4_i2v14B480pNf4.safetensors
@ai_wifus That is strange, it worked for me. city96 just released GGUF versions; you can give those a try, they should be more easily compatible.
@hiben40387 This custom node is working, but when I run the workflow, I get an error saying some tensors are on the CPU while others are on the GPU. BTW my GPU has 12GB VRAM. Is there a way to fix this?
@ai_wifus Reading through that GitHub, it seems it's a memory leak error; someone made a fork to fix it. That being said, for 12GB you are better off using the GGUF Q4 from the link I gave; use it with these two nodes. The two together considerably cut down the VRAM needed.
@hiben40387 thanks for the reply! Currently I'm using the Q5 GGUF and it works great!
@hiben40387 I have 12GB as well and ran into the same issue. You can use the --gpu-only argument, and I read about the other node fixing the memory leak (hopefully other nodes fix it too).
I have been testing full vs GGUF vs NF4. I assume NF4 is easier on memory than the full model, right? GGUF is nice but slow, and I found I can run the full 480p models, or 720p with a 480p latent size. Or are GGUF and NF4 about the same speed-wise?
I am still getting round to this test. I assume it is a UNet load; I know some NF4 models, like Flux, ended up being checkpoint loads.
Does this speed up the workflows? (I'm on a 3090, so usually VRAM isn't an issue lol... usually)
And, forgive this if it's a bit of a noob question... but where do we put them and how do we load them? Guessing they won't load with the default checkpoint loader node?
You should be able to render more frames due to the memory savings. FP8 may actually be faster.
I used this NF4 model with the node mentioned, and I get nearly the same VRAM usage (about 21-22GB) and speed as the official FP8 scaled model. I don't understand why, but I guess I won't use this anymore.
@Garland It doesn't cut usage in half, but I definitely do better than that with it. That being said, if it doesn't help you fit the model into your card's VRAM, the FP8 model is faster and the quality is marginally better. The only reason to use this is to save VRAM.
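A rough back-of-envelope on why weight-only quantization never cuts total VRAM proportionally (the 14B parameter count is an assumption taken from the "i2v14B" filename; activations, latents, the VAE, and the text encoder are untouched by it):

# Weight memory for a 14B-parameter transformer at different precisions.
PARAMS = 14e9  # assumed from the "i2v14B" filename

for name, bits in [("fp16", 16), ("fp8", 8), ("nf4/q4", 4)]:
    gb = PARAMS * bits / 8 / 1e9
    print(f"{name:>7}: ~{gb:.0f} GB of weights")

# fp16: ~28 GB   fp8: ~14 GB   nf4/q4: ~7 GB
# Peak VRAM adds latents, attention buffers, the VAE decode pass and the
# text encoder on top, which is why observed savings are well short of 4x.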
[enforce fail at alloc_cpu.cpp:115] data. DefaultCPUAllocator: not enough memory: you tried to allocate 362387865600 bytes. I am getting this on a 3090, any ideas?
What are you using?
If you're trying to generate 81 frames on 24GB at 720P, you'll get OOM. Try reducing the resolution to 480P or doing 41 frames.
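To see why frame count and resolution matter so much, here is a rough token-count estimate (the 8x spatial / 4x temporal VAE compression and 2x2 patchification factors are assumptions about Wan's architecture; adjust if the real factors differ):

# Rough sequence-length estimate for Wan video generation.
# ASSUMED factors: 8x spatial VAE compression, 4x temporal compression,
# 2x2 patchification in the DiT.
def tokens(width, height, frames):
    lat_w, lat_h = width // 8 // 2, height // 8 // 2
    lat_f = (frames - 1) // 4 + 1  # causal VAE keeps the first frame
    return lat_w * lat_h * lat_f

for w, h, f in [(1280, 720, 81), (1280, 720, 41), (832, 480, 81)]:
    print(f"{w}x{h}, {f} frames -> {tokens(w, h, f):,} tokens")

# 1280x720, 81 frames -> 75,600 tokens
# 1280x720, 41 frames -> 39,600 tokens
#  832x480, 81 frames -> 32,760 tokens
# Self-attention cost grows with the square of the token count, so halving
# the frames or dropping to 480p shrinks memory far more than linearly.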
I'm also getting the same error message. I'm using SwarmUI, and I have an RTX 4090. I downloaded the 480p version. It's not working regardless of the frame count. Can you provide a solution?
You get this error if you use the stock Comfy UNet loader node (which existing workflows use); you need the one called "Load FP4 or NF4 Quantized Checkpoint Model" from https://github.com/silveroxides/ComfyUI_bnb_nf4_fp4_Loaders/blob/master/__init__.py#L178C16-L178C58
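If you drive ComfyUI over its HTTP API instead of the graph editor, the same swap looks roughly like this (the class and input names are assumptions based on the repo's CheckpointLoaderNF4 node; check the actual node definition):

import json
import urllib.request

# Hypothetical workflow fragment: only the loader node is shown.
prompt = {
    "1": {
        "class_type": "CheckpointLoaderNF4",  # instead of the stock loader
        "inputs": {"ckpt_name": "wan21NF4_i2v14B480pNf4.safetensors"},
    },
    # ... remaining nodes: text encode, WanImageToVideo, KSampler, VAEDecode
}

req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": prompt}).encode(),
    headers={"Content-Type": "application/json"},
)
urllib.request.urlopen(req)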
For all those frustrated with the poor description: you'll need these nodes to load the NF4 model
https://github.com/silveroxides/ComfyUI_bnb_nf4_fp4_Loaders
This custom node is working, but when I run the workflow, I get an error saying some tensors are on the CPU while others are on the GPU. BTW my GPU has 12GB VRAM. Is there a way to fix this?
@ai_wifus Here's my workflow. Check your resolution. It should be 1280x720 or 832x480 (or the portrait version of those). When mine was bigger than that, I got that error.
It would be great if you could share an example workflow for ComfyUI.
Thanks ;)
I just found example workflows: https://github.com/comfyanonymous/ComfyUI_examples/tree/master/wan
I'll be posting mine shortly.
Great! Both 720p and 480p i2v work nicely on a 4090 (24GB VRAM) with the fp8_e4m3fn_scaled text encoder. (The fp16 TE seems to require more VRAM when using 720p.)
I used these example workflows, just changed the UNET loader node to the NF4 one.
Yeah, I've been doing 41 frames at 720p. 81 is too many even for my 4090.
Update: I can do 81 frames at 1152x640. I hadn't tried it because it wasn't one of the "official" recommended resolutions, but it works great.
Is the NF4 node available in the Install Custom Nodes category in ComfyUI Manager?
@dims2 Just to check, as I have been considering getting the 720p model even to make 480p videos: does it use more VRAM via the text encoder for the same size video? For example, with the sample 512 x 512 run on both 480p and 720p, will 720p eat more memory?
Which base model should I use? And also, how do I run it from code, without Comfy?
The WanImageToVideo node is missing; may I ask which plugin this node is in?
The custom node doesn't create a new node; it replaces an existing node and renames it to "Load FP4 or NF4 Quantized Diffusion or UNET Model".
@mfireson I've followed the instructions about choosing channel: dev and then looking for "CheckpointLoaderNF4", and the file pops up under "Install Custom Nodes". When I hit install, it says it will not install because of the security level settings. So I am almost there, but that node loader is not attainable. I will find a workaround for installing that node, because it is the final step I need to run this workflow. However, if you have info that is worth a shot, please feel free to share and lend your guidance. I was thinking of just swapping the "CheckpointLoaderNF4" node for the diffusion/UNet loader to see if that is a viable workaround.
As for now, I am stuck.
I fixed the node, but the rest of the workflow has so many bugs.
Here. https://github.com/kijai/ComfyUI-WanVideoWrapper
If a few nodes say they're missing, do these:
pip uninstall torch -y
pip uninstall torchvision -y
pip uninstall torchaudio -y
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
pip3 install torch==2.7.0.dev20250123+cu126 --index-url https://download.pytorch.org/whl/nightly/cu126
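After reinstalling, a quick sanity check that the CUDA build actually landed:

import torch

print(torch.__version__)          # should report a +cu126 build
print(torch.version.cuda)         # CUDA version torch was built against
print(torch.cuda.is_available())  # True if the GPU is visible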
@Therma I think I found the issue: I was using an older ComfyUI install vs. the standalone portable version. I'll report back when I have tried this version.
This workflow has so many bugs. What is Sage Attention?
Sage Attention is a method that's been super popular for video models for a few months, usually paired with TeaCache. It sacrifices a small amount of quality for a substantial speed-up in generation.
@RuggedPineapple Ah. These add-ons make it difficult to utilize the workflow.
@Noob_ee I don't run this workflow, I just saw it scrolling through, but as general help: I currently have 3 nodes that implement Sage Attention. One from Flux-Lightning, one from HunyuanVideoWrapper, and one from KJNodes that requires turning on the beta option for it to show up in your node picker. Also, Sage requires Triton, which is a royal pain in the ass to install on Windows but a simple pip install command on Linux, so if you're on Windows the juice may not be worth the squeeze.
That said, on 40xx-series hardware, taking the attention functions down to 8-bit with Sage bumps generation speed 150% to 180%, so it's kinda magic.
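For the curious, here is roughly what Sage Attention replaces at the call site (a sketch assuming the sageattention package's sageattn() entry point; the head count and dims are made-up stand-ins, not Wan's exact config):

import torch
import torch.nn.functional as F
from sageattention import sageattn

# Q, K, V in (batch, heads, seq, head_dim) layout, as SDPA expects.
q = torch.randn(1, 40, 4096, 128, dtype=torch.float16, device="cuda")
k, v = torch.randn_like(q), torch.randn_like(q)

out_ref = F.scaled_dot_product_attention(q, k, v)  # full-precision baseline
out_sage = sageattn(q, k, v, is_causal=False)      # 8-bit quantized attention

print((out_ref - out_sage).abs().mean())  # small error, big speedup on 40xx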
@RuggedPineapple Right, I am not blessed with running Linux like others on here, who seem not to mention what OS some of these examples are using. The juice is not worth the squeeze; I was over here thinking I was doing something wrong. I am running a 40-series, but shy of the 4090. I'll just have to wait for better optimization on Windows.
@Noob_ee WSL 2
@voboyso what is that?
@Noob_ee Windows Subsystem for Linux. You can use SageAttention easily while still running Windows (you'll do the generations and all that in WSL (Ubuntu Linux)). Just search "WSL SageAttention" for more info, but here's the video I followed: https://www.youtube.com/watch?v=ZBgfRlzZ7cw. I had several issues after that I needed to address; it might go smoother for others, though. The guy is a bit all over the place, but you'll be on your way to faster generations by following it (and possibly figuring out some other issues afterwards on your own).
Works on my RTX 5070. Although I might be misunderstanding the point, because the speed is the same as the FP8 version and I don't really notice a change in VRAM usage either?
Make it make sense.
So I thought this NF4 would have a speed advantage on the new Blackwell GPUs. I guess that's only valid for FP4 and not NF4.
This NF4 has no speed difference over FP8 or FP16, and it's even more difficult to run due to the BNB nodes' inability to offload the model to system RAM.
On my RTX 5080 16GB, I can run the 720p FP16 model just fine at 1280 x 720 (81 frames) by offloading up to 50GB of model data into system RAM without any performance degradation, and yet I can't even do 960 x 544 with the NF4. Bits and Bytes needs to work on that offloading, I guess.
With the current state of Comfy nodes, if you want to save VRAM it's best to just use the FP8 and Q8 quants, because they offload much better on low-VRAM GPUs. If you use torch compile with Wan2.1 on the native official workflow and with 64GB of system RAM, it will let you offload the FP16 model and will even make it faster.
Thank you for your work with this NF4 version anyway.
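For reference, the torch compile trick mentioned above looks like this in isolation (a toy sketch on a stand-in module; the real gain comes from compiling the actual Wan transformer in the workflow):

import torch
import torch.nn as nn

# Stand-in for the Wan DiT, sized like one of its 5120-wide blocks.
model = nn.Sequential(
    nn.Linear(5120, 5120), nn.GELU(), nn.Linear(5120, 5120)
).half().cuda()

compiled = torch.compile(model, mode="max-autotune")

x = torch.randn(1, 5120, dtype=torch.float16, device="cuda")
with torch.no_grad():
    _ = compiled(x)  # first call compiles (slow); later calls run the fused graph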
I made an FP4 quantized version and it's still slower than a Q6 GGUF on my 5070 Ti. I dunno if there's anything special that needs to be updated to get the improved speed from the Blackwell FP4 optimizations.
Seems like it specifically needs to be NVFP4 precision to get the speed advantage. MIT recently released an SVDQuant method that supports NVFP4, currently just for Flux, but they're planning to add Wan2.1 support. I tested the Flux workflow in ComfyUI with the FP4 model and the 8-step LoRA, and it's blazingly fast: I can generate a 1920x1200 image in less than 10 seconds, and a 1024x1024 in under four.
@thaddeusk Amazing! Thank you for the information, much appreciated!
When loading the model I get an error: All input tensors need to be on the same GPU, but found some tensors to not be on a GPU: [(torch.Size([1, 13107200]), device(type='cuda', index=0)), (torch.Size([409600]), device(type='cpu')), (torch.Size([5120, 5120]), device(type='cuda', index=0))]
I'm using an RTX 4070 with 12GB VRAM, and the T2V model.
Any chance to use this on WebUI Forge?