No more OOM errors here with 16 GB VRAM. Use this workflow as a reference for incorporating these optimization nodes into your own workflow, or use it as-is if you really want to.
I modified definitelynotadog's version 1 workflow to optimize it for 16 GB VRAM (I'm sure lower can work too), so shoutout to him for making the main thing - https://civarchive.com/models/1622023?modelVersionId=1835720
I added SageAttention, BlockSwap, and TeaCache nodes. Around 3 minutes using the 480p GGUF model, around 6 minutes using the 720p GGUF. Feel free to yoink the optimization nodes I added and add them to his V2 workflow, or to your own.
First, download a GGUF version of the Wan2.1 model of your choice for better performance. The higher the Q number, the higher the quality (and the larger the file; see the comments below). I personally went with Q3 because it's faster than Q8 and the difference in quality was very minimal from what I could tell.
LoRA Accelerators:
The CausVid LoRA is interchangeable with the FusionX or LightX LoRA (it's apparently built into FusionX), so choose one or the other.
SelfForcing - https://civarchive.com/models/1713337/wan-self-forcing-rank-16-accelerator
FusionX - https://civarchive.com/models/1678575/wan21fusionx-the-lora
Models:
Here are the I2V 720p GGUF models - https://huggingface.co/city96/Wan2.1-I2V-14B-720P-gguf/tree/main
And the I2V 480p GGUF models - https://huggingface.co/city96/Wan2.1-I2V-14B-480P-gguf/tree/main
Make sure to also download the VAE, CLIP Vision, and CLIP (text encoder) models if you're missing them.
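If you'd rather script the model download than click through the browser, here's a minimal sketch using huggingface_hub (my own addition, not part of the original workflow; double-check the exact filename on the repo page and adjust the path to your ComfyUI install):

```python
# Minimal sketch; assumes huggingface_hub is installed (pip install -U huggingface_hub).
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="city96/Wan2.1-I2V-14B-480P-gguf",
    filename="wan2.1-i2v-14b-480p-Q3_K_M.gguf",  # example: the Q3 quant discussed here
    local_dir="ComfyUI/models/unet",             # folder the GGUF loader node reads from
)
print("saved to", path)
```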
Make sure you have ComfyUI Manager to install custom nodes and dependencies, but this specific one you might have to install manually (see the sketch below): https://github.com/city96/ComfyUI-GGUF
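The usual manual install pattern is to clone the repo into ComfyUI/custom_nodes and install the gguf Python package. A hedged sketch of that, driven from Python to keep one language across these snippets (paths assume a standard ComfyUI checkout; the repo's README is the authoritative source):

```python
# Sketch of the manual ComfyUI-GGUF install; equivalent to running the
# git/pip commands directly in a terminal.
import subprocess, sys

subprocess.run(
    ["git", "clone", "https://github.com/city96/ComfyUI-GGUF",
     "ComfyUI/custom_nodes/ComfyUI-GGUF"],
    check=True,
)
subprocess.run(
    [sys.executable, "-m", "pip", "install", "--upgrade", "gguf"],
    check=True,  # gguf is the node pack's core dependency
)
```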
Settings:
Make sure you're running 4-10 steps when using the FusionX or CausVid LoRA; stick to 16 fps and 81 frames length max. 7 steps is the sweet spot for me; anything below that gives me a lot of artifacts/glitches. Use 848 width for 480p and 1280 for 720p.
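For quick reference, here are those recommendations gathered in one place (just the numbers from this description, not values read out of the workflow file):

```python
# Recommended settings from the description above.
wan21_settings = {
    "steps": 7,           # 4-10 works with the CausVid/FusionX LoRAs; ~7 is the sweet spot
    "fps": 16,            # stick to 16 fps
    "max_frames": 81,     # 81 frames / 16 fps = ~5 seconds of video
    "480p": (848, 480),   # width x height
    "720p": (1280, 720),
}
```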
OPTIMIZATIONS:
Sage gives ~25% more speed. If you don't have SageAttention installed on your system, here's a guide for Sage on Windows (you might bork your ComfyUI if it's done incorrectly): https://www.reddit.com/r/StableDiffusion/comments/1h7hunp/how_to_run_hunyuanvideo_on_a_single_24gb_vram_card/
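Before wiring up the sage nodes, it's worth confirming the library actually imports in ComfyUI's Python environment. A quick sanity-check sketch (the install itself is covered in the linked guide):

```python
# Run this with the same Python that launches ComfyUI.
try:
    import sageattention  # needs a working Triton install, which is the hard part on Windows
    print("SageAttention available:", getattr(sageattention, "__version__", "unknown version"))
except ImportError:
    print("SageAttention missing: bypass the sage nodes or follow the guide above")
```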
BlockSwap Node: it offloads VRAM to system RAM (NO MORE OOM ERRORS). Set it to 40 for the 14B model, 30 for the 1.3B model. Watch VRAM usage while generating to find the sweet spot; the higher the block count, the lower the VRAM usage. Keep in mind resolution affects VRAM by a lot.
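If it helps to picture what the node does, here's a conceptual sketch of block swapping; this is not the node's actual code, just the general idea of trading transfer time for VRAM:

```python
# Conceptual sketch: keep the first N transformer blocks in system RAM and
# pull each one into VRAM only for its own forward pass.
from typing import Iterable
import torch
import torch.nn as nn

def forward_with_blockswap(blocks: Iterable[nn.Module], x: torch.Tensor,
                           blocks_to_swap: int = 40, device: str = "cuda") -> torch.Tensor:
    for i, block in enumerate(blocks):
        if i < blocks_to_swap:
            block.to(device)   # load the offloaded block into VRAM
        x = block(x)
        if i < blocks_to_swap:
            block.to("cpu")    # evict it back to system RAM, freeing VRAM
    return x
```

The more blocks you swap, the fewer live in VRAM at once, which is why a higher block count lowers VRAM usage (at the cost of some transfer overhead).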
TeaCache Node gives ~2x speed but at the cost of QUALITY. If the video is too fuzzy or limbs go missing, set the threshold lower, to 0.140, or feel free to bypass/delete this node if you don't like the results. I'm still tweaking it to find the best settings.
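And a conceptual sketch of the trade TeaCache makes (again, not the node's real code): it tracks how much the model input changed since the last full step and reuses the previous output while that accumulated change stays under the threshold. That's why lowering the threshold skips fewer steps and restores quality:

```python
# Conceptual sketch of TeaCache-style step skipping.
from typing import Optional
import torch

class TeaCacheSketch:
    def __init__(self, rel_l1_thresh: float = 0.14):
        self.thresh = rel_l1_thresh   # lower = fewer skipped steps = better quality
        self.prev_input: Optional[torch.Tensor] = None
        self.accum = 0.0
        self.cached_output: Optional[torch.Tensor] = None

    def step(self, model, x: torch.Tensor) -> torch.Tensor:
        if self.prev_input is not None:
            # accumulate the relative L1 change of the input since the last call
            self.accum += ((x - self.prev_input).abs().mean()
                           / self.prev_input.abs().mean()).item()
        self.prev_input = x
        if self.cached_output is not None and self.accum < self.thresh:
            return self.cached_output  # change is still small: reuse, skip the forward pass
        self.accum = 0.0
        self.cached_output = model(x)  # change got large: pay for the full compute
        return self.cached_output
```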
I'm still new to these nodes, so feel free to provide any useful info you may know to help others.
Comments:
👉 “higher the Q number = higher quality” is totally wrong. It’s like saying JPEG at quality 10 looks better than quality 95.
In GGUF (and quantized models in general), higher Q values mean more aggressive quantization, which reduces model size at the cost of quality. 👀
Don't you mean lower Q values mean more aggressive quantization, which reduces model size at the cost of quality? I tested the lowest Q model (wan2.1-i2v-14b-480p-q3_k_m, 8.3 GB file size) and it was faster at generating than the higher Q (Q8_0), which is a larger model at 17.7 GB. However, the difference in quality is actually subtle, so the lowest Q could be the play for better performance, so I guess we were both wrong here? But based on what I read online, it sounded like the higher the Q number, the higher the fidelity, so idk what to think anymore lol
@bhopping OP is correct, the commenter got it backwards.
OP, you're mistaken; examples:
wan2.1-i2v-14b-480p-Q3_K_S.gguf → 7.38 GB
wan2.1-i2v-14b-480p-Q5_K_M.gguf → 11.8 GB
wan2.1-i2v-14b-480p-Q8_0.gguf → 16.8 GB
Q3_K_S = 3-bit quantization → smaller, lower quality
Q5_K_M = 5-bit → better quality, moderate size
Q8_0 = 8-bit → best quality, closest to original model
So yes, in GGUF, the rule is:
Higher Q number = higher quality
Lower Q number = more compression = more quality loss
“Higher Q number = better quality” (GGUF context) → True
“Higher Q number = worse quality” (what OP implied) → False
“Q3 is smaller because it’s better” → False
“Q3 is smaller because it’s more compressed and lower quality” → True
Also:
K_S = "Small" → more aggressive compression, lower fidelity
K_M = "Medium" → better balance of size and quality
If you're aiming for visual or generative quality, always choose:
Q5_K_M over Q5_K_S
Or just go with Q8_0 or full F16 if VRAM allows
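A rough back-of-envelope check of those file sizes (a sketch only: the bits-per-weight figures are approximate llama.cpp values, and real GGUF files keep some tensors at higher precision, so actual files come out somewhat larger):

```python
# Estimated size = parameters x effective bits-per-weight / 8.
params = 14e9  # Wan2.1 14B
for name, bpw in [("Q3_K_S", 3.44), ("Q5_K_M", 5.69), ("Q8_0", 8.5)]:
    print(f"{name}: ~{params * bpw / 8 / 1e9:.1f} GB")
# -> Q3_K_S: ~6.0 GB, Q5_K_M: ~10.0 GB, Q8_0: ~14.9 GB
# vs. the listed 7.38 / 11.8 / 16.8 GB: same ordering, higher Q = bigger = higher fidelity.
```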
@LoneWoolfMan Thanks for the clarification! I deleted “higher the Q number = higher quality” from my desc in case I was wrong, but I'll put it back now. I agree that higher Q numbers like Q8_0 mean higher quality and larger size, while lower Q numbers like Q3_K_S mean more compression and lower quality. I was correcting the other person who thought higher Q numbers meant lower quality, like mixing up JPEG quality 95 (better) with 10 (worse). Sorry if my ‘we were both wrong’ made it more confusing.
My point was Q3_K_M was faster on my 16GB VRAM setup because Q8_0 likely needs offloading, which slows it down. Though Q8_0’s quality is technically better, the difference was subtle in my tests, so I chose Q3_K_M for better performance : )
@bhopping Yeah, I get what you're saying, but I think we're talking past each other a bit. You're right in practice that lower Q values (like Q3) result in smaller models and faster generation; that's because they use more aggressive quantization. But when I said "higher quant," I meant in the compression sense: higher quantization levels = more compression = lower precision (e.g., FP8 vs. FP32), so you're technically going backwards in fidelity.
So yeah, Q3 = more quantized = lower quality, while Q8 = less quantized = higher quality, even though the "Q" number looks higher. It's kind of a naming flip—higher Q number = less quantization. The quality difference can be subtle though, like you said, especially if the model was trained/tuned well.
So we're kinda both right, just looking at it from different angles.
When using the torch node, I get hundreds of messages like "39684 Lib\site-packages\torch\_inductor\utils.py:2068] [0/0] DeviceCopy in input program" and the generation never starts. If I disable it, it runs fine, but someone using the 720p model with a 5090 also gets stuck. Other workflows I've tried work fine. So I wonder 1) how to fix this, and 2) whether the torch node is needed at all or some other settings are needed.
Thanks!
Those 3 red nodes (PatchSage, PatchModel, and TorchCompile) are for ppl that have SageAttention set up on their system. So after you bypass them, your 720p model doesn't load? Which 720p model are you using, is it a GGUF version or the full base one? What resolution are you doing? I sometimes have to start at a low res for the first generation for the model to load properly; you can try that. I noticed this workflow is pretty heavy for some reason too compared to others, maybe due to the samplers, but I honestly don't know yet.
Another option, since your other workflow worked: copy the BlockSwap and TeaCache nodes and paste them into your other workflow, using this workflow as a reference to see where to put them. That should optimize your old workflow : )
I'm glad you shared such a workflow. I also have a 16 GB VRAM card (RTX 4080), but I've used the GGUFs to shift some LoRAs and CLIP models to another installed GFX card (a GTX 1060 with 6 GB VRAM), and I use the GGUF custom nodes that shift memory usage to system RAM (I have 128 GB of RAM); I don't mind using all the juice I have in my PC. That trick of using the BlockSwap node really helps with a model big enough to cause slowdowns or memory issues. I have SageAttention, Flash Attention, and Triton (it was hard to install all of them together, but I got some honey after some stings). The rest of your tips in the description are very welcome, something to make us really get something beautiful out of Wan easily with our PCs.
Np, yeah it took me a couple tries installing sage on windows but we don't talk about that lol
Thanks for the workflow, dude. I struggled a bit with installation. Please include two missing dependencies in your description: 1) ComfyUI-GGUF (a requirement for ComfyUI-MultiGPU; it wasn't listed in Comfy Manager and didn't install automatically in my case) 2) CausVid_14B_T2V_lora_rank32
Nice catch, I totally forgot that node was an issue with the manager. I updated the desc, TY