A collection of three workflows, forked from three other sources and reworked to do more with less VRAM or RAM. They keep the work in VRAM instead of spilling into system RAM, make ComfyUI crashes less likely (you are less likely to saturate your RAM), and let you run bigger models, with better settings, on the same graphics card.
The cost? The WAN 2.2 models have to be reloaded into memory on every run, which is time-consuming, but it is better than nothing. Loading the models from an SSD drive is recommended. Please check your extra_model_paths.yaml file in the main directory of ComfyUI.
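For reference, an entry in extra_model_paths.yaml might look like the sketch below. The section name, drive, and folder layout are placeholders; compare the keys against the extra_model_paths.yaml.example that ships with your ComfyUI version, since the accepted folder keys vary between releases.

```yaml
# Hypothetical sketch: point ComfyUI at WAN model files stored on a fast SSD.
# Adapt base_path and the subfolders to your own layout.
wan_ssd:
    base_path: D:/ai-models/wan22/
    diffusion_models: unet/        # GGUF UNet files ("unet:" on older builds)
    loras: loras/
    clip: text_encoders/
    vae: vae/
```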
There were several typos in the workflow's instructions. Updating the entire workflow file here is tedious (I would have to submit the Nth "example.png turns into a mighty Phoenix"), so I am posting the corrected version below. Apologies for the inconvenience.
- Preparation:
Make sure you have all the needed files by enabling everything once. If you are not storing your files on an SSD, you are likely wasting a lot of loading time.
If you run Part 1 and Part 2 separately, in this order, several times, the models are loaded into memory over and over. This reduces the peak amount of RAM or VRAM needed, but the process is time-consuming. The method is useful when the alternative is a complete workflow failure, such as a crash near the end. I spend an additional 1 minute and 20 seconds per generation this way, but I also save time by keeping everything in VRAM. For Wan 2.2 FLF2V this is competitive; for Wan 2.2 T2V, much less so.
- First Part:
Select "Enable Part 1" and disable "Enable Part 2." Apply the usual settings. Run the workflow. It will only process the High CFG part. Once the first part of the workflow is finished, the .latent file will be saved to a cache directory. You will probably need to adapt the save path for this file.
- Second Part:
Select "Enable Part 2" and disable "Enable Part 1." Make sure the path to the latent file is correct. Run the workflow.
Original workflows:
T2V:
https://civarchive.com/models/1835262/wan22-t2v-gguf-lightx2v-lora
I2V:
https://civarchive.com/models/1822764/wan-22-i2v-gguf-compact-speed-wf-or-lightning-lora-44-steps
(natively very good)
FLF2V:
(this one is my own; it is now very far from the original it was forked from)
Description
These workflows are for Wan Video 2.2 I2V GGUF and Wan Video 2.2 T2V GGUF.
Comments (4)
There are a lot of fallacies revolving around using 'limited' VRAM for video generation. The ultimate solution would be streaming directly from a 5 GB/s NVMe across the PCIe bus per iteration, using minimal VRAM and RAM. Sadly, Comfy would need to be coded for this (instead, Comfy has the worst memory management code imaginable).
You see, Windows automatically caches file reads UNLESS programmatically prevented from doing so. Hence the NVMe-resident model ends up taking major space in RAM even if you are streaming from flash each iteration.
Comfy is literally optimised for LLM use (the only time you want the model in VRAM if possible), even though Comfy is almost never used for LLMs.
What Comfy needs is explicit per-model flags allowing tagged models to stream from NVMe per iteration, leaving VRAM and RAM mostly alone. Per-model, so the LLM CLIP model can load to VRAM (before removal) to process the prompts quickly. To illustrate the idea, see the sketch below.
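A rough sketch of what per-iteration streaming could look like, using the real lazy-loading API from the safetensors library, not anything Comfy actually exposes; the file name is a placeholder:

```python
# Rough sketch of per-iteration weight streaming: read each tensor from the
# NVMe-resident file only when needed, instead of holding the whole model in
# RAM/VRAM. Note: safetensors memory-maps the file, so the Windows page cache
# will still fill up unless the OS is programmatically told not to cache.
from safetensors import safe_open

with safe_open("model.safetensors", framework="pt", device="cpu") as f:
    for name in f.keys():
        weight = f.get_tensor(name)  # pulled from disk on demand
        # ... upload to GPU, run the layer that needs it, then free it ...
        del weight
```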
I created a workflow, and that's a bit too technical for me, but I am also a bit puzzled by the way ComfyUI deals with memory. I used Ubuntu, btw... I am also a Windows user, just not for this one...
This WAN workflow has been the only one I've been able to figure out that actually works. Thank you!
Thank you.