Now faster and easier to install
This workflow generates a small baseline clip with the 14B image-to-video model, upscales it, and then smooths the result with the 5B model.
This lets you test prompts and iterate more quickly on the base generation before upscaling to the final resolution.
Links for all the required models and where to put them are now included in the workflow.
FAQ
Do I need both Wan 2.1 and 2.2 VAEs?
Yes. The 2.2 VAE only works with the 5B model (confusing, I know). Make sure the main section loads the 2.1 VAE and the upscale section loads the 2.2 VAE.
It's frozen on VAE decode
The second VAE decode can take a long time. Just be patient.
Description
Add working unload model implementation
Merge the Triton and non-Triton versions using a simple mutable group
Comments (102)
Have you thought about connecting the CausVid LoRA here to speed things up?
Do you know of any good WAN workflows where CausVid is being used?
I've tested CausVid, and it definitely speeds things up by a lot when using very low step counts. Because there is a loss in quality, I probably won't include CausVid by default. But if you want to use it yourself, it's very easy to do. You just bypass the TeaCache node and then set the CFG to 1.
I am currently experimenting with a Split Sigmas workflow that might reduce the quality loss from CausVid. That will likely be in the 1.4 update.
Need help / take off torch compile / error:
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\PROFES~1\\AppData\\Local\\Temp\\latentsync_3aca1a49\\latentsync_e7f32a9c\\torchinductor_Professional\\triton\\0\\1eeb6dc8e24911e12ed78ffb1fa0bde71466f1ae8e4906fed658bc9a89e66db4\\triton_red_fused_add_mul_native_layer_norm_0.ttir.tmp.pid_26620_ac9c8554-33af-4470-ad3f-ebfceac8006e' Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
I see in your graph you are unloading all the models after the first VAE decode - doing this negates all the advantage of Torch compile on the subsequent runs, as you have to re-compile all the model blocks into a new graph every time. Just thinking about it - perhaps v1.4 could have a prototype group with 12 steps and keep the models in VRAM for as many iterations as you need; then once you're happy with the output you could switch to the full 30 steps, upscale and V2V, and be happy to throw away the compiled graph at that point?
Torch Compile speeds up the first KSampler substantially on its own, even if you don't use the same model again. But you also don't use the same model again anyway. The first KSampler is meant to use Wan I2V and makes use of Torch Compile. The second KSampler uses an entirely different model. Once the first VAE decode is complete, we don't need the torch-compiled model any more.
I am currently experimenting with a Split Sigmas workflow which would enable an early preview. (I'm still exploring various optimizations possible with this approach.) This workflow would definitely keep the initial model loaded until after you've picked a good base generation.
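To put rough numbers on it (purely illustrative assumptions, not measurements from this workflow):

```python
# Illustrative only: assumed compile overhead and per-step speedup,
# not measured values from this workflow.
compile_overhead_s = 90     # one-time cost of compiling the graph (assumed)
steps = 30
base_step_s = 26.0          # roughly the s/it people report for the first KSampler
speedup = 0.20              # assume torch.compile shaves ~20% off each step

saved = steps * base_step_s * speedup
print(f"time saved per run: {saved:.0f}s, net gain: {saved - compile_overhead_s:.0f}s")
# Even if the compiled graph is thrown away after the first KSampler,
# the compile cost is recovered within a single generation under these assumptions.
```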
The second model generation seems to lose details from the first generation. What can I do to prevent this?
Reduce denoise on the second Ksampler
It's a fantastic workflow, thanks for providing it as I'm new to WAN.
However, do you know how to prevent the videos from all being produced in slow motion? It would be nice to be able to make them real-time as well.
Frames per second. FPS!
Seems to be some kind of color shift that is happening at some point in the gens, have you noticed or looked into it?
It seems to be fairly common with Wan videos. Lots of the example videos here have something similar. Try increasing the step count, using more descriptive prompts, and using different CFG values and samplers.
I'm sure there are existing tools for automatic color correction, probably some that'd even run in ComfyUI.
@HazardAI Yeah, I've seen some color correction nodes; I was just curious whether it's something that's happening with this version of the workflow, since I never noticed any color shift with the previous version.
@django348 I see, I think you're onto something. Looking at the videos posted to the 1.0 version from myself and others, I also don't notice any with the color shift, so there definitely could be a setting change causing it. The most notable one that I can think of is that I changed TeaCache to run for the entire process. It would definitely be worth trying some different values for the TeaCache start, end, and threshold to see if it is causing the issue.
@HazardAI I see, I'll try messing around with it and see if it changes
To fix this, change the temporal size in both VAE Decode nodes to a higher value than your frame length - e.g. 96 if you're using 81.
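Rough sketch of why that works (assumed tiling behavior, not ComfyUI's exact decoder code):

```python
import math

# If temporal_size is smaller than the frame count, the tiled decoder has to
# split the video into more than one temporal chunk, and the seam between
# chunks is where the color/lighting shift tends to show up.
def temporal_chunks(frame_count: int, temporal_size: int) -> int:
    return math.ceil(frame_count / temporal_size)

print(temporal_chunks(81, 64))  # 2 chunks -> a seam partway through the clip
print(temporal_chunks(81, 96))  # 1 chunk  -> whole clip decoded in one pass
```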
I am getting this error with the 480p GGUF. Help?
Given normalized_shape=[1280], expected input with shape [*, 1280], but got input of size[1, 257, 1664]
This is regarding the 1.3 workflow.
I am getting this error...
RuntimeError: Failed to find C compiler. Please specify via CC environment variable. Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
If it's something simple, please let me know. But as of now, I'm just gonna stick with the non-Triton version. I would LOVE that 30% increase in speed, but without a corresponding step-by-step install guide, I just can't deal with this install. I don't know jack about all this coding, terminal, PyTorch stuff. When I install one thing, that thing requires me to install 2 other things and so on. Hunting down YT videos on how to install X for each step.
To OP: I know all this coding might be 2nd nature to you, but to some of us it's really hard to keep up. Heck, the first time I tried to install something, I thought it was just typing code into CMD and not Terminal. Ya got to release a hand-holding install guide. I'm sure that would solve most of the backlog of technical questions.
That definitely looks like a Triton issue. The 1.4 version of the workflow lets you disable the Triton stuff rather than being a whole separate workflow. It should work exactly like the non-triton version when you disable the torch compile nodes with the toggle.
I'm almost definitely not going to make any install guides for Windows myself, since I don't own a computer with Windows installed on it and have never used ComfyUI on Windows.
Why is every video I try to make always in slow motion?
Wan and other video generators are trained to use slow motion because the end result generally looks better. After interpolating frames, you can change the FPS to increase the speed of the output. You can also take the last frame of a video and use it to extend the clip, using the same simple process.
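As a back-of-the-envelope example (assumed numbers - Wan's native 16 fps and a 2x interpolator; adjust for your setup):

```python
# Assumed values for illustration: 81 frames generated at Wan's native 16 fps,
# then 2x frame interpolation before export.
def duration_s(frames: int, fps: float) -> float:
    return frames / fps

base_frames, base_fps = 81, 16
print(duration_s(base_frames, base_fps))   # ~5.1 s as generated ("slow motion")

interp_frames = base_frames * 2 - 1        # 2x interpolation -> 161 frames
print(duration_s(interp_frames, 32))       # ~5.0 s: same pacing, just smoother
print(duration_s(interp_frames, 40))       # ~4.0 s: ~25% faster perceived motion
```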
Works great! The tiled decode is a lifesaver for my 3080 10gb ❤
One minor thing is that 'Enable torch compile' on false doesn't disable the torch node in vid2vid, but after manually bypassing it, it works flawlessly!
For reference: this fit in 9.4GB VRAM
https://civitai.com/images/81183535
How long does it take to get a result with your GPU?
@digitalyouneed With the 1.3 flow I'm seeing roughly 15-20 mins (for ~4 sec gens) for the full run, and just below 10 mins for the base (no upscale/interpolate).
That's 30 steps on batches between 60 and 80 frames.
It's relatively consistent as I'm hitting close to the VRAM limit. Once I cross that (>6 sec gens) it takes hours, so I just stick to 5 sec max.
@DollarStoreAbraham Wow, I think that's fast enough. I use an RTX A6000 48GB and generate in 7-10 mins for 5 seconds. Can I get your workflow? I want to try your settings on my GPU too.
@digitalyouneed It's literally just the 1.3 flow, with rel_l1_thresh at 0.35 and the Q3 models, then just the frame length at something around 60-80 and sampler steps at 30.
I don't think there's anything else that could affect the generation speed; maybe I'm missing something, idk.
I'm a newbie here, how did you bypass the torch node in vid2vid?
This is by far the best setup I've seen. I would love to see more workflows from you. Maybe using Vace, First/Last Frame, Controlnets, etc.
Thank you! I've been unfortunately super busy, so haven't had much time to test out new things in video generation. I'm especially interested in the Skyreels models, as they seem like the best natural additions to this workflow. Once the 5B models get released and get GGUF support, I'm definitely planning on putting some time aside to make some updates.
I am currently experimenting with Split Sigmas for increased speed and faster previews which I should have out soon enough.
@HazardAI First/Last Frame with this technique please! And please run a Patreon, we'd love to throw you a dime for the work! u are awesome!
This is fantastic, but I have more VRAM now (5090) - how do I convert this off of GGUF? Anyone have a similar WF that's not quantized? Or is 32GB not really enough to run the base models?
It's mostly a drop-in swap to go from the GGUF to the FP8 model. You'd just swap the "Unet Loader" nodes for "Load Diffusion Model" ones. (And the GGUF CLIP loader for the standard CLIP loader.) That said, Q6 quantization often performs as well as or better than FP8 in other types of models, so I'd probably just switch to that. You might be able to get a slight performance increase out of FP8, so it'd be worth running a few generations with each on your hardware and seeing what kinds of results you get.
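For a rough sense of scale, here's a back-of-the-envelope size comparison for a 14B model (approximate bits per weight, ignoring loader overhead - treat the numbers as ballpark only):

```python
# Ballpark only: approximate bits-per-weight for each format, no overhead included.
PARAMS = 14e9  # ~14B weights in the diffusion model

for name, bits in [("Q3_K_M", 3.9), ("Q4_K_M", 4.8), ("Q6_K", 6.6),
                   ("FP8", 8.0), ("FP16", 16.0)]:
    gb = PARAMS * bits / 8 / 1e9
    print(f"{name:7s} ~{gb:4.1f} GB")
```

So even on a 32GB card, full FP16 weights are a squeeze once the text encoder, VAE, and latents are loaded, which is roughly why Q6 or FP8 stays the practical choice.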
Also, congrats on getting a 5090!
@HazardAI Thanks so much! So Q6 for the model and the Clip loader, correct? I see only the Q6_K variants. That's the best bet then, correct?
Just came here to say I love the workflow and the seed Easter egg.
Hi, new to this, how do you add a checkpoint, like the CausVid one?
Wow, that blockswap node is OP. I incorporated it into a different workflow and now my 16GB VRAM can use the 14B 720p Q8 GGUF model at 720x1280 in 10 minutes w/o getting an OOM error, ty
How did you manage to do that? My 3060 12GB with 32GB RAM can only generate videos in 30-40 min, using 40 blocks to swap. (I used the 14B 480p Q4 GGUF model.)
Yeah, I remember having the same issue. The thing that's helping my speed is the CausVid 6-10 step LoRA + TeaCache node. SageAttention (gl installing Sage on Windows, but here's a guide - https://www.reddit.com/r/StableDiffusion/comments/1h7hunp/how_to_run_hunyuanvideo_on_a_single_24gb_vram_card/) helps a bit with speed too. Before, I was only able to use 480p Q4 at low res. I might upload the workflow. I used this workflow - https://civitai.com/models/1622023/causvid-2-sampler-workflow-for-wan-480p720p-i2v - then manually added the TeaCache node, block swap node, and yoinked the SageAttention nodes from a different workflow.
@bhopping thank you very much! I'll try it out
@bhopping when adding those nodes to the other setup, did you duplicate the TeaCache, etc. stack for each sampler?
@pocketsVFX Good question. For my workflow I didn't dupe. To know if the nodes are working, you can check your terminal after a test generation. If your log mentions each node (BlockSwap, TeaCache, SageAttn) before and after running each sampler, you're good.
I tried to use the non-Triton WF, but I get an error about Triton not being installed.
Make sure you bypass the "Patch Model Patcher Order" and "TorchCompileModelWanVideo" nodes on BOTH the initial generation AND on the v2v part. Mine gave the error because those two nodes were only disabled for the initial generation, but not for the v2v section.
Hi, I'm a newbie here. When starting, why do I get this error:
TypeError: WanAttentionBlock.forward() got an unexpected keyword argument 'context_img_len'
Okay, so I can run ComfyUI with version 1.3 and Triton successfully. However, after I start a generation, when it reaches the point of unloading all models, ComfyUI suddenly disconnects. I'm not sure why. Here's the log:
----------------------------------
TeaCache skipped:
16 cond steps
16 uncond step
out of 30 steps
-----------------------------------
100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [13:17<00:00, 26.57s/it]
Requested to load WanVAE
loaded completely 0.0 242.02829551696777 True
Unload Model:
- Unloading all models...
E:\Ai\WAN\ComfyUI_windows_portable>pause
Press any key to continue . . .
Hello, I am trying to run the workflow and it seems to work fine up until it says "TeaCache: Initialized" and then just stalls out for hours without anything. Any advice?
PS: I have tried installing Triton, but it seems quite complicated and I am not getting far with it, so I have that portion of the workflow disabled.
Video step by step - https://www.youtube.com/watch?v=DigvHsn_Qrw
Best of luck!
Just going to suggest in the next iteration to bump up the temporal_size to 96 on both of the VAE Decode nodes. It solves a problem I see a lot of people having where the color shifts near the end of the video. Loving this workflow though, amazing work!
That solved it! I did a lot of testing and got it down to the VAE decoder, but I didn't know how to fix it. I just knew the regular VAE decoder didn't have the issue. I'm glad it was as simple as changing a setting, instead of a bug in the Tiled Decoder. Thanks so much for sharing this fix!
This helped me too. Thanks!
Hi there, I'm getting this error message on every run now from the last KSampler during the v2v... Any idea what's going wrong? Searching hasn't helped me resolve it yet: "output with shape [1, 72000, 1536] doesn't match the broadcast shape [2, 72000, 1536]" (The actual WAN run is fine, only the video2video smoothing step has this issue.)
I'm able to generate videos with the default settings and the Q4 14B model on my laptop 4060 with 8GB VRAM and 16GB normal RAM. Takes around an hour.
I have "blocks_to_swap" at 40, otherwise it runs out of memory.
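(For anyone wondering what blocks_to_swap is actually buying you, here's a rough sketch - assumed numbers, not the node's real accounting:)

```python
# Assumed for illustration: the 14B model has 40 transformer blocks that hold
# most of the weights; blocks_to_swap keeps that many blocks in system RAM
# instead of VRAM, shuttling them across as needed (which is the slow part).
def resident_weights_gb(model_gb: float, total_blocks: int, blocks_to_swap: int) -> float:
    kept = total_blocks - blocks_to_swap
    return model_gb * kept / total_blocks

model_gb = 8.5  # e.g. a Q4 GGUF of the 14B model, give or take
for swap in (0, 20, 40):
    print(f"blocks_to_swap={swap}: ~{resident_weights_gb(model_gb, 40, swap):.1f} GB of weights resident in VRAM")
```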
:skull:
This is a great flow, the only issue I have is that the upscale causes the subject to not quite look like themselves. The video before upscaling is nice though.
Anyone have a guide or know how to determine the correct block swap, TeaCache, etc. settings for your system? I have a 5090 and I'm unsure I have all the right settings dialed in... Any recommendations? (Manual testing is taking forever, obviously.)
I seem to keep getting this error: AttributeError: 'OptimizedModule' object has no attribute 'forward_orig'
The issue was related to using the Windows desktop ComfyUI install. I went with the portable install manually, using this video as a guide, which includes steps to properly set up the Python environment, torch, Triton, and Sage: https://www.youtube.com/watch?v=Ms2gz6Cl6qo
What do I have to use for this workflow? I'm using Automatic1111 rn.
You use it with ComfyUI
Great workflow, simple enough for me to get working.
I did have an issue where the lighting flickers near the end of the 5 second video. I found a solution online: I switched the VAE Decode (Tiled) to the normal VAE Decode, and the lighting is better now.
I also randomly run into another issue where it crashes at:
RuntimeError: output with shape [1, 4860, 5120] doesn't match the broadcast shape [2, 4860, 5120]
This seems to be the GitHub issue; it seems to have to do with the KJNodes TeaCache node. Does anyone have a solution?
https://github.com/kijai/ComfyUI-KJNodes/issues/219
I keep getting the following error, but it's not consistent. Sometimes it happens once and then doesn't happen for two or three generations; sometimes it happens five times in a row. Google has proved fruitless.
KSampler
output with shape [1, 15120, 5120] doesn't match the broadcast shape [2, 15120, 5120]
I haven't changed anything and am using it with all the default models. Anyone else getting this? Anyone know how to fix it?
trying your workflow and ones embedded in some of the gallery images, and always getting this error at the Sampler stage:
WanAttentionBlock.forward() got an unexpected keyword argument 'context_img_len'
any ideas?
I'm also getting this. Anyone have any idea?
hands down the best wan i2v workflow. looking for more updates!
thanks for sharing!
Can u help me, I'm stuck [email protected] helppp
@ericyeoung976538 sent you a DM on Gmail
Why do you use a GGUF model for I2V and then a safetensor for the text to video refiner?
This comment is long overdue. It was because the GGUF models let you save quite a bit of memory, which lets the workflow run on lower-end hardware. For the refiner, non-quantized models seem to run a little bit quicker, and smaller models seem more sensitive to quantization, so it made sense to me to just use the full-sized model.
Idk what's up with mine. When I test the workflow out, it just runs on the KSampler forever; I leave it at 3% for a few hours before ultimately restarting. Spec is 32GB RAM with a 4070 Ti. I made no changes to the workflow. What am I doing wrong?
Same issue here, but with a 4090 and 128GB DDR5; it just sticks and does nothing for hours. The workflow doesn't even use Sage Attention... which I have installed.
StarW Try blocks to swap at 40. When I read the note I thought it was optional, but putting it at max made it work.
Pantheman thx, I'll try that
StarW In v1.3, take a look at the resolution. On a fresh download of v1.3, it was set to something like 10000x480. I too didn't catch this and wondered why v1.3 wouldn't work. I'm still going to use v1.2 though; there's also some lighting issue (or maybe something to do with my prompting?) where the scene's lighting changes in a split second and it looks weird. This issue shows up around the 4 second mark on a 5 second video when generated on v1.3 with the same settings as v1.2 (at least I think everything is the same). I don't really know what all the new nodes do or how they affect the generation compared to v1.2.
vgbestly It's 10000x480 because it's already locked to a ratio in another node iirc, so it only uses the 480 to scale; the 10000 is just a placeholder in case you turn off the ratio and make a custom one.
Pantheman Idk, whatever it is, but when I changed this to 1280 like in v1.2 it worked out. The lighting-change issue remains though, so I still use v1.2 instead.
I'm new to this. I downloaded the workflow, but when opening it in ComfyUI it automatically prompts that I have missing nodes. But when I click on "Install missing nodes" the marketplace doesn't find them. It's weird no one else is commenting on this; any ideas? ComfyUI is updated to the latest version.
The only missing node for me was wanBlockswap. I clicked on "Manager" at the top right to open the menu and then on "Custom Nodes Manager". There I was able to search for "ComfyUI-wanBlockswap" and was able to install it. Maybe this helps you
The vid2vid step is giving me a drop in quality compared to the upscale
Wan 2.2 GGUF has been updated. Is there an improved workflow?
This is my favorite workflow! I would love a wan 2.2 version.
BluesElwoo Agreed, the quality is unmatched!
Took me a little while, but I've created a Wan 2.2 version I'm fairly happy with.
Any Img + Reference Video = Video workflows? Wan 2.2 or Wan 2.1?
Would my 8GB of VRAM melt down if I try using this?
I used an RTX 2070 Super before I got an upgrade. On v1.2 with Wan2.1-i2v-14b-480p_Q3_K_M.gguf and 3 LoRAs, I was able to generate 5 second clips at... around 1hr 40ish mins on the first run. On second and later runs, maybe only 1hr 20min?
It was "acceptable" because I would just queue up like 2 runs on 3 images I wanted to i2v, go to sleep, and wake up to 6 finished clips. Pick and choose the good generations and try again the next night for the bad gens. It was tedious, but it was worth it as a newcomer to the space on old hardware. Side note: I STILL don't think I installed Triton right, so I'm leaving another 30% performance on the table.
vgbestly At what resolution? Are you going for the absolute highest quality? Those times sound ridiculous. The longest I've had with a 3070 Ti 8GB was around 40 minutes, but that was early on; then I generally got to about 14 mins without LoRAs etc., and now I just use the speed LoRAs and do stuff in about 3-5 minutes... There might be some major resolution differences going on here, but 1hr+ sounds ridiculous. Or maybe the 2070 is a lot worse than the 3070? If so I'll stfu, but I think you're leaving speed gains on the table.
Is Triton only needed for interpolation? No matter what I do, ComfyUI won't see that Triton is installed.
Install it with pip from within the ComfyUI terminal/environment, then pip install SageAttention inside the terminal too, and restart (the desktop version, I'm assuming).
This covers the various ways of installing Triton depending on your ComfyUI install type. https://civitai.com/articles/12848/step-by-step-guide-series-comfyui-installing-sageattention-2
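Once it's installed, a quick smoke test from the same Python environment ComfyUI runs in will tell you whether torch.compile can actually use it (just an illustrative check, not part of the workflow):

```python
# Run this in the Python environment ComfyUI uses (e.g. the portable install's
# embedded python). It only checks that Triton imports and torch.compile works.
import torch

try:
    import triton
    print("Triton found:", triton.__version__)
except ImportError:
    print("Triton is NOT installed here; keep the torch compile nodes bypassed.")

# If torch.compile can't find Triton or a C compiler, this tiny call raises the
# same kind of error the TorchCompileModelWanVideo node would.
compiled = torch.compile(lambda x: x * 2)
print(compiled(torch.ones(3)))
```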
workflow wan 2.2 version soon? :D
Can you just swap in 2.2 and use the current workflow?
redSamred I don't think so
@redSamred For Wan 2.2, high-noise and low-noise models are used and the results must be 'glued' together, but this workflow can be used with just the low-noise Wan 2.2 model. Overall the quality is fine, as is the speed, but it does not reach the full potential of Wan 2.2.
PLEASE POST WAN2.2 VERSION!!!
It took me a little while, but I've created a Wan 2.2 version. It's much, much simpler this time around too.
I got it to work but the output is kind of blurry and not even close to the quality of the ones posted here.
Any tips?
14B-480p-Q3_K_M
UMT5-XXL-encoder-Q4_K_M
wan2.1_t2v_1.3B_fp16
16GB VRAM
image (ref) size 896x1152
resize to 576x1024
steps 30
cfg 4.5
I was having this and realized the Wan lightning LoRAs are not optional. You have to have working i2v lightning Wan LoRAs for high and low noise. You can skip them in the workflow, but you get blurry images. The same applies for the upsample: if you skip one of the LoRAs there, the workflow won't run at all.
@rageface Correct. If you want to run it without them, you'd need to change the settings in the KSamplers to use many more steps.
I have a problem with "Text Multiline" that I don't understand (yes, I'm a noob).