Now faster and easier to install
This workflow produces a small baseline generation with the 14B image-to-video model, upscales it, and then smooths out the result with the 5B model.
This lets you test prompts and iterate on the base generation more quickly before upscaling to the final resolution.
Links for all the required models and where to put them are now included in the workflow.
FAQ
Do I need both Wan 2.1 and 2.2 VAEs?
Yes. The 2.2 VAE only works with the 5B model (confusing, I know). Make sure the main section loads the 2.1 VAE and the upscale section loads the 2.2 VAE.
It's frozen on VAE decode
The second VAE decode can take a long time. Just be patient.
Description
(Includes both Torch Compile and non-Triton versions)
Uses better default values.
Updates some layout to match recent ComfyUI changes.
Simplifies prompt handling.
Comments (113)
Bit off-topic, but could you share the workflow for the girl?
The examples look amazing, the best I've ever seen.
When using Wan 2.1 t2v 1.3B in the final steps, the image detail/fidelity decreases significantly. What could be causing that? I can't achieve high definition like the example images. What setting might be incorrect?
All of my videos have metadata attached, so you can download one and drop it onto Comfy to see exactly what settings I used.
If you're losing detail, you probably want to try reducing the denoise strength on the vid2vid KSampler. For 2D animations I usually go lower, especially when trying to do a specific character. v1.0 had it set to 0.25 by default, which is good at smoothing out some imperfections but definitely results in changes. v1.1 now starts at 0.1, and you can go even lower. Because it uses fixed seeds, you can also re-run with different denoise settings and it'll only run the vid2vid part, so you can dial in a better value fairly quickly.
Let me know if you're able to get the results you want, I'm eager for feedback, and I might be able to offer further suggestions.
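For intuition on what that knob does: denoise strength on an img2img/vid2vid KSampler roughly controls how much of the sampling schedule gets re-run, so low values only lightly rework the input. A toy sketch of the relationship (illustrative only, not ComfyUI's exact internals):

```python
# Rough sketch: how denoise strength relates to how much of the
# sampling schedule a vid2vid KSampler re-runs. Illustrative only --
# not ComfyUI's actual implementation.
def steps_rerun(total_steps: int, denoise: float) -> int:
    """denoise=1.0 re-runs every step (full re-generation);
    denoise=0.1 re-runs only the last ~10% of steps, so the
    output stays close to the input video."""
    return max(1, round(total_steps * denoise))

for d in (1.0, 0.25, 0.1, 0.05):
    print(f"denoise={d}: ~{steps_rerun(20, d)} of 20 steps re-run")
```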
I had a similar problem. My issue was not using a correct starting image with the dimensions the creator lists. I just cropped my image to fit, to make sure it was the expected size. Gave really good results.
@HazardAI I've noticed that in the final 1.3B step(s), after denoising, the character's appearance undergoes quite significant changes, even though denoising has already been set to 0.05. Is this a model limitation or some other issue?
@zens 0.05 is pretty darn low; I don't know how much lower you could go before it wouldn't be doing anything. You could try fewer steps as well. Personally, I found 4 to be too few, but that might depend on the type of style you're using.
I have done a little experimenting with using a face swap and restore node after the upscaling, and it was very effective at preserving the face from the source image. That only works on realistic images, though; I don't yet have a good solution for 2D.
The new Skyreels models look like they might be very promising for a better vid2vid process, especially the upcoming 5b version. I'm experimenting with them now, but so far don't have anything better than what the 1.3b t2v model already does.
I observed that the resolution seems low at certain points when using the KSamplers. Does this affect the character's appearance? I tried several times with the 14B i2v model and the character's appearance didn't have any issues. However, with the 1.3B model, the appearance shows a 20-40% difference.
@zens Using the 14B i2v model for the base generation, or for the vid2vid portion? I wasn't able to get any nice results out of the i2v model for a vid2vid process. Using the 720p model and a higher base resolution definitely makes nicer videos, but that requires much more VRAM and time. This workflow exists mostly to get nice-looking videos without having to do high-resolution i2v generations.
You could try reducing the noise in the vid2vid stage. I use 0.05 or 0.01, and it makes it less plastic and also maintains details.
@HazardAI Just want to comment on Skyreels: I tried it today with vid2vid and it kept the face much closer to the original than the wan_t2v1_3b did. Just a first impression.
@ellaharper25498 Which Skyreels model did you use?
Using VAE Decode (Tiled) I was able to get this to run on 8GB VRAM on a 3070 + 16GB system RAM in 20-40 minutes!
Check out some of the workflows in my posts.
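For anyone curious why the tiled decode helps: instead of decoding the whole latent at once, it decodes smaller spatial tiles one at a time, so peak VRAM scales with the tile size rather than the full frame. A minimal sketch of the idea (assumed interfaces; the real tiled decode node also overlaps and blends tiles to hide seams):

```python
import torch

def tiled_decode(decode_fn, latent: torch.Tensor, tile: int = 64) -> torch.Tensor:
    """decode_fn maps a latent tile (B,C,h,w) to pixels; latent is the
    full latent (B,C,H,W). Decoding tile-by-tile keeps peak VRAM low."""
    _, _, H, W = latent.shape
    rows = []
    for y in range(0, H, tile):
        cols = []
        for x in range(0, W, tile):
            cols.append(decode_fn(latent[:, :, y:y + tile, x:x + tile]))
            if torch.cuda.is_available():
                torch.cuda.empty_cache()  # release scratch memory between tiles
        rows.append(torch.cat(cols, dim=3))  # stitch tiles along width
    return torch.cat(rows, dim=2)            # stitch rows along height
```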
Really good workflow. I had some issues with noise on the final product, but it turns out it was just because my input image was the wrong dimensions. I fixed it and it gave really good results with the same seed (nice seed, btw).
Used the Q3_K_S gguf
Thanks so much for the report back! I'm thrilled to hear that it works on 8GB VRAM. I've been hesitant to recommend this to some people, but now I can. Thanks!
@HazardAI Thank you for the sick workflow! I was curious as to how the t2v part works in all of this? Like does it basically just take the video as a prompt and make a new video? Or is it more of just cleaning up?
@shringusjoe506 Basically just cleaning it up. You can leave the prompt completely empty and it still does a quite nice job. Upscaling every frame individually with an upscale model typically doesn't produce very nice results. You get a noticeable amount of stuttering between individual frames, because image upscale models aren't ideal for that sort of task. Denoising the video a bit and running it through the t2v model helps to align frames closer to each other, giving you a smoother more natural look.
The t2v model is a bit of an odd fit here instead of an i2v model, but my tests using the i2v model for a smoothing pass produced consistently worse results. I am starting to experiment with the new Skyreels models to see if any of them would be better.
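A toy demonstration of that frame-alignment effect, with a plain temporal average standing in for the low-denoise t2v pass (the real workflow uses the model, not an average):

```python
# Toy illustration: per-frame upscaling adds uncorrelated detail to each
# frame ("stutter"); anything temporal that pulls frames toward their
# neighbors reduces it. A simple temporal mean stands in here for the
# low-denoise t2v pass used in the actual workflow.
import numpy as np

rng = np.random.default_rng(0)
clean = np.linspace(0, 1, 16)[:, None] * np.ones((16, 8))   # 16 smooth "frames"
per_frame = clean + rng.normal(0, 0.1, clean.shape)         # independent jitter

def temporal_smooth(frames: np.ndarray, k: int = 1) -> np.ndarray:
    out = frames.copy()
    for t in range(len(frames)):
        out[t] = frames[max(0, t - k):t + k + 1].mean(axis=0)
    return out

jitter = lambda f: np.abs(np.diff(f, axis=0)).mean()        # frame-to-frame change
print(f"raw: {jitter(per_frame):.3f}, smoothed: {jitter(temporal_smooth(per_frame)):.3f}")
```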
@HazardAI Thank you, nice job brother.
Does the RifleX node have any benefit at lengths of <=81 frames, or only above 81?
Just curious about how it works in this workflow
To my understanding, it wouldn't have any benefit under 81 frames.
Hi! Used your workflow with Q4; it took about 1h50 to render 3 seconds on an RTX 3080 loaded with 3 LoRAs. Thanks for sharing! I'm also using it on RunPod and it's pretty fast there (243s) with Q4. Going to test with Q6 for better renders!
I don't personally see a difference between Q3 and Q6 for the base 480p render. I usually use Q6 because I have the vram to fit it, but plenty of the example videos I've posted here were done with the Q3_K_M.
I hope you mean 1m50s instead of 1h50m. It typically takes about 10 minutes for me on a 3090.
@HazardAI Nope, 1 hour and 50 minutes. Maybe something was wrong; I made a mistake and forgot to change the Vid2Vid VAE and ClipLoader models.. Did you use 1280x480 with your Q3_K_M? I'm going to try again locally! (I'm still a beginner to Comfy ^^)
@animeaiverse625 Before running a generation, try hitting the "Unload Models" button in ComfyUI Manager (if you have it). Automatic unloading with Wan seems to be a bit dodgy right now, and if it only partially loads, it'll slow things down a ton. Keep an eye on the console log and see if it says "Loaded Completely" after "Requested to load WAN21". If it says "Loaded Partially", then you'll want to reduce either the input image resolution or the frame length until it says "Loaded Completely".
@HazardAI Oh OK, thanks, I saw that in the console. I didn't manage to get an Unload Models button, even with a custom node. I'm now trying 480x544 at 61 length, which runs at 15s/it, so that's better. I noticed that on my RunPod cloud instance Triton was activated; can it be a really big improvement? I'm having trouble making it work with ComfyUI, maybe because it's the desktop version, idk. Thanks for the tips!
@animeaiverse625 The speedup from Torch Compile/Triton is pretty good. I measured 61 frames at 480x702 without Torch Compile at 394s and with Torch Compile 309s.
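For reference, the Torch Compile node is essentially PyTorch's torch.compile applied to the model; the inductor backend generates fused Triton kernels on CUDA, which is why this path needs Triton installed. A bare-PyTorch sketch with a stand-in model (not the actual node code):

```python
import torch

# torch.compile traces the model and, via the "inductor" backend,
# emits fused kernels (Triton kernels on CUDA). Toy stand-in model here.
model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.GELU())
compiled = torch.compile(model, backend="inductor")

x = torch.randn(8, 64)
y = compiled(x)  # first call pays the compilation cost; later calls are faster
```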
@HazardAI Does it load the text encoder and the Wan model at the same time? On my 12GB 4070 Ti I noticed it only loaded the Q3 partially, so I assume the VAE or text encoder blocks the rest?
Although I thought the text encoder gets flushed; maybe that was with another model and not Wan...
Anyway, any suggestions for the best model and text encoder combination? Did you notice degradation with smaller text encoder CLIPs?
@Akira_HentAI I believe the text encoder has to be loaded during generation, but I'm not certain of that. I typically use Q4 for the text encoder. I don't think there's any discernible difference from Q6.
The video output isn't really smooth; it looks like it's missing interpolation between frames, making the video pretty choppy. Any recs? Thanks for the flow!
The workflow outputs the video at a few different stages, not just the final interpolated one. Are you looking at the video that has "interpolated" in the name?
@HazardAI Yes. I tried increasing the frame rate at different steps and it helped a bit. On another note, any recommendations to make the skin less plastic/flat?
@roselatina I think that'll mostly come down to the input image. But that does seem to be a problem in general with lots of generative models. Lighting effects can go a long way to mask over that.
You could try reducing the noise in the vid2vid stage. I use 0.05 or 0.01, and it makes it less plastic and also maintains details.
Last one! The workflow seems to be trying to loop the video. If I don't care about that, what should I do? Thank you!
Fantastic! 2 questions.
1. Does the No Torch Compile version make a difference to the end result?
2. My ComfyUI has no upscalers. Which ones do you use/recommend?
I don't think torch compile would have any impact on quality, just speed.
I usually use RealESRGAN 2x
https://openmodeldb.info/models/2x-realesrgan-x2plus
or 2x Nomos
https://openmodeldb.info/models/2x-NomosUni-esrgan-multijpg
There's probably a better upscaler out there for realistic stuff, but I've been happy enough with those.
@HazardAI Thanks for the answer. So Torch Compile is FASTER?
Also, I heard someone mention they don't want to install the software that Torch Compile needs on Windows. Can anyone clarify why?
@redlittlerabbit Quite a bit faster. It depends on Triton, which I hear can be a bit painful to install on Windows. If you use Linux, it comes with recent versions of PyTorch automatically. On Windows you need to install it yourself using some terminal commands.
@HazardAI Thanks for all the help.
For those wondering about Triton...
https://github.com/comfyanonymous/ComfyUI/issues/7421
"Ah yes you can just run pip install triton-windows in the bottom panel terminal" - in ComfyUI
So I am using the non-Triton workflow and it still gives the Triton error:
KSampler
backend='inductor' raised:
RuntimeError: Cannot find a working triton installation. More information on installing Triton can be found at https://github.com/openai/triton
Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information.
You can suppress this exception and fall back to eager by setting:
import torch._dynamo
torch._dynamo.config.suppress_errors = True
If you previously tried running it with the Torch Compile version then you'll probably need to restart Comfy.
@HazardAI thanks! issue solved
@HazardAI fixed for me as well, ty man
I wanted to make a high-resolution 16:9 video with the workflow. Totally new to ComfyUI. TBH I've wanted to try this ever since I saw your workflow.
-I changed the resolution in Main to 1216*832.
-The image I used was 2K resolution.
-In the Prep Video section I changed the resolution to 1280*720.
-I used the TeaCache/Triton version.
-The process took an hour to complete on a 4070.
Question: how can I improve the quality or reduce the time taken? Were there any mistakes, something I could have done to improve the process overall? I have posted the video... It's the woman in the red dress...
I see you set the dimensions to 832x1216 on the "Load and resize image" node. That'd be most of why it took so long to run. The higher the base resolution, the higher the overall quality. But this workflow is intended to use a base resolution at or near 480p to keep things speedy, and then to upscale to 720p.
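To put rough numbers on that: an 832x1216 base has about 3x the pixels of a 480p-class base like 480x702 (a size mentioned elsewhere in this thread), and since attention cost grows roughly with the square of the token count, the real slowdown is worse than the raw pixel ratio suggests. A quick back-of-the-envelope:

```python
# Back-of-the-envelope only: token count scales with pixels per frame,
# and self-attention cost scales roughly quadratically with tokens.
base = 480 * 702   # a 480p-class base size mentioned in this thread
big = 832 * 1216   # the base resolution actually used
print(f"pixels: {big / base:.1f}x more")            # ~3.0x
print(f"attention work: {(big / base) ** 2:.1f}x")  # ~9x, very roughly
```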
Running the non-Triton version; works incredibly well. Thanks for the work putting it together. Love the customization options and tweaking. Seriously, unreal.
Sorry for asking, but I can't help myself.
I always get noisy outputs in the "Main" tab. I changed the resolution and used images at 832x1216, but nothing changed.
I'm new to ComfyUI; what am I doing wrong?
Wan can be a bit prone to noisy videos. If I get one that's noisy, I typically change the input image to something new and try again. It seems like it happens to me more often on simple and dark backgrounds, and it gets better if I put some description of the background into the prompt, but I don't have enough tests to know if that's just a coincidence.
@HazardAI Ty for your response. I used different images and got the same result. Using the Wan 2.1 workflow from GitHub I don't get this problem. I'll try to figure out what I did wrong. Ty for your time!
I also got a ton of incredibly noisy videos (like complete trash) on this workflow, even though I've been using other Wan workflows for a while now. I tried troubleshooting everything I could, but eventually downloaded one of the HQ vids submitted on this page using this resource and loaded its workflow into Comfy. Now it's working and generating really nice, smooth videos. The only thing I can think of is that I wasn't using the Q6_K model? And now I'm using the Q8 T5, but the Q3 also worked.
I have noticed that LoRAs and prompting don't have nearly the same influence that I am used to. It's just really confusing sometimes.
@Collin I fixed it by setting "VRAM management mode" under Server-Config to highvram. Comfy was using normalvram on auto.
Thanks for sharing this; getting some decent results! Still trying to figure out the best settings with Q6_K and a 4090. Anyone know how/if it's possible to use SageAttention with this workflow?
My Resize Image node in Prep Video is always red, so it can't go past that stage to get to the interpolation. Any ideas?
I suppose you either don't have the node, or don't have the model downloaded or in the right folder. Try installing ComfyUI Manager, which will help with missing nodes. https://github.com/Comfy-Org/ComfyUI-Manager
@gabrielsimao1234567890359 Nah, I have the node. It's outlined in red.
Sounds like it's missing an input. Do you have an upscaling model selected? What error message do you have in the console? (You should also be able to get the log from the web UI by hitting the "Toggle Bottom Panel" button.)
@opendream_s Just download the model used in the workflow for the red node, or select one that you actually have on your system.
@HazardAI I have the same problem. I selected the same upscale model as your example and tried other upscale models, but it still shows red.
Have you found a solution to this?
@p0lar0id1314310 I just created a new Resize Image node and hooked it up. The one that is saved to the workflow seems to have a different setup which isn't working for me.
I tried adding a width and height float node before the "resize image" node, and it worked fine.
Noob here: the CLIPLoader (GGUF) node is missing the 'wan' type value, and I was wondering what I missed. Otherwise it's the first workflow that works for me ;)
You'll need to update the node. ComfyUI Manager is probably the easiest way to do that.
Nice, thanks!
I recently installed Triton and SageAttention and wanted to use the Torch Compile version. After getting it all installed, it now runs without error but gets stuck at the KSampler forever. Is there anyone who could help me figure this out?
Upon further inspection, it seems like my rig just can't handle it. Gotta do without I guess.
@victerprime You could try the Q3_K_S model. You could also try dropping the input resolution a little lower. 432 is still workable. ComfyUI seems to have some issues not unloading models after finishing a run, so I hit the "Unload Models" button to manually clear them before each video run. That could save you enough vram to load more layers.
I also have the TeaCache cache_device set to "main_device". Changing that to "offload_device" would also save some vram.
@HazardAI Thanks for the tips! I'll try those out later.
I get the same issue on a 4090 with 64GB RAM. What rig are you running?
@gabrielsimao1234567890359 I have a 3070 Ti with 8GB VRAM and 64GB RAM.
@gabrielsimao1234567890359 I am having the same issue. 4090, 24GB VRAM. No error, no code, just an infinite stall.
@HazardAI Thanks for the help. Surely a 4090 should be able to tackle this though?
I'm using GGUF Q4_K_M on a 4070 Ti Super at 600x450.
Everything works very strangely. I can get a normal generation through the two samplers sequentially: the first generation happens at the VRAM limit, ~14.9-15.4GB. On the second KSampler the main 14B model is unloaded and the 1.3B is loaded. Everything is fine (the first KSampler runs ~5 min, the second ~1 min), with no shared memory usage at all.
Then I start the next generation, having first pressed Unload Models so VRAM is freed completely. But when the first KSampler starts working, VRAM usage goes through the roof, some 7GB more gets loaded into shared memory on top, and normal generation can be forgotten about.
It seems that something is not unloaded from VRAM when needed. I don't understand how to set it up so that VRAM doesn't overflow.
I tried TeaCache with "main_device" and "offload_device" in both stages.
I found the problem: on subsequent generations the CLIP model is for some reason loaded into VRAM instead of RAM, causing the overflow. I solved it by adding a "Force/Set CLIP Device" node set to "cpu" to force the value, because it apparently doesn't stick in the original CLIP node. It can probably also be forced with Comfy startup parameters like "--clip-cpu", but I haven't tried that.
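The idea behind that node, in bare PyTorch terms: keep the text encoder's weights on the CPU and move only its (small) output to the GPU. A minimal sketch, with a toy embedding standing in for the real text encoder:

```python
import torch

# Toy stand-in for the text encoder -- the point is just the device
# placement: weights stay on CPU, only the conditioning tensor moves.
text_encoder = torch.nn.Embedding(32000, 4096).to("cpu")

tokens = torch.randint(0, 32000, (1, 77))
with torch.no_grad():
    cond = text_encoder(tokens)          # encode on CPU: slower, but zero VRAM

device = "cuda" if torch.cuda.is_available() else "cpu"
cond = cond.to(device)                   # a tiny tensor vs. the encoder weights
```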
@DRZ3000 Can you share the workflow or at least a screenshot?
@gabrielsimao1234567890359 I kept testing; unfortunately it was a false hope. It seemed to work, but it turned out it didn't; it worked a couple of times by some magic, then the same thing happened again. I tried different nodes to clear VRAM and such, and startup parameters to force models onto the CPU, and it still looks like this: on the first generation everything goes correctly, models are loaded into RAM except the UNET, which goes into VRAM. At the end of the first pass almost all RAM (32GB) is occupied, but VRAM is almost empty. I press Run, a new generation starts, almost all RAM is freed, and everything starts loading into VRAM, spilling into shared memory. At that point you may as well terminate the python.exe process and start Comfy again. Something is specifically wrong with the memory allocation in Comfy; you can see it clearly here with this workflow's two KSamplers. I came across a similar topic in the discussions somewhere, but I can't find it. I like this idea with the second pass; it fixes the picture really well, so I'd like to sort out the memory problem.
I think restarting Comfy is probably the only surefire way to deal with the vram issues at the moment. The Comfy developers are great though, so I assume this will be fixed in an update before too long.
@gabrielsimao1234567890359 @HazardAI I found a solution for my situation; maybe it will be helpful.
My solution was to add WanVideoBlockSwap.
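For anyone wondering what block swap does: it parks most of the transformer's blocks in system RAM and shuttles each one to the GPU only for its own forward pass, trading PCIe transfer time for a much lower VRAM peak. A minimal sketch of the idea (illustrative only, not the actual WanVideoBlockSwap implementation):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
# Toy "transformer": a stack of blocks parked on the CPU.
blocks = [torch.nn.Linear(256, 256) for _ in range(8)]

def forward_with_swap(x: torch.Tensor) -> torch.Tensor:
    """Peak VRAM ~ one block's weights instead of all of them."""
    x = x.to(device)
    for block in blocks:
        block.to(device)      # swap this block's weights in
        x = block(x)
        block.to("cpu")       # swap them back out to free VRAM
    return x

out = forward_with_swap(torch.randn(1, 256))
```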
I am running a 4090 and get stuck on VAE decoding. Does anyone know a workaround, or has anyone had the same issue?
Ya, it takes a really long time, and I usually run out of VRAM so it spills into RAM (also using a 4090). Sometimes I'm lucky and it takes a short time; I usually increase that luck by restarting the PC and only running ComfyUI.
Hey, I also have a 4090 and it hangs at the same step. I don't get what's going on.
@halithaz Hi, have you found a solution for this?
Hi, I'm currently using the Wan 2.1 - seamless loop workflow v1.2, and I encountered an error when the workflow reaches the SamplerCustomAdvanced node (ID: 316).
The error message is: ValueError: not enough values to unpack (expected 2, got 1)
Any advice on how to properly format or debug the inputs for SamplerCustomAdvanced would be appreciated. Thanks for your amazing work!
Serious hats off to you. I've been messing around with various Hunyuan, Wan, FramePack, etc. workflows with varying degrees of success, but the first generation with this one had my jaw on the floor. Good shit.
Is it possible to create a perfect loop?
While I haven't yet tried it myself, the first-last-frame model should be able to do that. There's only a 720p version of it, while this workflow is designed to run on the 480p model to keep VRAM low enough for home-user GPUs.
@HazardAI Could you maybe do it at 720p with your current workflow? I think it's the best out there. Really great work :)
@orisio Thank you so much! You could drop the upscaling and smoothing bits, and just keep the base step and the frame interpolation, and update the base step to use the flf model. It might even work well to keep basically the same workflow, and run the flf model at slightly lower than 720p, and use the upscaling process on the result. But I suspect a pass through the t2v model would break the perfect loop effect.
There's also always the option to just do a ping-pong style video, where it plays in reverse back to the beginning (that's currently an option in this workflow). But that only works on some videos. I've done some dance-style videos with ping-pong that looked good.
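Ping-pong is simple enough to show concretely: append the frames in reverse, dropping the two endpoint frames so they don't play twice at the turnaround. A tiny sketch:

```python
def pingpong(frames: list) -> list:
    # Forward, then backward, skipping the duplicated end/start frames.
    return frames + frames[-2:0:-1]

print(pingpong([1, 2, 3, 4]))  # [1, 2, 3, 4, 3, 2]
```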
@HazardAI Ping-pong is great; it actually worked pretty well! I want to try your first example too, but I've started running into this issue: "output with shape [1, 75600, 1536] doesn't match the broadcast shape [2, 75600, 1536]". It happens on the second KSampler.
Do you know how to fix it?
@orisio I think that comes from using a different VAE than the Wan one.
@HazardAI I do have the same VAE you linked for download, but the issue appears only sometimes: out of about 10 images, 6 went through and 4 of them hit that error. Not really sure why; maybe it could be a 50xx issue, as those cards do have a lot of issues.
@orisio I've just run into the same issue on one of my own runs. I re-ran it and it went away. My best guess would be that it's coming from somewhere in the Kijai Wan Video Wrapper. Fingers crossed it goes away soon, since Kijai is super active with updates.
@HazardAI Yeah, just changing a small thing in the prompt gets it through the issue x) I'm also curious: do you also sometimes get a bit of gray overlay, a gray blur, randomly appearing in the video? Not always, but it sometimes happens.
@orisio Yes, especially on 2D generations. Describing the background seems to help a bit, and using images without details in the background helps.
@HazardAI Thanks! I will try with simple backgrounds :)
I'm trying the non-Triton version, but it always gets stuck at the KSampler part. I have a 3070 Ti with 8GB VRAM. I also tried the K_S version and offload device, but nope, it's not working. Any ideas?
Yeah, getting the same error about Triton in the non-Triton workflow.
I have Triton installed, yet the Triton workflow wouldn't work for me (kept getting stuck at the KSampler with 100 percent of my 16GB GPU in use... but not actually doing anything). I tried the non-Triton workflow and it worked. Also using the Q5_K_S GGUF.
Maybe try the Triton version? I really have no idea.
Often when the third section of the workflow begins, my computer hangs horribly for a very long time; other times it only lags for a couple of moments. I have a 4090. Does anyone know why this happens? Specifically, it's during the vid2vid upscale section.
My personal observation is that it's because the VRAM is bursting at the seams, so I have to wait until it empties, and I don't know why.
@flow123160424 I think it's a weird problem some people have specifically with upscaler models, because I've found a thread on GitHub talking about the same thing with no resolution. I'm using 15% of my GPU during generation, and then when the upscale model begins encoding, it jumps to 100% and gets stuck. Makes no sense.
One way I found to relieve the VRAM pressure was to add WanVideoBlockSwap.
Is there a version of that that works with the native/GGUF nodes?
I've added the WanVideoBlockSwap node and posted an updated version. So far I've been able to confirm that it definitely gets the total VRAM under 12GB, but I'd love additional reports if anyone is able to get it as low as 8GB or 6GB.
@HazardAI I also used your workflow and encountered a VRAM issue. However, I added WanVideoBlockSwap, which resolved the problem. I've also used it for V2V upscaling, and the results are excellent.
This is my own version
[Attached screenshot: 註解 2025-05-07 003906.jpeg (3078×1628)]
I am very happy I found this. It's easy to use, and so nice that you also put a version without Triton into the zip. First time I've produced videos with decent quality. Insane; many thanks for your work.
Thank you so much! I'm happy to hear it works well for you.