Now faster and easier to install
This workflow generates a small baseline clip with the 14B image-to-video model, upscales it, and then smooths out the result with the 5B model.
This lets you test prompts and iterate more quickly on the base generation before upscaling to the final resolution.
Links for all the required models and where to put them are now included in the workflow.
FAQ
Do I need both Wan 2.1 and 2.2 VAEs?
Yes. The 2.2 VAE only works with the 5B model (confusing, I know). Make sure the main section loads the 2.1 VAE and the upscale section loads the 2.2 VAE.
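If you want to confirm both files are in place, here's a minimal terminal check; the paths are an assumption based on the default ComfyUI layout used by the batch download script in the comments, so adjust them to your install.
# Both VAEs should be listed with a non-trivial file size (paths assumed, not taken from the workflow itself)
ls -lh /workspace/ComfyUI/models/vae/wan_2.1_vae.safetensors \
       /workspace/ComfyUI/models/vae/wan2.2_vae.safetensors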
It's frozen on VAE decode
The second VAE decode can take a long time. Just be patient.
Description
Simplified to use fewer custom nodes
Uses Lightx2v models with the 4-step distillation baked in for improved performance
Comments (18)
Excellent workflow! Thanks! Is it possible to do the same thing, but keep the last frame?
No idea how this works, does it use 4-step loras?
I personally advise using them, or rather the lightx2v models with the distillation correctly baked in. But there's no reason you couldn't also use this workflow without them.
Your workflow is preem! Thanks for the upload!
By far the easiest workflow I have ever used. Thank you so much, I appreciate you sharing the workflow!
Batch Download All Resources (Linux)
Paste into the command line and press Enter.
If you want a Windows version, copy the command, go to aistudio.google.com > Playground, and ask it to make a Windows version of the command.
# Navigate to the ComfyUI models directory (Standard for Vast.ai templates)
cd /workspace/ComfyUI/models
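# Note: wget's -c flag resumes partial downloads if re-run; -P sets the destination directory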
# 1. Main Diffusion Models
mkdir -p diffusion_models
wget -c "https://huggingface.co/lightx2v/Wan2.2-Distill-Models/resolve/main/wan2.2_i2v_A14b_high_noise_scaled_fp8_e4m3_lightx2v_4step_comfyui_1030.safetensors" -P diffusion_models/
wget -c "https://huggingface.co/lightx2v/Wan2.2-Distill-Models/resolve/main/wan2.2_i2v_A14b_low_noise_scaled_fp8_e4m3_lightx2v_4step_comfyui.safetensors" -P diffusion_models/
wget -c "https://huggingface.co/Kijai/WanVideo_comfy_fp8_scaled/resolve/main/TI2V/Wan2_2-TI2V-5B_fp8_e4m3fn_scaled_KJ.safetensors" -P diffusion_models/
# 2. Loras
mkdir -p loras
wget -c "https://huggingface.co/Kijai/WanVideo_comfy/resolve/main/FastWan/Wan2_2_5B_FastWanFullAttn_lora_rank_128_bf16.safetensors" -P loras/
# 3. Text Encoder
mkdir -p text_encoders
wget -c "https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors" -P text_encoders/
# 4. VAEs
mkdir -p vae
wget -c "https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/vae/wan_2.1_vae.safetensors" -P vae/
wget -c "https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/vae/wan2.2_vae.safetensors" -P vae/
# 5. Upscaler (RealESRGAN_x2Plus)
mkdir -p upscale_models
wget -c "https://github.com/xinntao/Real-ESRGAN/releases/download/v0.2.1/RealESRGAN_x2plus.pth" -P upscale_models/
echo "All downloads complete!"
Can somebody help me? I really want to try using img2vid workflows, but I just don't understand how to download it. I'm used to txt2img on Forge and that's all I know. I tried downloading ComfyUI, but when I drop a workflow in, it just loads? I guess that's because I need to download some requirements, but I don't know how to do that either. If anyone could help, I'd much appreciate it!
It's easiest to see this directly in a video tutorial. When you drag and drop a workflow into ComfyUI, you'll be prompted to download the missing nodes (custom nodes etc.).
You also have to download a bunch of models; you may already have experience downloading and placing them in Forge (models/checkpoints or models/text_encoders).
In this case, you need to manually download these model types:
• Wan2.2 diffusion_models,
• text_encoders,
• wan2.1 and wan2.2 VAEs,
• some upscale models.
You may also be able to download these models from ComfyUI directly: ComfyUI > [Manager] > Install Models. But I recommend downloading the models manually.
Also, watch the process from zero on an AI instructor's YouTube channel; some of them show the whole process from scratch. While watching those tutorials, focus on understanding how to install the custom nodes used in workflows. Peace and happy studies.
I'll also recommend some YT channels for you.
• Pixaroma → Detailed and well-organized ComfyUI tutorials
• Vladimir Chopine [GeekatPlay] → Slow, but explains things in an academic way
• Benji's AI Playground → A lot of rambling, but very experienced in video generation
• Academia SD → In Spanish, but very direct tutorials
(If this was helpful, please subscribe to @saygiylasunar on YT.) Thanks ♥
@saygiylasunar Benji's AI Playground is great. I frequently learn things from their videos.
Look at my comment on this post. You just paste it into a Unix terminal and everything gets downloaded. Then go into the ComfyUI custom nodes manager and install the custom nodes for this workflow. Then, after attempting to run, click the dropdowns on each red-circled node on the canvas and select the option that has been downloaded (it will have a similar name to the original).
Where is the interpolating/smoothing part located in this workflow? I only see the lightning/non-lightning sections.
I've spent hours trying to use the refine function in this workflow to improve the quality of my videos, but no matter what I do it always comes out worse. I understand how the upscaler and interpolation functions work, those are pretty straightforward, but the KSampler is just a mystery to me. I've tried running it at all sorts of denoise settings, all the way from 0.03 to 0.2, and the end product is always messy and blurry compared to the original. I've tried different samplers, different step counts (4-14), and different models and loras. I've tried troubleshooting through ChatGPT and Grok and everything just comes back worse.
I've tried the same thing with V2V workflows and I've had the same experience. I just can't figure out how another sampler pass is supposed to improve my videos.
What am I missing? Is the KSampler pass supposed to actually fix anything?
Quick update, because I think I'm having a little bit of luck: I'm finally getting some improvements in image quality. Most of my difficulty was coming from faces getting distorted, so I added a FaceRestoreCF node and things are looking better. Using that with a denoise setting of 0.04, I'm actually getting improvements rather than downgrades in image quality.
It's still very subtle, but at least it's something. I don't know how anybody could run this with denoise higher than 0.1, because that just turns the video into something completely different...
Why does 1.7 not need CLIP Vision?
I kinda liked 1.5 better; 1.7 is too slow and convoluted.
That's a little surprising. I switched things up quite a bit from 1.5 to 1.7 to use the FP8 scaled models instead of GGUF, because in my testing they were notably faster on my hardware (3090). I also thought I'd managed to make it a little more intuitive with the new layout of the LoRA loaders and by putting the KSampling into a subgraph.
I could totally see the GGUF-based 1.5 version being faster on some hardware, though. And I definitely feel that subgraphs aren't for everyone.