Workflow Overview
This is a sophisticated ComfyUI workflow designed for high-quality, controllable video generation using the powerful Wan2.2 5B Fun model. It leverages ControlNet (via Canny edge detection) to transform a driving motion video and a starting reference image into a stunning, coherent animated sequence. Perfect for creating dynamic character animations with consistent style and precise motion transfer.
Core Concept: Use a "control video" (e.g., a person dancing) to guide the motion, and a "reference image" (e.g., a character design) to define the style and subject. The workflow intelligently merges them into a new, AI-generated video.
Key Features & Highlights
🚀 State-of-the-Art Model: Utilizes the Wan2.2-Fun-5B-Control-Q8_0.gguf quantized model for a balance of high quality and manageable hardware requirements.
🎨 Precision Control: Implements a Canny edge ControlNet. The workflow extracts edges from your input video, ensuring the generated animation closely follows the original motion.
⚡ Optimized for Speed: Integrates a custom LoRA (Wan2_2_5B_FastWanFullAttn), allowing high-quality results in just 8 sampling steps without significant quality loss.
🧠 Efficient Text Encoding: Uses a separate, quantized umt5-xxl-encoder CLIP model for text encoding, reducing VRAM load on your GPU.
🔧 Complete Pipeline: Everything from model loading, video preprocessing, conditioning, and sampling to final video encoding is included in one seamless, organized graph.
📁 Ready-to-Use: Pre-configured with optimal settings, including a detailed positive/negative prompt. Just load your own image and video to start creating.
Workflow Structure
The workflow is neatly grouped into logical sections for easy understanding and customization:
Step 1 - Load models: Loads the main Wan2.2 5B model, its VAE, the CLIP text encoder, and the FastWan LoRA.
Step 2 - Start_image: Loads your initial reference image. This defines the character and style for the first frame.
Step 3 - Control video and video preprocessing: Loads your motion video and processes it through the Canny node to extract edge maps.
Step 4 - Prompt: Where you input your positive and negative prompts to guide the generation.
Step 5 - Video size & length: The Wan22FunControlToVideo node packages everything, setting the output video dimensions and length based on the control video.
Sampling & Decoding: The KSampler runs for 8 steps with the UniPC sampler, and the VAE decodes the latents into the final frames.
Video Output: The VHS_VideoCombine node encodes the image sequence into an MP4 video file.
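For intuition about what Step 3's Canny node produces, here is a minimal numpy-only sketch of edge extraction. It is a simplified stand-in (plain Sobel gradient magnitude with a single threshold, without the non-maximum suppression and hysteresis of real Canny), not the node's actual implementation:

```python
import numpy as np

def edge_map(gray, threshold=100):
    """Rough stand-in for a Canny preprocessor: mark pixels whose
    Sobel gradient magnitude exceeds a threshold. Real Canny adds
    non-maximum suppression plus low/high hysteresis thresholds."""
    gray = gray.astype(np.float32)
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], np.float32)
    ky = kx.T  # vertical-gradient kernel
    pad = np.pad(gray, 1, mode="edge")
    h, w = gray.shape
    gx = np.empty((h, w), np.float32)
    gy = np.empty((h, w), np.float32)
    for i in range(h):
        for j in range(w):
            win = pad[i:i + 3, j:j + 3]
            gx[i, j] = (win * kx).sum()
            gy[i, j] = (win * ky).sum()
    mag = np.hypot(gx, gy)
    return (mag >= threshold).astype(np.uint8) * 255  # white edges on black
```

Running something like this per frame of the control video yields the white-on-black edge maps that condition the generation; in the workflow itself, the Canny node's low/high thresholds play the role of threshold here.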
How to Use This Workflow
Download & Install:
Ensure you have ComfyUI Manager to easily install missing custom nodes.
Required Custom Nodes: ComfyUI-VideoHelperSuite and ComfyUI-GGUF (for loading the .gguf models).
Download the .json file from this post.
Load the Models:
Main Model: Place Wan2.2-Fun-5B-Control-Q8_0.gguf in your ComfyUI/models/gguf/ folder.
CLIP Model: Place umt5-xxl-encoder-q4_k_m.gguf in the same gguf/ folder.
VAE: The workflow points to Wan2.2_VAE.safetensors. Ensure it's in your models/vae/ folder.
LoRA: Place Wan2_2_5B_FastWanFullAttn_lora_rank_128_bf16.safetensors in your models/loras/ folder. Adjust the path in the LoraLoader node if yours is in a subfolder (e.g., wan_loras/).
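After copying the files, a quick script can confirm everything landed where the workflow expects. The paths below mirror the list above; comfy_root is your ComfyUI install directory (a convenience sketch, not part of the workflow itself):

```python
from pathlib import Path

# Filenames from this post, mapped to their expected model subfolders.
EXPECTED = {
    "models/gguf/Wan2.2-Fun-5B-Control-Q8_0.gguf": "main model",
    "models/gguf/umt5-xxl-encoder-q4_k_m.gguf": "CLIP text encoder",
    "models/vae/Wan2.2_VAE.safetensors": "VAE",
    "models/loras/Wan2_2_5B_FastWanFullAttn_lora_rank_128_bf16.safetensors": "LoRA",
}

def missing_models(comfy_root):
    """Return the expected model files not yet present under comfy_root."""
    root = Path(comfy_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]
```

If missing_models("path/to/ComfyUI") prints an empty list, the loaders should find everything (remember to adjust the keys if you keep the LoRA in a subfolder).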
Load Your Assets:
Reference Image: In the LoadImage node, change the image name to your own file (e.g., my_character.png).
Control Video: In the LoadVideo node, change the video name to your own motion clip (e.g., my_dance_video.mp4).
Customize Your Prompt:
Edit the text in the Positive Prompt node to describe your desired character and scene.
The provided negative prompt is already comprehensive, but you can modify it as needed.
Run the Workflow:
Queue the prompt in ComfyUI. The final video will be saved to your ComfyUI/output/video/ folder.
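Queueing from the UI is the simplest route, but ComfyUI also exposes an HTTP API that is handy for batch runs. A minimal sketch, assuming the default server at 127.0.0.1:8188 and a graph exported via "Save (API Format)":

```python
import json
import urllib.request

def build_prompt_request(graph, host="127.0.0.1", port=8188):
    """Build the URL and JSON body for ComfyUI's /prompt endpoint."""
    url = f"http://{host}:{port}/prompt"
    body = json.dumps({"prompt": graph}).encode("utf-8")
    return url, body

def queue_workflow(graph, host="127.0.0.1", port=8188):
    """POST an API-format workflow graph to a running ComfyUI server."""
    url, body = build_prompt_request(graph, host, port)
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())  # response includes the queued prompt_id
```

Load the exported .json with json.load, tweak inputs (image name, prompt text) in the dict, and pass it to queue_workflow to render variations without touching the UI.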
Tips for Best Results
Control Video: Use a video with clear, strong motion and good contrast for the Canny detector to work best. Silhouettes or videos with a plain background work excellently.
Reference Image: The first frame of your output will closely match this image. Use a high-quality image of your character in a pose similar to the first frame of your control video.
Length: The length value in Wan22FunControlToVideo is set to 121 based on the original video. If your video is a different length, update this value to match its frame count.
Experiment: Try adjusting the LoRA strength (e.g., between 0.4 and 0.7) or the Canny thresholds to fine-tune the balance between motion fidelity and creative freedom.
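Wan-family video models generally expect the frame count to have the form 4n + 1 (note that the default 121 = 4 × 30 + 1). I'm assuming the 5B Fun variant follows the same constraint, so verify against the node if yours behaves differently. A small helper to snap an arbitrary clip length to a valid value:

```python
def wan_length(frame_count):
    """Snap a frame count to the nearest value of the form 4n + 1
    (e.g. 81, 121) — the frame-count pattern Wan models are assumed
    to expect here. Always returns at least 1."""
    n = round((frame_count - 1) / 4)
    return 4 * max(n, 0) + 1
```

For example, a 120-frame control video would snap to 121, matching the workflow's default.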
Required Models (Download Links)
Wan2.2-Fun-5B-Control-Q8_0.gguf: https://huggingface.co/QuantStack/Wan2.2-Fun-5B-Control-GGUF
umt5-xxl-encoder-q4_k_m.gguf: https://huggingface.co/city96/umt5-xxl-encoder-gguf/tree/main
Wan2.2_VAE.safetensors: https://huggingface.co/QuantStack/Wan2.2-Fun-5B-InP-GGUF/tree/main/vae
Wan2_2_5B_FastWanFullAttn_lora_rank_128_bf16.safetensors: https://huggingface.co/Kijai/WanVideo_comfy/blob/main/FastWan/Wan2_2_5B_FastWanFullAttn_lora_rank_128_bf16.safetensors
Conclusion
This workflow demonstrates the powerful synergy between the Wan2.2 model, ControlNet, and efficient LoRAs. It abstracts away the complexity, providing you with a robust, one-click solution for creating amazing AI-powered animations. Enjoy creating!
If you use this workflow, please share your results! I'd love to see what you create.
Description
Updated workflow for better quality.
Comments (5)
Is it possible to use this for img2img?
Try using only 1 frame length with an OpenPose image instead of a video. I prefer the 14B version, as the 5B is not known as a top image generator.
@zardozai I love Wan for generating images even more than video. I have a pretty good upscaling workflow but I'm missing controlnet. I've been hoping Wan Fun could be the solution.
It's just a copy of:
Bilibili/YouTube: T8star-Aix
and something tells me T8star-Aix (Bilibili/YouTube) is the owner of it. Maybe throw in a credit at least and not present it as yours?
What are you even talking about? It's a GGUF version of the workflow that is present in the ComfyUI template! LMFAO 🤣