Wan2.1 SkyReelsV2 VACE workflow tested with MoviiGen, AccVid, CausVid & FusionX LoRAs (14B T2V, Reference I2V, Extend & Loop)
This ComfyUI workflow supports:
Image-to-Video (I2V) and Loopable Video Extension (V2V) generation using SkyReels-V2-VACE-GGUF
Using AccVid and CausVid LoRAs with a two-sampler setup for faster generation (sketched below)
Using MoviiGen and Rewards (MPS) LoRAs for higher output quality
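For context on the two-sampler setup: a common pattern with distilled speed LoRAs like CausVid is to split denoising between two samplers, running the first step or two at normal CFG to lock in composition, and the rest at CFG 1 where the LoRA does its fast low-step denoising. A minimal sketch of that split with assumed step counts and CFG values (the workflow's actual settings may differ); the field names mirror ComfyUI's KSamplerAdvanced inputs:

```python
# Hypothetical two-sampler step split (assumed values, not the workflow's
# exact settings). Field names mirror ComfyUI's KSamplerAdvanced node.
total_steps = 8
split_at = 2  # steps handled by the first sampler (assumption)

# Sampler 1: early steps at normal CFG to establish composition and motion.
sampler_1 = dict(steps=total_steps, start_at_step=0, end_at_step=split_at,
                 cfg=6.0, return_with_leftover_noise=True)

# Sampler 2: remaining steps at CFG 1.0, where CausVid/AccVid-style
# distillation LoRAs work best.
sampler_2 = dict(steps=total_steps, start_at_step=split_at,
                 end_at_step=total_steps, cfg=1.0,
                 return_with_leftover_noise=False)
```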
You can:
Generate the first video as your starting point
Extend the video one segment at a time to gradually build out the full sequence
Cherry-pick the best segments for your final cut
Refine prompts step-by-step as the scene or motion evolves
🔧 Components
🌀 SkyReels-V2-VACE-GGUF (by wsbagnsv1)
Based on Wan2.1, fine-tuned on 720p@24fps videos
Integrated VACE (All-in-One Video Creation and Editing framework) allows motion control using reference videos (like ControlNet for video)
Native support in ComfyUI via GGUF format
Temporal consistency across the full sequence
⚡ LoRA Models
Speed
CausVid v2, https://huggingface.co/Kijai/WanVideo_comfy/blob/main/Wan21_CausVid_14B_T2V_lora_rank32_v2.safetensors
AccVid
Quality
MoviiGen and Rewards (MPS)
All-in-one
FusionX, https://civarchive.com/models/1678575 (T2V version for VACE)
▶️ How to Use
🖼️ To Generate Video from an Image as First Frame
Enable "First Frame" from the muter node
Upload your input image
Set generation parameters:
Prompts (positive/negative)
Shift
Steps
Seed
Width / Height
Length (frame count)
Sampler
Scheduler
Click Run
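As a side note, runs can also be queued without clicking: ComfyUI exposes a small HTTP API. A minimal sketch, assuming a default local install on port 8188 and a workflow exported via "Save (API Format)"; the filename and node ids here are hypothetical:

```python
import json
import urllib.request

# Load the workflow exported with "Save (API Format)" (hypothetical filename).
with open("skyreels_vace_workflow_api.json") as f:
    wf = json.load(f)

# Override generation parameters; node ids depend on your own export.
wf["3"]["inputs"]["seed"] = 42
wf["3"]["inputs"]["steps"] = 8

# Queue the job on a default local ComfyUI instance.
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": wf}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
print(urllib.request.urlopen(req).read().decode("utf-8"))
```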
🎥 To Extend or Loop an Existing Video
Enable "Video Extension" or "Video Loop" option
Upload or select your input video (via the Load Image node, as an animated WebP for optimal quality)
Set extension parameters:
Overlap Frame Count
Extension Frame Count
Prompts (positive/negative)
Shift
Steps
Seed
Sampler
Scheduler
Click Run
Number of Frames for Continuation (Adjust as Needed)
Overlap Frames: Number of frames carried over from the original animation.
Higher values increase temporal consistency and preserve the flow from the previous segment.
Lower values may result in more abrupt transitions or sudden changes in motion, tempo, or direction.
Extension Frames: Number of new frames to generate beyond the current animation.
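A minimal sketch of the arithmetic, assuming the overlap frames are re-rendered from the source tail and only the extension frames add new footage (the helper names and the 81-frame example are illustrative):

```python
def sampler_window(overlap: int, extension: int) -> int:
    # Frames generated per extension pass: the re-rendered overlap
    # plus the genuinely new extension frames.
    return overlap + extension

def final_length(source_frames: int, extension: int) -> int:
    # The overlap replaces the source tail, so only the extension
    # frames add to the total length.
    return source_frames + extension

# Example: an 81-frame clip extended with overlap=16, extension=65
print(sampler_window(16, 65))  # 81 frames generated in this pass
print(final_length(81, 65))    # 146 frames in the stitched result
```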
⚠️ Challenges and Limitations
The base model is a T2V model, not a true I2V model.
I2V is achieved by feeding a reference image into the VACE node rather than by directly preserving the image.
A true I2V model typically keeps the input image as the exact first frame.
Here, VACE treats the image as loose guidance, not strict visual preservation.
Examples:
If your source image lacks an object, but your prompt includes it, that object might be added to the first frame.
If the prompt contradicts the image, some original elements may be missing.
Fine details may degrade over time, especially in extended video generations.
FAQ (Frequently Asked Questions)
❓ Can I run this with 16GB VRAM?
Yes. I ran it on an RTX 5060 Ti with 16GB VRAM using the Q6_K GGUF model.
With GGUF models, you can choose a quant that fits your GPU memory (a rough helper sketch follows the model link below):
Q3_X_X (3-bit) for ~8GB VRAM
Q4_X_X (4-bit) for ~12GB
Q5–Q6 for ~16GB
Q8 for ~24GB+
👉 Model & hardware info: https://huggingface.co/QuantStack/SkyReels-V2-T2V-14B-720P-VACE-GGUF
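As a rule of thumb from the figures above, a tiny helper that maps free VRAM to a quant level (a sketch only; resolution, clip length, and LoRA count also drive memory use):

```python
def suggest_quant(vram_gb: float) -> str:
    # Thresholds follow the rough guide above, not hard limits.
    if vram_gb >= 24:
        return "Q8_0"
    if vram_gb >= 16:
        return "Q5_K_M / Q6_K"
    if vram_gb >= 12:
        return "Q4_K_M"
    return "Q3_K_M"

print(suggest_quant(16))  # -> "Q5_K_M / Q6_K"
```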
❓ Why do I get errors or bad video clips?
This workflow is still experimental, so crashes or poor results are common. Here are some tips:
OOM (out of memory) error = your GPU doesn’t have enough VRAM
Use a lower quant model (e.g. Q3 or Q4) to reduce memory usage
Lower the video resolution or clip length to avoid overload
If transitions look bad, try adjusting the prompt or other settings
Generate multiple times, then pick the best clips to stitch together
❓ Why does it give an error with certain resolutions?
The "WanVaceToVideo" node only accepts resolutions where both width and height are divisible by 16. If your input resolution doesn’t meet this requirement, you’ll likely run into errors or processing failures.
Below are safe resolutions for commonly used aspect ratios, based on standard output heights (320, 368, 480, 544, 640, 720); a helper for other sizes follows the list:
✅ Recommended Aspect Ratios & Resolutions (All values divisible by 16)
🖥 32:9 -> 1136×320
📽 21:9 -> 752×320, 864×368, 1120×480, 1264×544
🖼 2:1 -> 640×320, 736×368, 960×480, 1088×544, 1280×640
📺 16:9 -> 576×320, 656×368, 832×480, 960×544, 1136×640, 1280×720
🖥 16:10 -> 512×320, 592×368, 768×480, 864×544, 1024×640, 1152×720
📷 3:2 -> 480×320, 560×368, 720×480, 816×544, 960×640, 1088×720
🖼 4:3 -> 432×320, 496×368, 640×480, 720×544, 848×640, 960×720
🖼 5:4 -> 400×320, 464×368, 608×480, 688×544, 800×640, 896×720
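For heights or aspect ratios not listed, a minimal helper that snaps both dimensions to multiples of 16; note that nearest-multiple rounding can differ slightly from a few entries above (e.g. 16:9 at 480p, where the list uses the canonical Wan resolution 832×480):

```python
def snap_resolution(aspect_w: int, aspect_h: int, target_height: int) -> tuple[int, int]:
    # Snap both dimensions to the nearest multiple of 16, as required
    # by the WanVaceToVideo node.
    height = round(target_height / 16) * 16
    width = round(height * aspect_w / aspect_h / 16) * 16
    return width, height

print(snap_resolution(16, 9, 544))  # -> (960, 544), matching the 16:9 row
print(snap_resolution(21, 9, 320))  # -> (752, 320), matching the 21:9 row
```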
❓ What should I do if I get a “Request Entity Too Large” error when uploading an image?
This error typically occurs when the file size exceeds the upload limit. To work around it:
Place the WebP file directly into the ComfyUI\input folder.
In ComfyUI, press Reload (R) to refresh the file list.
Use the Load Image node to select the file instead of using the “Choose file to upload” option.
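Equivalently, a small script to drop the file into place (the install path and filename below are assumptions; adjust to your setup):

```python
import shutil
from pathlib import Path

COMFYUI_DIR = Path.home() / "ComfyUI"   # assumption: adjust to your install
src = Path("my_clip.webp")              # hypothetical input file

# Copy straight into ComfyUI's input folder, bypassing the HTTP upload limit.
shutil.copy2(src, COMFYUI_DIR / "input" / src.name)
# Then press R in ComfyUI and pick the file from the Load Image dropdown.
```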
Description
v2.1.1
Added option to load .safetensors files
Added option to use VACE as T2V/I2V for generating the first video
Grouped and toggle-able speed optimization options (SageAttention, FP16 Accumulation, Torch Compile)
Using a white mask (instead of gray) in the control video to reduce the color-shift issue in extended videos (Reddit reference)
Tested compatibility with FusionX T2V LoRA (Civitai model)
New setting to choose different reference images in video extension:
Custom reference image
First frame of source video
First frame of sliding window
Last frame of source video
Simplified and shortened the workflow and output names
Provided system prompt example
Added FAQ
Comments (3)
It works great with FusionX too. There is a GGUF version of FusionX with VACE baked in on huggingface. I'll share the link here in case someone wants it.
Though, if you try FusionX instead of SkyReels V2 VACE, make sure it's the one with VACE baked in; don't load the MoviiGen, CausVid, AccVid, or MPS LoRAs (they are all already merged into FusionX), and set the speed optimization start step to 0 (otherwise you can get oversaturated images).
https://huggingface.co/QuantStack/Wan2.1_T2V_14B_FusionX_VACE-GGUF/tree/main
Using the Q8 GGUF and getting only blurred transitions when I use I2V. Any idea why?
How can I avoid the output being blurry/washed out? I am using the Q5_K_M checkpoint, with low/high VRAM enabled. No SageAttention, as the workflow complains that there isn't a sageattention module.
