Wan2.1 SkyReelsV2 VACE workflow tested with MoviiGen, AccVid, CausVid, and FusionX LoRAs (14B T2V, Reference I2V, Extend & Loop)
This ComfyUI workflow supports:
Image-to-Video (I2V) and Loopable Video Extension (V2V) generation using SkyReels-V2-VACE-GGUF
Using AccVid and CausVid LoRAs with two samplers for faster generation
Using MoviiGen and Rewards LoRAs for better visual quality
You can:
Generate the first video as your starting point
Extend the video one segment at a time to gradually build out the full sequence
Cherry-pick the best segments for your final cut
Refine prompts step-by-step as the scene or motion evolves
🔧 Components
🌀 SkyReels-V2-VACE-GGUF (by wsbagnsv1)
Based on Wan2.1, fine-tuned on 720p@24fps videos
Integrated VACE (All-in-One Video Creation and Editing framework) allows motion control using reference videos (like ControlNet for video)
Native support in ComfyUI via GGUF format
Temporal consistency across the full sequence
⚡ LoRA Models
Speed
CausVid v2: https://huggingface.co/Kijai/WanVideo_comfy/blob/main/Wan21_CausVid_14B_T2V_lora_rank32_v2.safetensors
Quality
MoviiGen and Rewards LoRAs
All-in-one
FusionX: https://civarchive.com/models/1678575 (use the T2V version for VACE)
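For context on the two-sampler speed setup mentioned above: a common pattern is to split the sigma schedule so the first steps run without the speed LoRA at normal CFG, and the remaining steps run with CausVid at low CFG. Below is a minimal pure-Python sketch of that split; in the actual workflow this is done with a SplitSigmas-style node, and the schedule values and split index here are illustrative only.

```python
# Illustrative sketch of splitting a sigma schedule between two samplers.
# The real workflow uses a node for this; all numbers below are made up.

def split_sigmas(sigmas, step):
    """Split a descending sigma schedule at `step`.

    Returns (high, low): `high` drives the first sampler (no speed LoRA,
    normal CFG), `low` drives the second sampler (CausVid LoRA, CFG ~1).
    The boundary sigma appears in both halves so the second sampler
    resumes exactly where the first one stopped.
    """
    return sigmas[: step + 1], sigmas[step:]

# Example: a 10-step schedule split after step 3.
schedule = [14.61, 9.25, 6.12, 4.09, 2.73, 1.79, 1.13, 0.67, 0.33, 0.08, 0.0]
high, low = split_sigmas(schedule, 3)
print(high)  # first 3 steps  -> sampler 1 (no LoRA, normal CFG)
print(low)   # remaining steps -> sampler 2 (CausVid, low CFG)
```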
▶️ How to Use
🖼️ To Generate Video from an Image as First Frame
Enable "First Frame" from the muter node
Upload your input image
Set generation parameters:
Prompts (positive/negative)
Shift
Steps
Seed
Width / Height
Length (frame count)
Sampler
Scheduler
Click Run
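If you prefer queuing runs from a script instead of clicking Run in the UI, ComfyUI's /prompt HTTP endpoint accepts a workflow exported in API format. Here is a rough sketch; the filename and the node IDs/field names are placeholders for whatever your own exported JSON actually contains.

```python
import json
import urllib.request

# Load the workflow exported via "Save (API Format)" in ComfyUI.
with open("skyreels_vace_i2v.json") as f:  # hypothetical filename
    graph = json.load(f)

# Patch the generation parameters. The node IDs ("3", "7") are
# placeholders -- look up the real ones in your exported JSON.
graph["3"]["inputs"]["seed"] = 123456
graph["3"]["inputs"]["steps"] = 8
graph["7"]["inputs"]["text"] = "a cat walking through tall grass"  # positive prompt

# Queue the run on a local ComfyUI instance.
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": graph}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
print(urllib.request.urlopen(req).read().decode())
```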
🎥 To Extend or Loop an Existing Video
Enable "Video Extension" or "Video Loop" option
Upload or select your input video (via the Load Image node; lossless animated WebP gives optimal quality)
Set extension parameters:
Overlap Frame Count
Extension Frame Count
Prompts (positive/negative)
Shift
Steps
Seed
Sampler
Scheduler
Click Run
Number of Frames for Continuation (Adjust as Needed)
Overlap Frames: Number of frames carried over from the original animation.
Higher values increase temporal consistency and preserve the flow from the previous segment.
Lower values may result in more abrupt transitions or sudden changes in motion, tempo, or direction.
Extension Frames: Number of new frames to generate beyond the current animation.
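To make the overlap arithmetic concrete, here is a minimal sketch of how an extension is stitched onto the existing clip; plain lists stand in for decoded frames, and the frame counts are illustrative.

```python
def stitch_extension(prev_frames, new_frames, overlap):
    """Append an extension to an existing clip.

    The sampler generates `overlap + extension` frames, where the first
    `overlap` frames re-render the tail of the previous clip for
    continuity; only the frames after the overlap are genuinely new.
    """
    assert len(new_frames) > overlap, "extension must add frames beyond the overlap"
    return prev_frames + new_frames[overlap:]

# Example: an 81-frame clip extended with overlap 16 and extension 65
# yields 81 + 65 = 146 frames total.
prev = list(range(81))            # stand-in for decoded frames 0..80
generated = list(range(65, 146))  # 16 overlap + 65 new = 81 generated frames
full = stitch_extension(prev, generated, overlap=16)
print(len(full))  # 146
```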
⚠️ Challenges and Limitations
The base model is a T2V model, not a true I2V model.
I2V is achieved by feeding a reference image into the VACE node, rather than by directly preserving the image.
A true I2V model typically keeps the input image as the exact first frame.
Here, VACE treats the image as loose guidance, not strict visual preservation.
Examples:
If your source image lacks an object, but your prompt includes it, that object might be added to the first frame.
If the prompt contradicts the image, some original elements may be missing.
Fine details may degrade over time, especially in extended video generations.
FAQ (Frequently Asked Questions)
❓ Can I run this with 16GB VRAM?
Yes. I ran it on an RTX 5060 Ti with 16GB VRAM using the Q6_K GGUF model.
With GGUF models, you can choose a version that fits your GPU memory:
Q3_X_X (3-bit) for ~8GB VRAM
Q4_X_X (4-bit) for ~12GB
Q5–Q6 for ~16GB
Q8 for ~24GB+
👉 Model & hardware info: https://huggingface.co/QuantStack/SkyReels-V2-T2V-14B-720P-VACE-GGUF
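The same rule of thumb in code form; the thresholds mirror the list above and the quant names are illustrative, so check the repo's file list for the exact variants available.

```python
def pick_quant(vram_gb: float) -> str:
    """Suggest a GGUF quantization level for a given VRAM budget.

    Approximate guidance only -- actual headroom depends on resolution,
    frame count, LoRAs, and CPU offloading.
    """
    if vram_gb >= 24:
        return "Q8_0"
    if vram_gb >= 16:
        return "Q5_K / Q6_K"
    if vram_gb >= 12:
        return "Q4_K"
    return "Q3_K"

print(pick_quant(16))  # Q5_K / Q6_K
```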
❓ Why do I get errors or bad video clips?
This workflow is still experimental, so crashes or poor results are common. Here are some tips:
OOM (out of memory) error = your GPU doesn’t have enough VRAM
Use a lower quant model (e.g. Q3 or Q4) to reduce memory usage
Lower the video resolution or clip length to avoid overload
If transitions look bad, try adjusting the prompt or other settings
Generate multiple times, then pick the best clips to stitch together
❓ Why does it give an error with certain resolutions?
The "WanVaceToVideo" node only accepts resolutions where both width and height are divisible by 16. If your input resolution doesn’t meet this requirement, you’ll likely run into errors or processing failures.
Below are safe resolutions for commonly used aspect ratios, based on standard output heights (320, 368, 480, 544, 640, 720):
✅ Recommended Aspect Ratios & Resolutions (All values divisible by 16)
🖥 32:9 -> 1136×320
📽 21:9 -> 752×320, 864×368, 1120×480, 1264×544
🖼 2:1 -> 640×320, 736×368, 960×480, 1088×544, 1280×640
📺 16:9 -> 576×320, 656×368, 832×480, 960×544, 1136×640, 1280×720
🖥 16:10 -> 512×320, 592×368, 768×480, 864×544, 1024×640, 1152×720
📷 3:2 -> 480×320, 560×368, 720×480, 816×544, 960×640, 1088×720
🖼 4:3 -> 432×320, 496×368, 640×480, 720×544, 848×640, 960×720
🖼 5:4 -> 400×320, 464×368, 608×480, 688×544, 800×640, 896×720
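These sizes can also be derived programmatically. Here is a minimal sketch that snaps a target size to WanVaceToVideo-safe dimensions; note that the hand-picked table above occasionally deviates slightly from the exact ratio.

```python
def snap16(x: int) -> int:
    """Round down to the nearest multiple of 16 (minimum 16)."""
    return max(16, (x // 16) * 16)

def vace_resolution(aspect_w: int, aspect_h: int, height: int) -> tuple[int, int]:
    """Compute a WanVaceToVideo-safe (width, height) near a target aspect ratio."""
    h = snap16(height)
    w = snap16(round(h * aspect_w / aspect_h))
    return w, h

print(vace_resolution(16, 9, 720))  # (1280, 720)
print(vace_resolution(2, 1, 480))   # (960, 480)
```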
❓ What should I do if I get a “Request Entity Too Large” error when uploading an image?
This error typically occurs when the file size exceeds the upload limit. To work around it:
Place the WebP file directly into the ComfyUI\input folder.
In ComfyUI, press Reload (R) to refresh the file list.
Use the Load Image node to select the file instead of the "Choose file to upload" option.
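The same workaround can be scripted. A minimal sketch, assuming a default local install; both paths are hypothetical and should be adjusted to your setup.

```python
import shutil
from pathlib import Path

# Hypothetical paths -- adjust to your own install and file.
src = Path("my_clip.webp")
comfy_input = Path("C:/ComfyUI/input")  # or ComfyUI/input on Linux/macOS

shutil.copy2(src, comfy_input / src.name)
# Then press R in ComfyUI and pick the file in the Load Image node.
```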
💬 Comments
So, can 16GB of video memory run this workflow?
Yes, 16GB of VRAM is sufficient. I ran this workflow on an RTX 5060 Ti with 16GB VRAM using the Q6_K GGUF model.
With GGUF models, you can choose the quantization level that fits your hardware:
Q3_X_X (3-bit) for ~8GB VRAM
Q4_X_X (4-bit) for ~12GB
Q5–Q6 for 16GB
Q8 for 24GB+
You can check out the model and hardware compatibility chart here:
👉 https://huggingface.co/QuantStack/MoviiGen1.1-VACE-GGUF
(The compatibility table is on the right panel)
@lym0 This is great work. I decided to download and test it, thanks for your reply!
This works really well. One modification I made that might be useful to others is to get the FPS from the video info and send it out to the output nodes. Wan is trained on 16 FPS, but the output nodes are all 24 by default. You could also just set them manually if all you work with is Wan content. I added an interpolator at the end as well.
I especially like your solution to the CausVid issues by splitting the sigmas. I might modify this into a regular generator for getting the first video.
Thanks for the thoughtful feedback.
The workflow’s now updated to v1.1.1, which sets the FPS up front. I usually handle frame interpolation and upscaling outside of ComfyUI (Topaz Video AI), but it’s great to see how you’re integrating it directly.
Always interesting to see how others are customizing their pipelines.
Wow, the color completely changes in the extended video. Anything you can do to fix this?
You're right, the color shift in the extended video is noticeable. One thing we can try is using color correction nodes, https://www.reddit.com/r/comfyui/comments/1gq4baf/colour_correction_with_comfyui/.
Changing the video format or CRF settings of the Video Combine node might help too. I did a quick test and found that H.264/H.265/webm have the color shift issue, but the GIF format does not, i.e. video/ffmpeg-gif with sierra2_4a works best for avoiding the color shift issue.
Updates: Tested the GIF workflow (v1.1.0), it reduces color shift issues, but introduces dithering artifacts. A better solution might be using the lossless animated WebP format. I.e., save as image/webp using the Video Combine node with the lossless option enabled. Then, use the native Load Image node to load the animated WebP (it can load image sequences from an animated WebP)
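Outside ComfyUI, the same lossless WebP round-trip can be sketched with Pillow; the filenames, frame count, and timing below are illustrative.

```python
from PIL import Image, ImageSequence

# Save a frame list as a lossless animated WebP (avoids lossy-codec color shift).
frames = [Image.open(f"frame_{i:04d}.png") for i in range(81)]  # hypothetical frames
frames[0].save(
    "clip.webp",
    save_all=True,
    append_images=frames[1:],
    lossless=True,
    duration=int(1000 / 16),  # 16 fps, expressed as milliseconds per frame
    loop=0,
)

# Load the frames back, as the Load Image node does with animated WebP.
reloaded = [f.convert("RGB") for f in ImageSequence.Iterator(Image.open("clip.webp"))]
print(len(reloaded))  # 81
```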
Quick update: the revised workflow (v1.1.1) now uses lossless animated WebP in the video extension for optimal quality.
Note: If you run into a "Request Entity Too Large" error when uploading, just place the WebP file directly into the ComfyUI\input folder, hit reload (R), and select it using the Load Image node instead of using the "choose file to upload" option.
@lym0 awesome, will check it out
The color shift issue has also been discussed here:
- GitHub: https://github.com/ali-vilab/VACE/issues/44
- Reddit: https://www.reddit.com/r/StableDiffusion/comments/1ktljys/comment/muc8e0r/
- Reddit: https://www.reddit.com/r/StableDiffusion/comments/1l68kzd/video_extension_research/
Yeah, seeing the color issue here too. For me it looks like it ups the contrast with each extension. The Reddit links above were a good insight; looks like people have tried everything.
Edit: You can see the colorshift twice (3 gens chained together here): https://civitai.com/posts/19368070
Is this similar to what others are seeing?