CivArchive
    Wan2.1-VACE GGUF Workflow for "Free Length" ref2video using For Loop (version 4) - v3.0

    Description:

    You can create a video of any length by setting "Length" and "Loop Count" to match the original video.

    No adjustments are necessary other than the parameters in "Main Configuration", which include the two above.

    Thanks to @BenjisAIPlayground (https://www.youtube.com/@BenjisAIPlayground) for the great opportunity to make this.

    Features (for those who want to modify it):

    1. For the first reference image (ref image), it is best to use Flux Kontext or similar to match the starting frame and composition of the video. That said, Wan2.1 VACE tracks well enough that, in my opinion, it can be used without problems either way.

    2. The chunk size (set by "Length") affects the number of Input and Output frames (and differs between the first loop and subsequent loops), so I have corrected the number of Output frames to always be "Length - 1".

    3. Because the color of the reference image (ref image) saturates with each loop, color matching is performed against the most recent reference image just before the loop input.

    4. Output video is assumed to be 30 fps; adjust "nth" if you are processing input video at 60 fps, for example.
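    The arithmetic behind items 2-4 can be sketched roughly as follows. This is a minimal illustration, not the workflow's actual node logic: the function names, the one-frame-overlap reading of "Length - 1", and the mean/std color transfer (standing in for the Color Match node) are my assumptions.

    ```python
    import numpy as np

    def total_output_frames(length: int, loop_count: int) -> int:
        """Item 2: each loop contributes Length - 1 frames to the stitched output."""
        return (length - 1) * loop_count

    def nth_for(input_fps: float, target_fps: float = 30.0) -> int:
        """Item 4: keep every nth input frame so it matches the assumed 30 fps output."""
        return max(1, round(input_fps / target_fps))

    def color_match(frame: np.ndarray, reference: np.ndarray) -> np.ndarray:
        """Item 3: a simple per-channel mean/std transfer standing in for the
        workflow's Color Match node (float RGB images in 0..1)."""
        out = frame.astype(np.float64).copy()
        ref = reference.astype(np.float64)
        for c in range(3):
            f_mean, f_std = out[..., c].mean(), out[..., c].std()
            r_mean, r_std = ref[..., c].mean(), ref[..., c].std()
            if f_std > 1e-8:  # avoid dividing by zero on flat channels
                out[..., c] = (out[..., c] - f_mean) * (r_std / f_std) + r_mean
        return np.clip(out, 0.0, 1.0)
    ```

    For example, with Length = 81 and Loop Count = 3 the stitched output would be 240 frames, and a 60 fps input video would use nth = 2.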


    Release Note:


    (v3.0 -> v4.0)

    Improvement (?):

    1. When using a Reference Video background/objects (via CN: DWPose), a separate WanVaceToVideo node was set up for the CN so that its control lines do not appear in the final output.

      (Not sure if the quality is noticeably better than Ver 3.0. Waiting for feedback)


    (v2.0 -> v3.0)

    Improvement:

    1. Improved output by injecting a CN (DWPose) element when using a video background (using SAM Mask).

    2. When using a reference image background (using ControlNet), the CN can be set more precisely; Depth is recommended.

    3. In relation to the above, the “Main Configuration” settings can be adjusted, including the case where the background is specified in the prompt (see the Note in “How to Select#”).

    4. Added prompt creation support function (using Ollama).


    (v1.0 -> v2.0)

    Improvement:

    1. Background control is now possible.

    2. Processing time can be reduced by selecting either image (ControlNet) or video (Masking).

    Bug fix:

    1. Sound is now included in the output video.



    Comments (12)

    gumpbubba721291 · Jul 20, 2025 · 1 reaction

    Awesome workflow. Impressive stuff. Do you have any recommendation for reducing the tracking dots/lines from the controlnet appearing in the final output? Other than that it has been working perfectly for me. Thank you for sharing the workflow!

    gumpbubba721291 · Jul 20, 2025

    Looks like image blend node was the culprit. I lowered it from the original 1.00 and it got resolved. Sweet.

    Usako_USA
    Author
    Jul 21, 2025

    Thank you for the message. I hit the same phenomenon. Which “Image Blend” did you adjust? I would like to know the settings that worked.

    gumpbubba721291 · Jul 21, 2025

    Usako_USA It was the image blend right below the ControlNet depth/pose. In my situation, in the reference video I had, the person was also wearing a mask that I didn't want in the output. If I had it at the full 1.00, the tracking dots would appear unless I made the output pretty low quality/short (I didn't experiment much with the threshold on that). 0.75 would fade the dots. 0.5 would remove the dots except in some circumstances where you would get a light fade, esp. if you enabled the body tracking. 0.3 to 0.5 seemed to be the sweet spot for consistent removal of the dots/lines. However, it still kept throwing in the mask I didn't want, and no neg prompting could get rid of it, so I put it all the way to 0 and got what I wanted. I guess that's probably turning it off? But well, it worked in my circumstance LOL

    Ideally I'd probably place it higher if I could, because I think the details are aided by it depending on the ref video, but I dunno for sure.

    Usako_USA
    Author
    Jul 21, 2025

    gumpbubba721291 Hello.
    When using Reference Video background/objects (using CN:DWPose), a new WanVaceToVideo for CN was set up so that CN control lines do not appear in the final output.

    (Not sure if the quality is noticeably better than Ver 3.0)

    gumpbubba721291 · Jul 21, 2025 · 1 reaction

    Also, another observation from working with this a bit. Depending on one's needs, if the goal is to put together a very long video from a long ref video (i.e. converting 30 sec of ref video into a generated vod; anything more than that is too large for the node to input), my recommendation is generating about 124 frames at once, loop 2, on a fixed seed. I've found that with anything more than that, you'll start to lose substantial quality. Then add the frames just run to the skip_first_frames param on the Video Reference (Upload). Then download all the output and stitch it together in some sort of video software. I was running on an H100, for instance, and even though the GPU could handle making a super long vod in one go, the output was this weird mesh of the ref image and vod. This is of course very dependent on one's needs.

    Sometimes depending on the final frame, you also may need to back it up, like I've found if the person tilts their head on the second loop, sometimes the tracking will go uncanny.

    I'm still figuring this out, but once you get it going, it's much more consistent than regular I2V, it produces higher quality results if you got the right references, and you get sound with it too! It's pretty awesome.
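    The batching procedure the commenter describes could be sketched roughly as below. Everything here is an assumption on my part: the helper name, and in particular the guess that one run consumes frames_per_run × loops reference frames before skip_first_frames is advanced.

    ```python
    # Hypothetical planner for the commenter's approach: generate in fixed-size
    # runs, advancing skip_first_frames by the frames already consumed, then
    # stitch the downloaded outputs together externally.
    def run_plan(total_ref_frames: int, frames_per_run: int = 124, loops: int = 2):
        """Yield (skip_first_frames, frames_this_run) for each generation run."""
        consumed = frames_per_run * loops  # assumed frames used per run
        skip = 0
        while skip < total_ref_frames:
            yield skip, min(consumed, total_ref_frames - skip)
            skip += consumed
    ```

    For a 500-frame reference video with the defaults, this yields runs starting at skip_first_frames 0, 248, and 496.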

    flus · Jul 23, 2025

    https://www.reddit.com/r/comfyui/comments/1m5h509/almost_done_vace_long_video_without_obvious/

    Do you have a plan to apply the 'ComfyUI-SuperUltimateVaceTools' custom node to your workflow? This new node is very consistent.

    Usako_USA
    Author
    Jul 23, 2025

    Thank you! I'll look into it when I get some time.

    romquenin541 · Aug 6, 2025

    Thank you very much, your workflow is truly amazing.

    However, I am encountering a major problem. The video render degrades significantly over the loops. The rendering of loops based on the overlapped reference images loses a lot of consistency and departs completely from the first reference image. Is there a solution for this issue?

    Thank you in advance for your response.

    Usako_USA
    Author
    Aug 6, 2025

    I don't have any good ideas at the moment. We have confirmed the phenomenon of color saturation when switching the reference image every loop, and we are using Color Match to correct the situation (though it is not perfect). If the reference image is not looped, the degradation will be reduced, but it is expected that the tracking of motion will be degraded. We are investigating (we are currently working on a Wan2.2 version of this and if we can improve it, we will release it there).

    Thanks for your comments, and I'll keep you posted.

    romquenin541 · Aug 6, 2025

    Thank you for the answer. Let's hope you will find a fix. Can't wait for the Wan2.2 version. Keep up the great work. Cheers!

    10058234 · Nov 7, 2025

    Great workflow. Sophisticated but at the same time very easy to use. Thanks!

    Workflows
    Wan Video 14B t2v

    Details

    Downloads
    82
    Platform
    CivitAI
    Platform Status
    Available
    Created
    7/20/2025
    Updated
    5/14/2026
    Deleted
    -

    Files

    wan21VACEGGUFWorkflowForFree_v30.zip

    Mirrors