Description:
You can create a video of any length by setting "Length" and "Loop Count" to match the original video.
No adjustments are needed other than the parameters in "Main Configuration", which include the above.
Thanks to @BenjisAIPlayground (https://www.youtube.com/@BenjisAIPlayground) for the great opportunity to make this.
Features (for those who want to modify it):
For the first reference image (ref image), it is best to use Flux Kontext or a similar tool to match the starting frame and composition of the video. That said, in my experience Wan2.1 VACE tracks well enough to be used without problems.
The chunk size (set by "Length") changes the number of Input and Output frames (and differs between the first loop and subsequent loops), so I have corrected the number of Output frames to always be "Length - 1".
Because the color of the reference image (ref image) saturates with each loop, color matching is performed against the most recent reference image just before it is fed into the loop.
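The per-loop color matching described above can be sketched as a simple statistics transfer. This is a hypothetical stand-in for the Color Match node used in the workflow, not its actual implementation: it shifts each RGB channel of the new chunk's first frame to match the mean and standard deviation of the most recent reference image.

```python
import numpy as np

def color_match(image, reference):
    """Reinhard-style statistics transfer (a stand-in for the Color Match
    node): shift each RGB channel of `image` so its mean/std match those
    of `reference`, counteracting per-loop color saturation drift."""
    img = image.astype(np.float64)
    ref = reference.astype(np.float64)
    for c in range(3):
        mu_i, sd_i = img[..., c].mean(), img[..., c].std() or 1.0
        mu_r, sd_r = ref[..., c].mean(), ref[..., c].std()
        img[..., c] = (img[..., c] - mu_i) / sd_i * sd_r + mu_r
    return np.clip(img, 0, 255).astype(np.uint8)
```

Applying this just before the loop input keeps each chunk anchored to the colors of the previous chunk's reference rather than letting saturation compound.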
Output video is assumed to be 30 fps; adjust "nth" if you are processing input video at 60 fps, for example.
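The role of "nth" can be illustrated with a hypothetical frame-selection helper (not a node from the workflow): keeping every nth frame downsamples the input so it lands at the assumed 30 fps output rate.

```python
def select_nth_frames(frames, nth):
    """Keep every nth frame of the input sequence.
    nth=2 converts a 60 fps input to the assumed 30 fps output;
    nth=1 leaves a 30 fps input unchanged."""
    return frames[::nth]

# One second of 60 fps input becomes 30 frames with nth=2.
one_second = list(range(60))
assert len(select_nth_frames(one_second, 2)) == 30
```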
Release Note:
(v3.0 -> v4.0)
Improvement (?):
When using Reference Video background/objects (using CN:DWPose), a new WanVaceToVideo for CN was set up so that CN control lines do not appear in the final output.
(Not sure if the quality is noticeably better than Ver 3.0. Waiting for feedback)
(v2.0 -> v3.0)
Improvement:
Improved output by injecting a CN (DWPose) element when using a video background (using SAM Mask).
When using a reference image background (using ControlNet), CN can be set more precisely; Depth is recommended.
In relation to the above, the “Main Configuration” setting can be adjusted, including the case where the background is specified in the prompt (see Note in “How to Select#”).
Added prompt creation support function (using Ollama).
(v1.0 -> v2.0)
Improvement:
Background control is now possible.
Processing time can be reduced by selecting either image (ControlNet) or video (Masking).
Bug fix:
The output video now includes sound.
Comments (12)
Awesome workflow. Impressive stuff. Do you have any recommendation for reducing the tracking dots/lines from the controlnet appearing in the final output? Other than that it has been working perfectly for me. Thank you for sharing the workflow!
Looks like the Image Blend node was the culprit. I lowered it from the original 1.00 and it got resolved. Sweet.
Thank you for the message. I hit the same phenomenon. Which “Image Blend” did you adjust? I would like to know the settings that worked for you.
Usako_USA It was the image blend right below the controlnet depth/pose. In my situation, in the reference video I had, the person was also wearing a mask that I didn't want in the output. If I had it at full 1.00, the tracking dots would appear unless I made the output pretty low quality/short (I didn't experiment much with the threshold on that). 0.75 would fade the dots. 0.5 would remove the dots except in some circumstances where you would get a light fade, esp. if you enabled the body tracking. .3 to .5 seemed to be the sweet spot on it for consistent removal of the dots/lines. However, it still kept throwing in the mask I didn't want, and no neg prompting could get rid of it, so I put it all the way to 0 and it got what I wanted. I guess that's probably turning it off? But well, it worked in my circumstance LOL
Ideally I'd probably place it higher if I could, because I think the details are aided by it depending on the ref video, but I dunno for sure.
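The blend factor behavior described in this thread can be sketched at the pixel level. This is a hypothetical illustration of how a typical linear Image Blend node works, not the actual node's code: at 1.0 the overlay (including CN tracking dots) shows through fully, while values around 0.3 to 0.5 strongly fade it.

```python
def image_blend(base, overlay, blend_factor):
    """Linear blend as in a typical Image Blend node:
    blend_factor=0.0 keeps only the base image,
    blend_factor=1.0 replaces it entirely with the overlay."""
    return [b * (1.0 - blend_factor) + o * blend_factor
            for b, o in zip(base, overlay)]

# A bright tracking dot (255) over a mid-gray pixel (100):
# at 0.3 it is mostly faded into the base.
faded = image_blend([100.0], [255.0], 0.3)  # -> [146.5]
```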
gumpbubba721291 Hello.
When using Reference Video background/objects (using CN:DWPose), a new WanVaceToVideo for CN was set up so that CN control lines do not appear in the final output.
(Not sure if the quality is noticeably better than Ver 3.0)
Also another observation from working with this a bit. Depending on one's needs, if one's goal is trying to get a very long video put together from a long ref video (i.e. converting 30 sec of ref video to a generated vod; anything more than that is too large for the node to input), my recommendation is getting about 124 frames at once, loop 2 on a fixed seed. I've found anything more than that, you'll start to lose substantial quality. Then add the frames you just ran to the skip_first_frames param on the Video Reference (Upload). Then download all the output and stitch it together in some sort of video software. I was running on an H100 for instance, and even though the GPU could handle just making a super long length vod, the output was this weird mesh of the ref image and vod. This is of course very dependent on one's needs.
Sometimes depending on the final frame, you also may need to back it up, like I've found if the person tilts their head on the second loop, sometimes the tracking will go uncanny.
I'm still figuring this out, but once you get it going, it's much more consistent than regular I2V, it produces higher quality results if you got the right references, and you get sound with it too! It's pretty awesome.
https://www.reddit.com/r/comfyui/comments/1m5h509/almost_done_vace_long_video_without_obvious/
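The commenter's stitching recipe can be sketched as a run planner. This is a hypothetical helper, not part of the workflow, and it assumes each run produces chunk × loops frames (124 × 2 = 248 under the settings above), with skip_first_frames on the Video Reference (Upload) advanced by that amount so each run resumes where the previous one ended.

```python
def plan_runs(total_frames, chunk=124, loops=2):
    """Plan successive runs for the long-video recipe above.
    Assumes each run covers chunk * loops frames of the reference video;
    returns the skip_first_frames value to set before each run."""
    per_run = chunk * loops
    n_runs = -(-total_frames // per_run)  # ceiling division
    return [{"run": i + 1, "skip_first_frames": i * per_run}
            for i in range(n_runs)]

# A 30 s reference at 30 fps (~900 frames) takes four runs:
# skip_first_frames = 0, 248, 496, 744.
runs = plan_runs(900)
```

The per-run outputs are then concatenated in external video software, as the comment describes.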
Do you have a plan to apply the 'ComfyUI-SuperUltimateVaceTools' custom node to your workflow? This new node is very consistent.
Thank you! I'll look into it when I get some time.
Thank you very much, your workflow is truly amazing.
However, I am encountering a major problem. The video render degrades significantly over the loops. The rendering of the loops based on the overlapped reference images loses a lot of consistency and departs completely from the first reference image. Is there a solution for this issue?
Thank you in advance for your response.
I don't have any good ideas at the moment. We have confirmed the phenomenon of color saturation when switching the reference image every loop, and we are using Color Match to correct the situation (though it is not perfect). If the reference image is not looped, the degradation will be reduced, but it is expected that the tracking of motion will be degraded. We are investigating (we are currently working on a Wan2.2 version of this and if we can improve it, we will release it there).
Thanks for your comments, and I'll keep you posted.
Thank you for the answer. Let's hope you will find a fix. Can't wait for the Wan2.2 version. Keep up the great work. Cheers!
Great workflow. Sophisticated but at the same time very easy to use. Thanks!

