This workflow allows more controlled video-to-video (v2v) generation by using a single-frame (1f) reference image.
Getting the error "shape '[1, 3, 4, 72, 96]' is invalid for input of size 103680".
Not sure what I'm doing incorrectly. As far as I can tell, I'm using all the same models and text encoders. The only difference is the video clips, and the resolution is still divisible by 16. Any ideas?
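The numbers in this error can be checked directly: the requested shape holds 1 × 3 × 4 × 72 × 96 = 82,944 elements, but the tensor actually held 103,680. The minimal Python sketch below lists every single dimension that, if resized, would make the counts match. The arithmetic alone is ambiguous (it finds three candidates), so which dimension is the frame axis is an assumption here. The fix found later in the thread (changing the frame count) is consistent with dimension 2 being frames, with 5 present where 4 were expected. The helper name `single_dim_fixes` is hypothetical.

```python
from math import prod


def single_dim_fixes(requested, actual_numel):
    """Return (index, expected, implied) for every single dimension that,
    if resized, would make the element counts match."""
    total = prod(requested)
    fixes = []
    for i, dim in enumerate(requested):
        rest = total // dim  # product of all the other dimensions
        if actual_numel % rest == 0 and actual_numel // rest != dim:
            fixes.append((i, dim, actual_numel // rest))
    return fixes


# Values copied from the error message above.
requested = (1, 3, 4, 72, 96)
actual = 103680

print(prod(requested))                      # 82944
print(single_dim_fixes(requested, actual))  # [(2, 4, 5), (3, 72, 90), (4, 96, 120)]
```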
@hakoniwa No, I was using umt5-xxl-enc-bf16.safetensors, as that was what autofilled in the example workflow. What models should I be using? Is this the cause of my error?
@HotHams Did you try changing frame_load_cap in Load Video (Upload)? I know it sounds crazy, but it works for me. For example, 48 frames gives me the error, but 49 frames works fine.
@Jankolonko Yes, the frame load cap seemed to be it. It seems somewhat picky about multiples of the input frame count, but once that's resolved, the workflow works wonderfully! Thank you for pointing this out!
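The 48-fails / 49-works pattern is consistent with how Wan2.1's video VAE is generally described: it keeps the first frame and compresses the remaining frames by a factor of 4 in time, so it expects frame counts of the form 4k + 1 (..., 45, 49, 53, ...). Treat that as an assumption, not something the workflow states. A minimal sketch under that assumption: it rounds a desired frame_load_cap down to the nearest valid count (so it never asks for more frames than the clip provides) and checks the divisible-by-16 resolution rule mentioned earlier in the thread. All function names are hypothetical.

```python
def valid_frame_count(requested: int, stride: int = 4) -> int:
    """Largest n <= requested with (n - 1) % stride == 0.

    Assumes a VAE that keeps the first frame and compresses the rest
    by `stride` in time, so it wants n = stride * k + 1 frames.
    """
    if requested < 1:
        raise ValueError("need at least one frame")
    return ((requested - 1) // stride) * stride + 1


def latent_frames(n: int, stride: int = 4) -> int:
    """Latent length for an n-frame clip under the same assumption."""
    if (n - 1) % stride:
        raise ValueError(f"{n} frames is not of the form {stride}k + 1")
    return 1 + (n - 1) // stride


def resolution_ok(width: int, height: int, multiple: int = 16) -> bool:
    """Check the 'divisible by 16' rule mentioned in the thread."""
    return width % multiple == 0 and height % multiple == 0


for cap in (48, 49, 81):
    n = valid_frame_count(cap)
    print(cap, "->", n, "frames ->", latent_frames(n), "latent frames")
# 48 -> 45 frames -> 12 latent frames
# 49 -> 49 frames -> 13 latent frames
# 81 -> 81 frames -> 21 latent frames
```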
Who needs 3D texturing anymore? Just animate some block shapes and slap that image reference in. This is insanely good.
Can't thank you enough for this!!
By default, you have the image going into the vae input of the WanVideo ImageToVideo Encode node, which isn't possible if I try to connect them manually and makes no sense. You also have image_embeds going into the start image input of that same node, and the clip embeds going into the vae input of the WanVideo VAE Loader. It wasn't working like this, so I changed the connections around, which worked, but it didn't produce the greatest result.
Does this support multiple characters (roles) in one scene?
Is it possible to run this with 16 GB of VRAM? With the default model, the render gets stuck when my VRAM is almost full (15.5 GB); after an hour, nothing had happened :(
Sampling 49 frames at 720x480 with 24 steps
0%| | 0/25 [00:00<?, ?it/s]
You have to use the GGUF version of Wan2.1; otherwise, memory usage is slightly too high for 16 GB.
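A rough way to see why 16 GB is tight: weight memory is approximately parameter count × bits per weight. The minimal sketch below assumes a 14B-parameter Wan2.1 checkpoint (the thread does not say which variant or precision the default workflow loads) and uses the nominal storage cost of a few formats. Q8_0 and Q4_0 are GGUF block formats that store one 16-bit scale per block of 32 weights, giving 8.5 and 4.5 bits per weight. It counts the diffusion model's weights only, not activations, the text encoder, or the VAE, so real usage is higher. The names `BITS_PER_WEIGHT` and `weight_gib` are hypothetical.

```python
# Nominal storage cost per weight, in bits.
BITS_PER_WEIGHT = {
    "bf16": 16.0,
    "fp8": 8.0,
    "gguf Q8_0": 8.5,  # 32 int8 weights + one fp16 scale per block
    "gguf Q4_0": 4.5,  # 32 4-bit weights + one fp16 scale per block
}


def weight_gib(params: float, bits: float) -> float:
    """Approximate size of the weights alone, in GiB."""
    return params * bits / 8 / 2**30


PARAMS = 14e9  # assumption: a 14B-parameter Wan2.1 checkpoint

for fmt, bits in BITS_PER_WEIGHT.items():
    print(f"{fmt:>10}: {weight_gib(PARAMS, bits):5.1f} GiB")
# bf16 ~ 26.1, fp8 ~ 13.0, Q8_0 ~ 13.9, Q4_0 ~ 7.3 GiB
```

Under these assumptions, bf16 weights alone exceed 16 GB, and fp8 or Q8_0 leave little headroom for activations, which fits the "slightly too high" experience above.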
RTX 4070 Ti, 12 GB VRAM: running the example took over 3 hours.
Is it possible to control how strongly the still image's style determines the output?


