Simple WAN T2V Workflow for Self Forcing
Self Forcing trains autoregressive video diffusion models by simulating the inference process during training, performing autoregressive rollout with KV caching. It resolves the train-test distribution mismatch and enables real-time, streaming video generation on a single RTX 4090 while matching the quality of state-of-the-art diffusion models.
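The autoregressive rollout with KV caching described above can be pictured with a toy sketch. This is illustrative Python only, not the actual Self Forcing code; the names `denoise_chunk`, `autoregressive_rollout`, and the 4-frames-per-chunk size are assumptions made up for the example:

```python
# Illustrative sketch (NOT the real Self Forcing implementation):
# frames are generated chunk by chunk, each chunk conditioned on the
# cached keys/values of earlier chunks, so training simulates the same
# rollout the model performs at inference time.

def denoise_chunk(chunk_index, kv_cache):
    """Stand-in for the diffusion denoiser: produces one chunk of frames
    while reading and extending the KV cache of past attention states."""
    frames = [f"frame_{chunk_index}_{i}" for i in range(4)]  # 4 frames per chunk (assumed)
    kv_cache.append(f"kv_for_chunk_{chunk_index}")           # cache this chunk's keys/values
    return frames

def autoregressive_rollout(num_chunks):
    kv_cache = []  # grows with each chunk; past chunks are never recomputed
    video = []
    for c in range(num_chunks):
        video.extend(denoise_chunk(c, kv_cache))
    return video, kv_cache

video, cache = autoregressive_rollout(3)
print(len(video), len(cache))  # 12 frames generated, 3 cached chunk states
```

The key point is that the cache makes each new chunk's cost independent of how much video has already been generated, which is what enables streaming output.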
Update (i2v):
To use VACE, you will need a different checkpoint: https://huggingface.co/lym00/Wan2.1-T2V-1.3B-Self-Forcing-VACE/blob/main/Wan2.1-T2V-1.3B-Self-Forcing-DMD-VACE-FP16.safetensors
Download self_forcing_dmd.pt from https://huggingface.co/gdhe17/Self-Forcing/tree/main/checkpoints and use it as the t2v checkpoint.
Project website: https://self-forcing.github.io/
Comments (10)
Can anyone help me understand the use-case for V2V VACE?
ControlNet-style control: if you want a character (or an object, I guess) to behave in a specific way, you can use a video to (try to) make the resulting video follow the original motion but with a different image than in the original video. For instance, if you want a character to dance in a specific way, you could record yourself dancing that way, combine an image of that character with the video of you dancing, and have the character dance as you did.
The FusionX LoRA released; with that we can get higher quality in this, right?
I don't think so; it looks to be 14B only.
Can all LoRAs work with this?
This is a 1.3B model; it should work with 1.3B LoRAs (all six of them).
Hope this comes as a 14B version.
This is a great tool, thank you! i2v is working with faster execution and I had no problem getting it running.
One issue is that there is very little motion with some of the workflow's default settings, such as CFG 1 and shift 8. Do you recommend these values? I am seeing better motion with CFG 4 and shift 11.
And is there any cause or fix for how differently some i2v videos come out? The VACE model definitely takes a lot of liberties deviating from the input image.
Can anyone help me with where to put the .pth file? It seems to work regardless. Thanks in advance.
Any workflow for video masking/editing?
How to fix this error: "WanVaceToVideo: Calculated padded input size per channel: (0 x 104 x 60). Kernel size: (1 x 1 x 1). Kernel size can't be greater than actual input size"
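That error means one of the convolution's input dimensions is zero; in "(0 x 104 x 60)" the first (frame/temporal) dimension is 0, which usually means the node received no frames (e.g. video length set to 0 or a reference video that failed to load), so even a 1x1x1 kernel cannot slide over it. A minimal sketch of the size check a convolution performs; `conv_output_size` is an illustrative helper, not ComfyUI or PyTorch code:

```python
# Illustrative helper (assumed name) showing why a kernel larger than the
# (padded) input raises the "Kernel size can't be greater than actual
# input size" error seen in WanVaceToVideo.

def conv_output_size(input_size, kernel_size, padding=0, stride=1):
    padded = input_size + 2 * padding
    if kernel_size > padded:
        raise ValueError(
            f"Kernel size can't be greater than actual input size "
            f"({kernel_size} > {padded})"
        )
    return (padded - kernel_size) // stride + 1

# The failing case from the error message: a (0 x 104 x 60) input
# against a 1x1x1 kernel. Only the zero-sized dimension fails.
for inp, k in zip((0, 104, 60), (1, 1, 1)):
    try:
        conv_output_size(inp, k)
    except ValueError:
        print("fails on dimension of size", inp)  # prints for the 0-sized frame dimension
```

In practice, check that the video length and width/height inputs feeding WanVaceToVideo are nonzero and that any reference video actually loaded frames.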
