Proof of concept model: a motion LoRA attempt at generating a full rotation of a full-body female figure without 'beetlejuice head', upper/lower body counter-rotation, or thigh/leg/foot morphing during the rotation. I think I was able to train out the 'beetlejuice head' and the upper/lower body counter-rotation. Lower leg and foot morphing is still hit or miss.
Prompting: The prompt should include a movement phrase to get the model to initiate the turning motion: 'takes a step and turns around', 'turns in a circle', 'spins around', 'turns to the left'...
Note: Wan version - if the prompt contains a short top or short dress, the model tends to 'flash' the camera.
Dataset: 5 x 100-frame 1024x1024 24fps video clips. Each clip contained a female model performing a complete full-body rotation: 3 clothed, 2 nude; 2 models walking and turning, 3 dancing and turning.
Training: 45 epochs, 1629 steps, 288x288 resolution.
Testing: Sample videos were generated using a two-stage TeaCache sampler ComfyUI workflow. Samples with a blue background were generated with caption files from the training dataset. Stage 1 was run at 1.6x speed for 7 steps to verify full-body and rotational content before proceeding. Stage 2 was run at 1.0x speed for 20 steps. For clips that were clean, stage 2 denoising was set to 0.7; for others, stage 2 denoising was set to 0.8-0.9 to clean them up.
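The two-stage check above is a ComfyUI workflow built around TeaCache sampler nodes. For anyone running Wan through diffusers instead, here is a minimal sketch of the same gating idea, under assumptions: the `Wan-AI/Wan2.1-T2V-14B-Diffusers` checkpoint, a hypothetical local LoRA path, and a plain step-count/resolution reduction standing in for TeaCache (which has no direct diffusers equivalent). This is illustrative only, not the workflow used for the posted samples.

```python
import torch
from diffusers import AutoencoderKLWan, WanPipeline
from diffusers.utils import export_to_video

# Assumed model id; fp32 VAE / bf16 transformer split follows the diffusers docs.
model_id = "Wan-AI/Wan2.1-T2V-14B-Diffusers"
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16).to("cuda")

# Hypothetical local path to this LoRA (loaded at the default scale of 1.0).
pipe.load_lora_weights("r0und4b0ut-wan-v1.0.safetensors")

prompt = "r0und4b0ut, a woman takes a step and turns around, full body visible"

# Stage 1: cheap preview -- small size, few steps -- just to confirm the clip
# actually shows a full-body rotation before spending time on the real render.
preview = pipe(
    prompt=prompt,
    height=256, width=352,       # small preview resolution (multiples of 16)
    num_frames=81,
    num_inference_steps=7,
    guidance_scale=5.0,
).frames[0]
export_to_video(preview, "preview.mp4", fps=16)

# Inspect preview.mp4 by eye; if the rotation is there, run the full pass.
# (The 0.7-0.9 stage-2 denoise in the ComfyUI workflow is a latent-upscale
# img2img step and is not reproduced here.)
final = pipe(
    prompt=prompt,
    height=480, width=832,
    num_frames=81,
    num_inference_steps=20,
    guidance_scale=5.0,
).frames[0]
export_to_video(final, "final.mp4", fps=16)
```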
Description
Same content as the hunyuan version. Trigger word: 'r0und4b0ut'. At strength >= 0.7 the model will turn around on its own. At strength <= 0.6 the model may need coaxing ('dancing, turns around while dancing, turns around') to consistently spin.
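If you load the LoRA through diffusers rather than ComfyUI, the strength can be set per adapter. A minimal sketch, assuming the same checkpoint as in the Testing sketch above and `peft` installed; the adapter name, file path, and generation settings are illustrative, not the author's settings:

```python
import torch
from diffusers import AutoencoderKLWan, WanPipeline
from diffusers.utils import export_to_video

model_id = "Wan-AI/Wan2.1-T2V-14B-Diffusers"  # assumed checkpoint
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16).to("cuda")

# Hypothetical local filename; the adapter name is arbitrary.
pipe.load_lora_weights("r0und4b0ut-wan-v1.0.safetensors", adapter_name="roundabout")
pipe.set_adapters(["roundabout"], adapter_weights=[0.7])  # >= 0.7: turns around unprompted

# At strength <= 0.6, repeat the turning action in the prompt to coax the spin.
prompt = "r0und4b0ut, a woman dancing, turns around while dancing, turns around"

video = pipe(
    prompt=prompt,
    height=480, width=832,
    num_frames=81,
    num_inference_steps=20,
    guidance_scale=5.0,
).frames[0]
export_to_video(video, "roundabout.mp4", fps=16)
```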
FAQ
Comments (20)
comments and suggestions for improvements are welcome.
1.3b version please! Testing this now.
i originally had my wan training set up for 1.3b. Have you tried this lora with 1.3b?
Nice! Any chance you could do an i2v version as well? I know t2v loras work for i2v but dedicated i2v ones do seem to perform better (for i2v).
the wan loras actually work pretty well with i2v. it's the hunyuan loras that are sucky.
@tedbiv hun i2v seems pretty bad in general so makes sense! Apparently this framepack thing includes some improvements to hun i2v, fingers crossed.
actually, about half of the examples i posted under hunyuan lora were i2v. it just needed strength turned up.
First of all, i2v works perfectly well. I've got it down to 20 steps, 512x512, 81 frames in 5-7 minutes on a 3090. I never use t2v loras as t2v, I hate prompts; I convert all of them to i2v, and so far everything works well. Bump up the strength, and maybe add a NSFW lora to help out. Once I get it working, I just queue up 100 or so overnight, and about 95% of them work well; I redo those that don't. Also HUNYUAN works fine, and better for me actually. It's even faster and more consistent than WAN, same parameters, when I compared them with the same inputs.
Any chance of a 1.3b version?
have you tried this with 1.3b?
I can finally respond!
Yes I did try it with wan on Pinokio.
But the error says that the LoRA is likely for a different model.
On Pinokio we have:
wan2.1_Fun_InP_1.3B_bf16 (i2v)
wan2.1_text2video_1.3B_bf16
If you can make versions for these that would open up more users than I think you know.
Keep up the good work!
hi. Using img2vid on hunyuan, I only get about 1 out of every 50-100 videos or more to rotate correctly with Lora. Is there any way to fix or correct this?
You can try increasing lora strength and/or adding an action word to the prompt: turning around, turning, spins, spinning, etc. I2v on hunyuan is also image dependent; some images just refuse to rotate. The wan version is much better at i2v.
@tedbiv thanks xD. Ok, I'll try to increase the Lora's strength to see what happens, and I'll also add more words. I'll keep you posted. xD
@stylobcn i found that normally just adding 'turns around' would coax it to rotate.
@tedbiv I haven't been able to test it yet to see if it works better for me. A question: Could it make a difference if I use Model and Clip-L in GGUF or Unet, for example?
@stylobcn Don't know. You can save one of the videos you like; the workflow, prompt, etc. should be attached. Here's an example hunyuan img2vid, but as I said it is image dependent; the wan version seems to perform better... https://civitai.com/images/68003963
Here's another with hunyuan: https://civitai.com/images/67943831 Plus I use a 2-stage process: stage 1 runs at 240x360 for 6 steps, TeaCache x1.5 (this takes about 20-30 seconds); if it's a no-go, I terminate it. If it's good, stage 2 upscales the previous latent to 480x720 for 25 steps at x1 sampling speed, which takes 5-8 minutes. That way I don't waste time on shit content.
Details
Files
r0und4b0ut-wan-v1.0.safetensors
Mirrors
1584190_r0und4b0ut-wan-v1.0.safetensors
r0und4b0ut-wan-v1.0.safetensors
wan_ Roundabout.safetensors
B64_cjB1bmQ0YjB1dC13YW4tdjEuMA.safetensors
360.safetensors