This LoRA is for high-kick videos. It can even destroy things! The moves are more complex than in my other models, which is why it can be a bit unstable here and there, but I usually get something good within 1-3 generations.
I used I2V-14B-720P as the base model.
Good Prompts:
A brunette she is doing a fast high kick with her left leg, she fights against the pillar, the pillar is breaking into large pieces after it was hit
A brunette she is doing a fast high kick series of two kicks with her leg, she fights against an enemy, she wears black stockings and blue high heels
Training Workflow:
I used 8 videos of about 1 second each, of males and females, with the faces mostly cut off. I described what happens in each clip. Here are some of the captions:
A black haired girl, in a boxing gym in an old warehouse, she is doing a fast high kick series with her right leg against a black punching bag, the punching bag absorbs the force, she wears a white hot pants and a white sports top with arm cuffs and red high heels, her face is cut off
A mma fighter, in the street, he is doing a high kick with his right leg against some stones, the stones explode into small pieces after they were hit, he wears a white shorts and a black sports jacket, his face is cut off, bad video quality
Then I used Musubi-Tuner with both the single-image and whole-sequence settings and trained for about 64 epochs; on my 4090 it was done in about 1 h 35 min.
That's all :)
Comments
Can you show me your workflow? It breaks my face.
Check this tutorial:
https://civitai.com/articles/11942/training-a-wan-or-hunyuan-lora-the-right-way
@ai_build_art Thank you! But I meant that generating with your LoRA breaks the face. That's why I asked for the workflow you use to make your pictures.
@_RUST_ Ah ok, I might upload my workflow later on, but meanwhile you can have a look at this one, which might be good: https://civitai.com/models/1230250?modelVersionId=1586885
@ai_build_art thank you! If it's not difficult for you, I'll be waiting!
"Then I used Musubi-Tuner with single images"
So you are using both 1 second clips and single images for training?
What do you put for single images? And how do you describe motions that happen with just one frame?
When you say "face cut off", did you just blur out the faces?
Good work! Rare gem!
This is the TOML file that I used:
resolution = [256, 256]
caption_extension = ".txt"
batch_size = 1
enable_bucket = true
bucket_no_upscale = false
[[datasets]]
video_directory = "E:/Lora/Musubi-Tuner/TrainingData/KickSmallParts/Medium"
cache_directory = "E:/Lora/Musubi-Tuner/TrainingData/Cache/KickSmallParts/Medium_1" # recommended to set cache directory
target_frames = [1]
frame_sample = 5
frame_extraction = "uniform"
[[datasets]]
video_directory = "E:/Lora/Musubi-Tuner/TrainingData/KickSmallParts/Medium"
cache_directory = "E:/Lora/Musubi-Tuner/TrainingData/Cache/KickSmallParts/Medium_full" # recommended to set cache directory
frame_extraction = "full"
max_frames = 45
# other datasets can be added here. each dataset can have different configurations
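To illustrate what the two [[datasets]] entries above do: the first one, with target_frames = [1], frame_sample = 5 and frame_extraction = "uniform", pulls 5 single frames spread evenly across each clip; the second, with frame_extraction = "full" and max_frames = 45, uses the whole clip capped at 45 frames. A minimal sketch of that sampling logic (my own illustration of the idea, not Musubi-Tuner's actual code; function names are made up):

```python
def uniform_single_frames(n_frames: int, frame_sample: int) -> list[int]:
    """Pick `frame_sample` frame indices spread evenly across the clip,
    mirroring target_frames=[1] with frame_extraction="uniform"."""
    if frame_sample >= n_frames:
        return list(range(n_frames))
    step = (n_frames - 1) / (frame_sample - 1)
    return [round(i * step) for i in range(frame_sample)]

def full_sequence(n_frames: int, max_frames: int) -> list[int]:
    """frame_extraction="full": use the whole clip, capped at max_frames."""
    return list(range(min(n_frames, max_frames)))

# A ~1 second clip at 24 fps:
print(uniform_single_frames(24, 5))  # → [0, 6, 12, 17, 23]
print(len(full_sequence(24, 45)))    # → 24
```

The point of combining both datasets is that the single frames teach appearance while the full sequences teach the motion itself.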
I just cut them with a video editor. See this tutorial:
https://civitai.com/articles/11942/training-a-wan-or-hunyuan-lora-the-right-way
@ai_build_art Duddddddddddde! The article's sections on how to caption and on how the training loss is calculated are good; I haven't seen anybody explain it properly.
Hell yeah!
Love this so much, we definitely need a whole library of Loras focused on verbs and actions
Yes, definitely. With I2V we should focus more on concepts than on persons.
Curiously, this model compresses really well for Wan (>99% retention, 30 MB). I'm getting the impression Wan naturally understands (particle) physics really well and just needs a bit of coaxing for specifics.
Yes, I feel it just gets reminded of what it already learned.
@firemanbrakeneck Mind sharing your compression command / settings? I'm constantly re-compressing LoRAs down to 30-50 MB, but haven't tried with Wan yet.
My typical settings (kohya-ss/sd-scripts); the --model and --save_to values below are placeholders for your input and output files:
python ./sd-scripts/networks/resize_lora.py \
    --model <input_lora.safetensors> \
    --save_to <resized_lora.safetensors> \
    --save_precision fp16 \
    --new_rank 8 \
    --dynamic_method sv_fro \
    --dynamic_param 0.99
@ironbook25531 I have an article up with details on various models, but in general I go with one of three presets depending on importance / content variance from source:
Low = rank 8 / param 0.90. Mid = rank 24 / 0.92. High = rank 32 / 0.94.
Those numbers you're getting are consistent with my low setting. The rank caps the width of each layer, so you're losing a lot of good weights on the layers that carry the most information. You're probably not even close to saturating that 0.99 on almost any layer. The dynamic param is what should be doing the work; the rank cap is just there to prevent wasting space on single particularly active layers.
In some models that's okay, if you don't mind small details even 80-85% retention can be visually similar. But I don't think you're getting the best quality / size ratio with your current settings.
Oh, and I use fp16 too; I'm still paranoid about other quants not being supported everywhere (though that's probably a thing of the past by now).
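For anyone wondering what sv_fro actually selects: per layer it keeps the smallest rank whose cumulative squared singular values (Frobenius energy) reach dynamic_param, then the rank cap clamps it. A toy numpy sketch of that selection rule (my own illustration, not sd-scripts' implementation):

```python
import numpy as np

def sv_fro_rank(W: np.ndarray, target: float = 0.99, rank_cap: int = 8) -> int:
    """Smallest rank retaining `target` of the Frobenius energy
    (sum of squared singular values), clamped to `rank_cap`."""
    s = np.linalg.svd(W, compute_uv=False)
    energy = np.cumsum(s**2) / np.sum(s**2)
    needed = int(np.searchsorted(energy, target) + 1)
    return min(needed, rank_cap)

# A matrix whose energy is concentrated in a few directions:
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.normal(size=(64, 64)))
V, _ = np.linalg.qr(rng.normal(size=(64, 64)))
s = np.array([10.0, 5.0, 1.0] + [0.01] * 61)  # fast-decaying spectrum
W = (U * s) @ V.T
print(sv_fro_rank(W, target=0.99, rank_cap=8))  # → 2
```

With a spectrum like this, two singular values already hold >99% of the energy, so the layer shrinks to rank 2 regardless of the cap; on layers with flatter spectra the cap is what limits the size.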
@firemanbrakeneck Thanks, I'll take a look.
@firemanbrakeneck I'm curious what you're using to compress Wan and Hunyuan LoRAs, if you wouldn't mind clarifying. I took a look at the article and tried the recommended batch file in case it was different from what I have, but every motion LoRA I've tried to resize gives:
scale = network_alpha/network_dim
TypeError: unsupported operand type(s) for /: 'NoneType' and 'NoneType'.
Are you using something other than sd-scripts to resize / compress them?
@ironbook25531 This PR should allow resizing any wan (or other) models publicly available: https://github.com/kohya-ss/sd-scripts/pull/2057
nice!!!!!!! more kicking lora!
The sample videos look nice.
genius
nice!
surprisingly awesome
Women in 2025, more dangerous than ever! :-O
This lora is amazing!!! I wish fighting games looked like this. This is so much better than ANY fighting game out there. I fucking love this Lora. This is fantastic!
I'm new to all this, but you start to realize that the inability to do violence is a major hindrance to creating any sort of action sequences to tie a larger-scale film together. With the ability to make videos from audio and model sizes shrinking, we're just about at that point of extreme access and taking this further than single clips. To convince VIDU to chop the head off a massive snake in a cave, I had to hand-draw the art in Hypic and coax it, saying it's a demon and it's evil, please kill it, and finally she leaps and chops the head off to match my final picture of a bloody snake head.

Since you know how to do these things, you should totally get Fists of Fury (1971) with Bruce Lee, train on every fight, and make the most violent LoRA out there. It's a free stream, and it has classic Bruce Lee fighting 20 guys and pulling off serious real-world moves that you just don't see anymore. Plus a nunchuck LoRA, with his skill he is busting people's faces with those things. https://www.dailymotion.com/video/x9gdfne 17 minutes into the movie is when it gets nuts. I haven't seen it since high school, but I just re-watched the fights, and it would be awesome to type "spin kick" and watch the character of your choice pull off Bruce Lee-level speed and agility.
please do sword swing lora
The base model just sucks at it.
Baddie girl needs this
I get the same hash for your LoRA and "Various kicks - XL". Your LoRA seems older than the other one.
Hard to control, but overall working.