Prompt:
pbukkake, woman with mouth open looking at camera, she keeps her head still as a nude standing man holding his penis enters from the side. The man moves his hand back and forth repeatedly over his penis. His penis cums inside of her mouth.

All prompts were identical to the one above; I threw in some additional clarifications and the things you would expect, in natural language.
Trained, tested, and generated entirely within Musubi-Tuner. If it doesn't work in ComfyUI, talk to that developer.
Training notes:
I followed a guide here, and it was entirely correct. After many failures, this is the first Hunyuan Video LoRA trained on video clips that I've been satisfied with.
There's some nonsense out there about how prompts should be written, but throwing in quality and lighting terms absolutely does help in Hunyuan. Describe the scene to the base model without any LoRA until you get close to what you want, and once you have that phrasing, train on that phrasing.
Training was done on about 30 clips, all trimmed to 5 seconds at 16 FPS. Resolution was 480 x 272.
learning_rate 5e-4
network_dim 16
discrete_flow_shift 3.0
steps: ~3000
Took about 8 hours to train on a 4090.
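For anyone trying to reproduce this, the dataset side of the Musubi-Tuner setup looks roughly like the sketch below. It is reconstructed from the settings above and the quoted advice further down in the comments, not copied from my exact file: the paths are placeholders, and the frame count is only an estimate of what ~5 seconds at 16 FPS works out to. learning_rate, network_dim, and discrete_flow_shift are command-line arguments to the trainer, not part of this file.
[general]
caption_extension = ".txt"
enable_bucket = true
bucket_no_upscale = false
batch_size = 1
[[datasets]]
video_directory = "/path/to/dataset/loraname"        # placeholder path
cache_directory = "/path/to/dataset/loraname/cache0" # placeholder path
resolution = [480, 272]  # the single low-res bucket used here
target_frames = [81]     # estimate: ~5 s at 16 FPS; frame counts are typically N*4+1
frame_extraction = "head"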
Comments (17)
Awesome, always love a new HY LoRA. But now that you have the data and know how to use musubi, training for Wan 2.2 is virtually the same.
I'm a bit concerned that you trained HY at 16fps, though. Was there any reason for this? Hunyuan is a 24fps model.
Great question. I tried other settings, but the results came out bad: still images with little movement, and various AI horrors, despite using more sensible choices like 16:9 dimensions.
All of this comes from a comment or an article on here; I frankly forget where I found it. Here is a truncated version of that same piece of advice:
"Ya'll I've been having EXCEPTIONAL results training Wan with the following config:
LR 2e-5 with LoraPlus of 4
Network dimension 16 Network Alpha 16
came_pytorch optimizer with default args of weight_decay=0.01, eps=(1e-30, 1e-16), betas=(0.9, 0.999, 0.9999)
Timestep sampling: Shift Discrete Flow Shift 3.0
2400 total steps on T2V 14B FP16 base model in FP8 scaled precision
And for my dataset.toml:
[general]
caption_extension = ".txt"
enable_bucket = true
bucket_no_upscale = false
batch_size = 1
[[datasets]]
video_directory = "/path/to/dataset/loraname"
cache_directory = "/path/to/dataset/loraname/cache0"
resolution = [480, 272] # 130,560 pixels
target_frames = [65] # 8.5 megaframepixels
frame_extraction = "head"
Videos preprocessed into ~5 seconds clips showing the subject of interest at 16fps. This has produced such stellar results that I've gone back and retrained all my Wan models with these settings! You can get away with just the low res 480x272 bucket but it will reflect in the quality of the learned material. Including the higher res shorter clips allows showing the detail while including the lower res longer clips allows showing the progression of an action or scene."
Since WAN and Hunyuan have similar architectures, the training techniques should carry over. I ignored the part about the higher-res clips and just used the low-res bucket. The results kind of speak for themselves.
I don't think I'll ever train a WAN model; my ability to use WAN has been broken for a few months, ever since I changed configurations.
I exclusively use(d) WAN 2.1 for i2v, and it was always super slow compared to Hunyuan: one WAN 2.1 video takes about 10 minutes, whereas in Hunyuan I can generate three videos in the same time. Also, WAN has a very high failure rate where the output is completely unusable, and the LoRAs have to be cranked to very high strengths to work, which lowers the quality.
Although, if 2.2 is good enough, I suppose this could work as an i2v... That's also supposed to be similar.
@AlbanianDoorknob - I have not tested your LoRA yet, but Wan and Hunyuan do not really have similar architectures. Wan was trained on 16fps videos, but Hunyuan was trained on 24fps videos.
I trained a couple hundred Hunyuan LoRAs with musubi on a 3060 12gb card. They are great LoRAs, and HY is awesome.
But Wan is better in every respect, especially 2.2. It gens faster, it trains WAY faster, and the results are of far higher quality. Hunyuan is finicky and epochs vary greatly: overtrained checkpoints are stiff and janky, even though they still produce crisp outputs with great likeness to the character. With Wan 2.2, determining which checkpoint is good has been a breeze.
I do most of my training on low spec cards, and training Wan 2.2 with musubi in dual mode has been a dream.
If you already have the data, it only takes a few hours to test it out. That's all I'm saying.
Thanks for the LoRA!
@leisure_suit_larry Is this for photorealistic NSFW material? I've been avoiding Wan simply because all the output I see from it doesn't look real at all, unlike Hunyuan; I suspect the training dataset had too much anime and DeviantArt.
@frosty639 - I do not generate anime, only photorealistic videos of human subjects. I have trained only a couple of motion LoRAs, plus one of an animated movie character; the rest of several dozen are all humans. Human likeness LoRAs take just a few hours on my 3060 12GB cards, and 35 images is all it takes.
As for whether or not Wan 2.2 does realism... just look at the LoRAs on civit. It is the most realistic model available IMO.
it works in comfyui
Does not work for HunyuanVideo, is this for WAN?
Nope. I can't make anything for WAN.
@AlbanianDoorknob Well, it does not work. I also get an out of range error.
@erik28173 I don't think I can actually help due to how much of a mess the whole video generation space is right now with multiple different versions.
It works for me and others so fuck if I know.
If you're using ComfyUI, I can't help.
If you're using Musubi-Tuner, try version 0.2.7 or one of the forks like Blissful.
Does not seem to work in framepack studio in Pinokio
I get an "Index is out of range" error.
No clue why that would be. Maybe update your Python environment?
Very good LoRA. Can you add swallowing scenes?
nice job! looking forward to seeing what you train next!
Does not work with Framepack Studio.
Terminal:
Loading LoRA from 'pbukkakeHV.safetensors'...
Traceback (most recent call last):
  File "C:\pinokio\api\fp-studio.git\app\modules\pipelines\worker.py", line 571, in worker
    studio_module.current_generator.load_loras(selected_loras, lora_folder_from_settings, lora_loaded_names, lora_values)
  File "C:\pinokio\api\fp-studio.git\app\modules\generators\base_generator.py", line 261, in load_loras
    self.transformer, adapter_name = lora_utils.load_lora(self.transformer, lora_dir, lora_file)
  File "C:\pinokio\api\fp-studio.git\app\diffusers_helper\lora_utils.py", line 40, in load_lora
    state_dict = _convert_hunyuan_video_lora_to_diffusers(state_dict)
  File "C:\pinokio\api\fp-studio.git\app\env\lib\site-packages\diffusers\loaders\lora_conversion_utils.py", line 1480, in _convert_hunyuan_video_lora_to_diffusers
    handler_fn_inplace(key, converted_state_dict)
  File "C:\pinokio\api\fp-studio.git\app\env\lib\site-packages\diffusers\loaders\lora_conversion_utils.py", line 1350, in remap_img_attn_qkv_
    to_q, to_k, to_v = weight.chunk(3, dim=0)
RuntimeError: chunk expects at least a 1-dimensional tensor