Some say I'm a man on a mission... but today they'd be missing the last three letters of that term, so I present to you...
Missionary
Trained on ~10 clips at 24 fps, 48 frames each, at 288x288, 224x400, and 400x224, with segmented and strategic captioning (simplified and ordered). I'm quite happy with the results; I feel like my knowledge and experience have reached a point where I can collect, caption, and train these more efficiently.
Wildcard Prompt
Overhead view of a beautiful {Russian|French|Swedish|Swiss|Latina|Austrian|Dutch|English|American|Californian|Siberian|African} woman having missionary sex with a{|n African} man.
She is lying on her back on {a bed|the floor|a bench|an ottoman|a couch|a table} as he thrusts his {giant|huge|thick|large|small} penis in and out of her vagina at the bottom of the frame.
She has {long|shoulder length} {blonde|brown|dirty blonde|black|auburn} hair.
She is {moaning from pleasure|screaming from pleasure|smiling and blushing|very surprised|throwing her head back|furrowing her brow and gritting her teeth|grinning}.
She is {naked with {large|medium|small} breasts|wearing a {pink|red|white|black|blue|green} bra|wearing a {pink|red|white|black|blue|green|flower pattern|Hawaiian|cute|stylish} shirt}.
Her arms are {at her sides with her hands hidden behind her thighs|up above her head|bent at the elbows with her hands in her hair|bent with her hands caressing her own chest}.
His mostly hidden {enormous|giant|huge|large|massive|thick|big} penis slides in and out of her vagina at the bottom of the frame. Strings of fluid connect their crotches.
Lighting is {bright and even|soft|candle lit|dim in the background with a spotlight on the subject}.
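If your UI doesn't already expand this syntax (most dynamic-prompt extensions do), here is a minimal sketch of how {a|b|c} groups resolve, including nested groups and empty options like a{|n African}. This is an illustration, not any particular extension's code:

```python
import random
import re

def expand(prompt: str) -> str:
    """Resolve {a|b|c} wildcard groups, innermost first, so nesting works."""
    group = re.compile(r"\{([^{}]*)\}")  # a group with no nested braces inside
    while (m := group.search(prompt)):
        choice = random.choice(m.group(1).split("|"))
        prompt = prompt[:m.start()] + choice + prompt[m.end():]
    return prompt

print(expand("a{|n African} man on {a bed|the floor|a couch}"))
```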
Two.WAN
Easily twice as good as One. I fixed some lighting issues and boosted the resolution with multibuckets!
One.WAN
Working in order from my least complicated sets, and re-realizing how far behind I am with WAN, I reshuffled this set, added some new clips, and trained at higher resolutions.
My last LoRA was well done at 28 epochs; this one took 38. Why? No idea.
Pair this one with the Orgasm LoRA for even more fun!
Training Notes
Nothing new, really. My training strategy has stabilized: I choose 8-15 clips of a concept, normalize FPS, length, and aspect ratio, and focus on clean footage. High quality; no weird poses with fingers/hands/body parts distracting from the main action; minimal tattoos or unusual clothing; faces blurred. Captions start with the subject, then the action, then the second subject. Then we move to the second layer (peculiarities of the pose), then traits, then cropping, then the lighting, as in the sketch below.
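An illustrative assembly of that layering (the values are made up for the example, not pulled from my actual captions):

```python
# Illustrative only: assemble a caption in the order described above.
layers = [
    "A woman",                                   # subject
    "having missionary sex",                     # action
    "with a man.",                               # second subject
    "She is lying on her back on a bed.",        # second layer: pose
    "She has long blonde hair.",                 # traits
    "His penis is at the bottom of the frame.",  # cropping
    "Lighting is soft.",                         # lighting
]
caption = " ".join(layers)
print(caption)
```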
Cool down
Summer is heating up and I am slowing down. Aside from being busier and spending more time away from my screen, the investment of time and energy isn't as easy to justify these days. I'm likely to complete the most popular positions and then who knows, maybe some other model will come around or I'll bite the bullet and do WAN!
Disclaimer
Do no harm.
Comments
Is there a reason you are intentionally training on 24fps?
... also using 48 frames means you are losing some frames, because the count needs to be a multiple of 4, plus 1. So you can use 41 or 45 or 49... if you use 48-frame videos, what are you actually training on? 45 frames? What's in your toml?
I train on 24FPS for HunyuanVideo and 16FPS for WAN. Those are their native frames per second afaik
@Gongoloid The software does "crop" to frame counts it likes... It also wants all resolutions to be divisible by 32, but I haven't been religious about that either. diffusion-pipe is pretty self-correcting for things like this :)
The architecture of the model requires frame counts that are a multiple of 4, plus 1... sorry, I was misunderstanding and thought you were training Wan 2.2 on 24fps for some reason. As for resolutions, using buckets means you don't ever have to think about that at all. I let the kohya scripts process data for me and it's fine... but for duration I am not sure I understand your context. How are you ending up with videos that are 48 frames in the first place? I guess I don't understand that part... you obviously used some software to collect and process your clips, and if they are all 48, can they not all be 49 or 45 next time? My issue is that if I clip 48 frames... I'm literally wasting 3 frames per video.
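To put numbers on that waste, a quick sketch of the arithmetic (assuming the trainer snaps down to the nearest valid size, as described above):

```python
def usable_frames(n: int) -> int:
    """Largest frame count <= n of the form 4k + 1."""
    return ((n - 1) // 4) * 4 + 1

def snap_resolution(px: int) -> int:
    """Round a dimension down to the nearest multiple of 32."""
    return (px // 32) * 32

print(usable_frames(48))     # 45 -> 3 frames of a 48-frame clip are wasted
print(usable_frames(49))     # 49 -> nothing wasted
print(snap_resolution(400))  # 384
```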
I use vidtrainprep to cut clips to a precise duration. Check it out if you haven't. You set the duration in frames, loop your ranges, and crop with the mouse on top of the looping video. Very handy, and I don't think that functionality exists anywhere else.
https://github.com/lovisdotio/VidTrainPrep
Musubi is my jam... I struggled with DP back around Christmas with HY... and got no help and learned nothing. Musubi just worked for me. Then with Wan 2.2, training a single LoRA in dual mode has been tits, so I don't need DP, although I did try a single run in August on a Linux box and it... uhhh, went very poorly. I trained a facial likeness LoRA yesterday on my 3090 in 38 minutes and it works perfectly. I trained a video+image LoRA with it on a 3060 yesterday in under 2 hours, and it works great.
@Gongoloid musubi and the fork blissful-tuner are cool, Blyss actually helped me get that up and running! Unfortunately I had lots and lots of trouble with instability, probably due to my own environment in the end, and ended up going back to DP which I have become very comfortable with.
I use avidemux, a simple open-source video editor, to set slice points, resize, crop, etc... It's a bit manual but pretty straightforward.
I used to clip 4 seconds of video for HunyuanVideo, since I couldn't process more frames than that anyway, but now for WAN I'm leaning towards 6-8 seconds per clip. I'm doing that partly because I sometimes found that another couple of frames would have been SO nice to have, but god knows where the original file is by then, ouf...
Then the way DP works is you set "single_beginning" and it takes the frames you specify from the beginning of the file in a linear fashion. In this sense, all of my videos exceed the frame count, but not all frames are used. DP starts by creating a latent cache, where it resizes and transmutes all clips according to the buckets, so you don't need to feed it exact resolutions or lengths. Sounds like you're aware of that though :)
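Roughly, "single_beginning" amounts to this (an illustrative sketch of the behavior, not DP's actual code):

```python
def single_beginning(frames: list, target: int) -> list:
    """Take the first `target` frames of a clip; the rest are ignored."""
    assert len(frames) >= target, "clip is shorter than the training length"
    return frames[:target]

frames = list(range(60))                  # stand-in for a 60-frame clip
print(len(single_beginning(frames, 49)))  # 49; the last 11 frames go unused
```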
@az420 - https://pixeldrain.com/u/Suq68vm9
The thing about musubi is that dual mode... is just superior. One LoRA for high and low, trained in one session with both bases available at the same time... it's vastly more efficient in several blatantly obvious ways. I am not trying to "convert" you or anything, but I've trained a couple hundred of these now and see no reason to train two separate LoRAs at all, and my testing shows that training a dual mode LoRA with conservative data and settings is possible even on a 3060 12gb card with only 32gb RAM. It's pretty nifty to train high noise and low noise into one file and load that same single file into both paths and get motion and details from the same LoRA.
You can use ShareX to record your screen at 16fps... so virtually any local media, and any media you can stream anywhere, is fair game. I record chunks, load them into VTP, and cut clips; it's faster by far than any other workflow. Of all my options, I never enjoyed using avidemux, though. ShareX is free and open source, very capable, and comes in a portable flavor.
https://github.com/ShareX/ShareX
I used to convert everything to 16fps and then edit; now I just record with ShareX, clip with VTP, and I'm done. VTP can caption as well.
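If you do still need to conform footage to 16fps first, a plain ffmpeg call handles it. A sketch via Python's subprocess, assuming ffmpeg is on your PATH (the file names are hypothetical):

```python
import subprocess

def to_16fps(src: str, dst: str) -> None:
    """Re-encode a clip at 16 fps; audio is irrelevant for training, so drop it."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", src, "-r", "16", "-an", dst],
        check=True,
    )

to_16fps("capture.mp4", "capture_16fps.mp4")  # hypothetical paths
```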
@Gongoloid And it works just as well as training a high and low one separately? Sounds too good to be true! I may reach out for more details if you don't mind
@az420 - it's musubi-tuner, and it's based on the kohya scripts... I did not conceive of the concept of mixture of experts, but training a LoRA in dual mode is the perfect expression of the idea. I did not write the scripts. Training a LoRA for both high and low has been a viable idea since before the models were published. Kohya implementing it was just a natural progression of things based on the papers... it is not complex or difficult. It is actually far simpler and very easy to do.
There is no logical reason low and high timesteps cannot reside in a single file.
I will provide anyone with any information they ask for, including my actual files and training data.
Feel free to message me.
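For the curious, a conceptual sketch of what dual-mode training does. Every name here, and the 0.875 boundary, is an illustrative stand-in, not musubi-tuner's actual API:

```python
import random

# Conceptual sketch of dual-mode (mixture-of-experts) LoRA training for
# Wan 2.2: both base models stay available, each sampled timestep is routed
# to the matching expert, and all gradients accumulate into ONE shared LoRA.

BOUNDARY = 0.875  # assumed split between high-noise and low-noise experts

class Expert:
    def __init__(self, name: str):
        self.name = name

def train_step(expert: Expert, lora: dict, batch: int, t: float) -> None:
    # Stand-in for a real optimizer step; gradients would land in `lora`.
    lora["steps"] = lora.get("steps", 0) + 1
    print(f"step {lora['steps']}: t={t:.3f} -> {expert.name}")

high_expert = Expert("wan2.2_high_noise")
low_expert = Expert("wan2.2_low_noise")
lora = {}  # ONE adapter, shared by both experts

for batch in range(4):  # stand-in for the dataloader
    t = random.random()  # sampled diffusion timestep in [0, 1)
    expert = high_expert if t >= BOUNDARY else low_expert
    train_step(expert, lora, batch, t)
```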
Will this work for man on man?
Could you share the workflow you use for this? (Not sure if it's included in the video; I can't get Comfy to pull workflows from video files.)
Something like this: https://civitai.com/models/2087071/az42up-wan-cowgirl