Wan 2.2:
Download both the highnoise and lownoise loras, and plug them both into your workflow. The "training images" download contains the example I used for the previews.
T2V notes: I did not use the self forcing lora for the highnoise part of the workflow, because I find that it limits facial variety too much. In my testing, this happened with or without the missionary lora plugged in, so if you're having trouble with facial variety, you may have to disable the speed lora for the highnoise step.
I2V notes: This version should handle cases where the penis starts outside the vagina. It can also handle cases where there's no man at all in the starting image, but don't expect the penis to look very good in those cases, because Wan still doesn't have a good idea of what one is supposed to look like. As with the T2V version, I found that using the speed loras for the highnoise step makes the results worse. For 5 seconds of video, the lightning lora seemed to put the videos in slow motion when used on the highnoise step. Maybe there are good settings that I just haven't found yet.
Wan2.1:
I2V version: While the T2V model can work really well as I2V, it seems to be picky about resolutions and aspect ratios and things like that. The specifically trained I2V model just seems to work better in more situations. So, it's becoming clear to me that providing an I2V model along with the T2V model is worth it. It uses the same training data as the T2V one, minus the static images.
Version 1.1 Updates: Trained at a higher resolution in an attempt to get higher quality outputs. I also added more training data so there's more movement and bouncing breasts.
Important parts of the prompt:
with her legs spread having sex with a man
...
A man is thrusting his penis back and forth inside her vagina at the bottom of the screen
{Movement is fast with bouncing breasts|Movement is slow}
Her breasts are {small|medium sized|large}
Sometimes it may help to put the things you don't want in the negatives. For example, if you want small breasts, putting large breasts in the negative could help.
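The `{a|b}` groups in the prompt lines above read like the common "pick one option" wildcard syntax. A minimal sketch of expanding such groups, assuming that interpretation; the template text and function names are placeholders, not part of any particular UI:

```python
import random
import re

# Matches one {option a|option b|...} group with no nested braces.
GROUP = re.compile(r"\{([^{}]*)\}")

def expand_wildcards(template: str, rng: random.Random) -> str:
    """Replace each {a|b|c} group with one randomly chosen option."""
    return GROUP.sub(lambda m: rng.choice(m.group(1).split("|")), template)

def all_variants(template: str) -> list[str]:
    """Enumerate every combination, e.g. to build a systematic test grid."""
    match = GROUP.search(template)
    if match is None:
        return [template]
    variants = []
    for option in match.group(1).split("|"):
        expanded = template[:match.start()] + option + template[match.end():]
        variants.extend(all_variants(expanded))
    return variants

# Placeholder template; substitute the real prompt lines.
template = "Movement is {fast|slow}. The lighting is {soft|bright|dim}."
print(expand_wildcards(template, random.Random(0)))
print(len(all_variants(template)))
```

If your front end does not expand these groups itself, pick one option by hand or expand the prompt before pasting it in.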
Comments (64)
Not all heroes wear capes, great work!!!
Can you link any article or guide that matches your process and tools for a beginner like me?
The best guide I have is the config files I included in the training data download. I use this https://github.com/tdrussell/diffusion-pipe. Follow the directions there to set it up and use my config files as a starting point.
Thank you so much for this -- been looking ALL over for a 14B training process that works using Diffusion-Pipe. Quick question, how long did it take to process the 25 epochs on your 4090? Like 8 hours or so?
Honestly, I didn't think to look, as it ran while I was sleeping, but it was less than 8 hours. I'd guess more like 4 or 5. Also, I'm using a 3090.
@dtwr434 Looks really good! Thanks for the lora.
Could you share how you set up the model? Did you use the official repository with the T2V-14B model or something else?
I can't run a training session on 3090, I get an Out of memory error, despite trying to run the training session in 256x256 resolution with 65 frames.
@Aivanjo Yes, I used the official repository for 14B. I think you're running out of memory because that resolution and frame count are too high. I used 244 resolution and 32 frames, and it just barely fit into 24 GB. 32 frames in Wan is 2 seconds, so that's enough for movements like this. You can include some high resolution images to augment the dataset if you need to teach it what something looks like, as I did with this one.
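The arithmetic behind that reply can be sketched as follows. This is a rough illustration only: it assumes 16 frames per second (implied by "32 frames is 2 seconds"), assumes "244 resolution" means a 244x244 sample, and uses pixels times frames as a crude proxy for activation memory, which does not scale exactly linearly in practice.

```python
WAN_FPS = 16  # assumption, implied by "32 frames in Wan is 2 seconds"

def clip_seconds(frames: int, fps: int = WAN_FPS) -> float:
    """Duration of a clip in seconds."""
    return frames / fps

def sample_volume(width: int, height: int, frames: int) -> int:
    """Pixels per training sample: a rough proxy for activation memory."""
    return width * height * frames

baseline = sample_volume(244, 244, 32)  # the setting that barely fit in 24 GB
attempt = sample_volume(256, 256, 65)   # the setting that ran out of memory

print(clip_seconds(32))              # 2.0
print(round(attempt / baseline, 2))  # 2.24
```

By this estimate the 256x256, 65-frame setting asks for a bit more than twice the data per sample, mostly because of the doubled frame count.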
@dtwr434 Ok thanks sir!
@dtwr434 Yes, it worked. Thank you!
great googly moogly
looks good. Used it at around 0.9 strength, and it might be too high or too low. My backgrounds have been a bit less detailed than without the lora, but still keeping consistency and without significant artifacts.
Did you use diffusion-pipe or the new musubi-tuner support?
I couldn't get diffusion-pipe to train on videos with my 24GB 4090, even down to 640x386 input size and 33 frames. Kept OOM'ing, even with fp8 transformer dtype.
Or did you just train on images? If you did train on videos, what did your dataset config toml look like? And what size were your videos?
I used diffusion-pipe. I provided all the settings I used in the training images download, so you can take a look at that. I used 244 resolution and 32 frame videos, as well as a handful of 800 pixel images. 244/32 barely fits into 24GB, so you probably can't go beyond that.
You can use low resolution stuff to train it motions really well, and then just include some images to teach it what the thing is supposed to look like in detail.
Ah interesting, that's a good idea. Low resolution video + high res images. With musubi-tuner, I'm able to train 688x384 x 45 frame videos on the 14B T2V model, using 32 for the block-swap parameter, which diffusion-pipe doesn't have. A little slower, but something you could try as well.
King! Keep it up bro! Wan ftw!
It works with I2V 14B 720P.
great job!
does anyone have this working with a gguf quant?
Yeah, I used the Q4 one and it seems to work. I read that torch.compile messes with LoRAs, so maybe that could be it if it isn't working for you.
@hiben40387 It was torch.compile! Thank you!
Did you still have to download that 100gb+ model? I used that one but rented an h200 on vast.ai for making my LORA. For a 4090, were you able to use one of the FP8 models?
You can use the float8_e4m3fn model directly with musubi-tuner, but diffusion-pipe requires the full >100GB model, and casts it down to whatever quant you want.
I'm using this directory, which adds up to about 64GB of stuff https://huggingface.co/Wan-AI/Wan2.1-T2V-14B/tree/main. But yeah, you tell it to load it as float8 to get it to work in 24 GB. I was right at the limit using 244 resolution and 32 frames, so if you need more than that, you still might be better off renting something.
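A back-of-the-envelope sketch of why loading the transformer as float8 matters on a 24 GB card. It counts weights only, using a nominal 14 billion parameters (an approximation); activations, the LoRA parameters, optimizer state, the text encoder, and the VAE all come on top of this.

```python
def weight_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory taken by the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30

PARAMS_14B = 14e9  # nominal parameter count, an approximation

print(round(weight_gib(PARAMS_14B, 2), 1))  # 16-bit weights: 26.1
print(round(weight_gib(PARAMS_14B, 1), 1))  # float8 weights: 13.0
```

At 16 bits the weights alone exceed 24 GB, while float8 leaves roughly 11 GB for everything else, which is consistent with 244 resolution and 32 frames being right at the limit.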
@Bbbrrr wow no shit? And it produces decent results? I might have to try that then. Does Musubi work on Windows, or do I need to throw it on WSL? I have done some LoRA training on my 4090 with diffusion-pipe, but so far all my Wan LoRAs have been done on vast.ai, renting a Linux server with an H200.
@CapAndABull Results seem good so far. Not sure if it works on Windows, but it should. Uses sdpa or flash_attn. Using linux though.
Bless you
Very nice LoRA
Had some trouble getting it to work properly with 480P I2V but got there eventually.
Thanks!
@dtwr434 requesting your side missionary one next! with some tweaks to the set or maybe just base on WAN itself the movement and flexibility could be awesome.
When doing v2v (small gen then latent upscale), is anyone else getting artifacts just on the bottom 15-25% of the video?
I'm going from 240 x 416 > 720 x 1248 and there is a clear line across the video where blocks and flashing artifacts can be seen below
Looks like some kind of face mask got trained in. I get it on about 10% of gens so far.
I'd think if you describe something about her face, like a facial expression, it should go away.
You can also probably put censored, blurred in the negatives, as that's how I captioned it.
@dtwr434 doesn't work, it changes it from a blur to a pencil drawn face in an oval!!
@azeli That's really weird, I never encountered anything like this when making the preview images. Wonder if it's something to do with your workflow or prompt.
@dtwr434 seems to happen on higher resolutions, especially if I try and go straight for a 720p gen
Hmm, OK. I did include some high resolution images and a couple had faces blurred in those, so I wonder if it's relying too heavily on those when you go high resolution. Though, I'd think the alternative would be you always getting the same face. Not sure what to do about that.
Thanks for this lora. Big fan!
I noticed something I like about your other loras is missing on this one, which is upper body jiggle motion. I see that a lot in side missionary and the new cowgirl you released. Not sure why that is. Maybe adding more forceful thrusts in the dataset would help?
I was thinking about this, and I suspect the reason for this is because when it gets more forceful, it also means the camera starts shaking like crazy. It works for the other ones cause the camera is off to the side. So yeah, it's just really hard to get training data for it.
@dtwr434 One way I can think of is VR footage. We'll have to flatten it, but I think most of them are POV but with a stable tripod on the camera rig. I'm going to try this route too.
Noticed on your training data that you have videos blurred. I'm assuming you're blurring faces. Diffusion pipe supports masked training, which means it avoids training the masked regions. That will keep the model from learning to blur it too.
Maybe you already knew, just wanted to share.
OK yeah, the blur worked really well with Hunyuan, where if you caption it correctly, it doesn't affect the output, but it seems to be a problem with Wan. I had heard about masking, but it sounds much harder to do, unless there's some tool to generate the masks for you. I might have to figure it out since blurring isn't working as well with Wan.
@dtwr434 Yes, I'm not sure what the best way to do the masks is. But it seems that a rudimentary black and white image with the face region painted black (or grey to train on it only partially), then converted from an image to a video, will work.
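A minimal sketch of the mask idea described above: a white frame with the face region painted black (or grey for partial training). It uses only the standard library and writes plain PGM images. The box coordinates and frame size are hypothetical, and how the trainer expects masks to be named, encoded, or assembled into a video is not confirmed here, so check the diffusion-pipe documentation before relying on it.

```python
def make_mask_frame(width, height, box, inside=0, outside=255):
    """Return 8-bit grayscale pixels (row-major) with `box` set to `inside`.

    box is (left, top, right, bottom) in pixels; right and bottom are exclusive.
    Use inside=0 for black, or a mid value such as 128 for grey.
    """
    left, top, right, bottom = box
    pixels = bytearray([outside]) * (width * height)
    for y in range(max(top, 0), min(bottom, height)):
        row = y * width
        for x in range(max(left, 0), min(right, width)):
            pixels[row + x] = inside
    return pixels

def write_pgm(path, width, height, pixels):
    """Write one frame as a binary PGM (P5) image."""
    with open(path, "wb") as f:
        f.write(f"P5\n{width} {height}\n255\n".encode("ascii"))
        f.write(pixels)

# Hypothetical face box inside a 244x244 frame.
frame = make_mask_frame(244, 244, box=(80, 20, 160, 110))
print(frame.count(0))  # 7200 black pixels (80 wide x 90 tall)
```

Writing one such frame per video frame with `write_pgm` gives an image sequence that a tool like ffmpeg could then turn into a mask video. A static box only works if the face stays put, so moving shots would need per-frame boxes.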
I did notice on your Hun cowgirl lora (my fav btw) the faces tend to get a bit "artifacty" if you noticed. Not sure if this is the reason?
Amazing quality! Well done! I think something is missing. There is no way to choose the I2V of Wan model. It's only Text 2 video supported. Is it possible to create a node or a workflow that connects a Lora to an Image 2 Video Wan Model?
That would be much appreciated!
If you check the vid I uploaded the i2v workflow is embedded.
@logenninefingers888 Thank you very much for your reply. I just saw the video you uploaded. I saw that in ComfyUI you are using the Hunyuan Lora loader. I think there is an official Wan Video Lora Loader but it's not the same. Anyway I will run your workflow and let you know. Thank you!
@nontas good spot thanks, I'll check if it makes a difference.
@nontas How do I use his workflow from the vid?
What strength for this?
In general, I'd say set it to the lowest strength you can while still getting the desired motion. Sorry I don't have a specific answer, but to be honest, I've spent most of my time training loras so far rather than using them.
Hey, thanks for your efforts, but I've noticed that your loras make some pretty muddy/blurry outputs, and I suspected that you're training with low-res/SD video downloads. I just checked your notes in the training data, and lo and behold that's exactly what you're doing. You also then crop the videos/deal with watermarks, saving it again (presumably recompressing).
PLEASE DON'T DO THIS!
When you download 480p videos you get a re-encode at a low bit-rate. You're training bad compression artifacts into the lora.
Please download only HD/4k (as high as possible) videos, crop/edit as needed, and then downscale them to 480p or whatever, saving the videos as high bitrate/lossless if possible.
This has been tested and confirmed by others who found doing this to give far superior results.
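The workflow suggested above (start from the highest-resolution source, crop, downscale, save without lossy recompression) could be sketched as an ffmpeg invocation built from Python. This is one possible set of options, not a tested recipe from the thread: file names and crop values are hypothetical, and lossless H.264 via `-crf 0` is just one of several ways to avoid a second lossy encode.

```python
def build_ffmpeg_cmd(src, dst, crop=None, target_height=480):
    """Build an ffmpeg command that crops, downscales, and encodes losslessly.

    crop is an optional (width, height, x, y) tuple in source pixels.
    """
    filters = []
    if crop is not None:
        w, h, x, y = crop
        filters.append(f"crop={w}:{h}:{x}:{y}")
    # -2 lets ffmpeg pick a width that keeps the aspect ratio and is even.
    filters.append(f"scale=-2:{target_height}:flags=lanczos")
    return [
        "ffmpeg", "-i", src,
        "-vf", ",".join(filters),
        "-c:v", "libx264", "-crf", "0",  # CRF 0 is lossless for 8-bit x264
        "-an",                           # drop audio, not needed for training
        dst,
    ]

# Hypothetical file names and crop box.
cmd = build_ffmpeg_cmd("source_4k.mp4", "clip_480p.mkv", crop=(1920, 1080, 960, 540))
print(" ".join(cmd))
```

To actually run it, pass the list to `subprocess.run(cmd, check=True)` on a machine with ffmpeg installed.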
Honestly, I haven't noticed muddy/blurry outputs. Do you have examples to compare to? The cowgirl lora is the one that sticks out as really bad to me, and I'm trying to fix that now. When generating at 720p, things look pretty sharp to me. Maybe if I could see a comparison, I'd understand the issue better.
@dtwr434 The cowgirl one is definitely the worst, but I see it in all of your loras to some extent, especially where there's motion (which is exactly where the compression artifacts get worse in the source material). Sometimes it's subtle, but it's definitely there. It often just looks like wobbling/warping of small patches rather than blurring as such, and sometimes it just overall has a low-quality look to it.
It's hard to give any comparison since not using the lora isn't going to give the same subject. But it's essentially what you can see in the cowgirl one, but more subtle in the others.
Genning at 720p probably mitigates it somewhat, but I can't do that.
I should be clear that a bunch of other people's loras have the same issue, so I'm not trying to single you out, it's just you're so prolific, which is awesome, but I think the visual quality could be much better if you used higher bitrate sources.
Here's a quote from someone on the NSFW API Discord (Hunyuan, but same should apply to Wan):
"the samples are 4k to start with and then resized to 512 and 244 res. But in order to maintain clear outputs, you need high res, low-lossiness sources. Most people are training on porn tube slop with compression artifacts. If you start with crisp 4k and use lossless downscaling, you can preserve a lot of the information that Hunyuan needs to produce crisp video"
Well at this point, I'm not going to re-collect all of the video samples I've already done. I could maybe consider it if I make any brand new things, but I'm still not totally convinced I see the issue. With the masturbation lora, for example, when the woman's hand starts moving quickly, I can see it's a little less clear, but I don't know if that's just how the model works or if it's the fault of the lora.
If someone from that discord could put out a Lora that is better, and I can see the difference, that would maybe help motivate me to revisit some of this, but at the moment I'm happy enough with these loras.
Also, if someone out there is willing to rent crazy hardware to train at higher resolution in general, that's kind of what I'm hoping for some day. Especially if they can finetune the model itself to add some of these concepts.
@kangaru861 I tried a test using 1 source video. I downloaded the 2K version (highest available), and then when clipping/cropping, I used a lossless format. I ended up with a 30MB 3 second video clip. Then I did the same thing starting with the 480p version, and clipping/cropping as I usually do. Then I trained a lora on both of them.
To be honest, I can't really tell a difference in their outputs. Maybe I'm just not good at noticing subtle details, but it's not the night and day difference I was hoping for. I think at the end of the day, when it gets reduced to 244 resolution, it just looks the way it does.
If we want high quality outputs, someone is probably just going to have to rent some serious hardware and train at a higher resolution.
@dtwr434 Thanks for testing it out! I'm not sure you're going to see much of an impact from just one video though? Would a lora trained on a single clip even really do anything substantial? I feel like it would have minimal impact on the base model.
Anyway, the difference is largely going to be in places with a lot of motion and detail (that gets lost/garbled), so if what you're training on doesn't have that, then you wouldn't see much impact. That is, most parts of most clips may be perfectly fine the way you originally did it, but there could be a couple of parts that are ruined by the compression that end up having an impact (e.g. fast hand or hip movement).
Could you go through your clips you've previously used (the final ones fed into training) and see if you can find any parts with fast movement that turn to mush due to compression (maybe pause the video to get a closer look), and maybe do some tests on a higher res source for those clips in particular?
@kangaru861 Well, I mean I trained on that one video until the point where it could basically recreate something very similar to what was going on in the video. There was plenty of movement, and you could see the mouths and stuff getting really blurry and weird when they moved. But, both the high quality one and the low quality one looked equally bad during the movements.
Thank you very much for providing guidance in training a wan lora!
Which epoch did you choose in the end?
I can see that you trained up to 25 but did you upload this one, or was a previous one better?
How did you test the resulting loras? Same WF, same prompt, same seed, and then just a few seconds and steps to quickly verify?
Honestly, I just tried the 25 one and it seemed to be working well, so I didn't try any others. I'm finding that 25 epochs with 10 repeats pretty much always ends up with something good. Sometimes you only need 20.
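For a sense of scale, "25 epochs with 10 repeats" can be turned into a step count once the dataset size is known. The item count and batch size below are hypothetical, and the formula ignores gradient accumulation and any bucketing the trainer does.

```python
import math

def training_steps(num_items, repeats, epochs, batch_size=1):
    """Total optimizer steps, assuming one step per batch."""
    samples_per_epoch = num_items * repeats
    steps_per_epoch = math.ceil(samples_per_epoch / batch_size)
    return steps_per_epoch * epochs

# Hypothetical dataset of 30 clips and images.
print(training_steps(num_items=30, repeats=10, epochs=25))  # 7500
```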
What you're suggesting would be a good strategy though if you were trying to find the very best version. I would probably try various prompts as well though to test its ability to follow directions.
came to say the same thing - really appreciate the shared training information
0.7 strength seems to work better for preventing bad output with artifacts.
same with v1.1, having it on like .6 or .7 also reduces the face burn-in
Tell me something, because I cannot get that penis to move like in your samples: are you using an external generator, such as one Civitai may provide? I follow all the suggestions and all I get is distortion, a frozen cock, and mangled genitalia on both genders. Is this all I can expect outside of paid services that have "appropriated" certain loras into their own sites and removed them from Civitai? You know I am correct there. But hey, it's probably me. What am I doing wrong?
Probably just a workflow issue, but I don't know what you're doing so I can't say.
@dtwr434 yep, you were correct, started again created my own workflow and bingo, nice!
Any particular prompt needed for this?
The example prompts are provided for the showcase images if you click on them.