CivArchive
    Furry + nsfw wan 2.2 5b - v1.0 e75
    NSFW

    V2

    V2 is more consistent, has more stable movements, and should produce fewer artifacts. It also seems to work very well for 2d inputs. All previews were generated with a single prompt used for both t2i and i2v; writing separate prompts and picking a good starting image should give even better results.

    Use the "turbo" lora for high-quality generations in just 4 steps!

    The turbo lora is available on huggingface: https://huggingface.co/Kijai/WanVideo_comfy/blob/main/LoRAs/Wan22-Turbo/Wan22_TI2V_5B_Turbo_lora_rank_64_fp16.safetensors

    To use it, set steps to 4 and cfg to 1. I'm not sure what the recommended sampler/scheduler is, but I've had great results with multiple samplers and schedulers; I personally use euler/euler a with the beta scheduler.
    Using a slightly lower resolution (but not low enough to noticeably reduce quality), I can generate 80 frames in just 2 minutes on a 3060.

    This lora is recommended for i2v, but t2v might work decently as well.

    Trained on my new mixed furry/human dataset with detailed captions; older versions of this dataset were also used for the experimental and semi-stable text-to-video loras.

    Prompting

    Prompting should use natural language. You need to generate at 720p, so resolutions such as 1280x704, 704x1280, or 960x960 are valid. This might matter more for i2v than for t2v; I've noticed artifacts with i2v at lower resolutions.
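As a rough sketch of the resolution rule above, here is a small helper that checks whether a target resolution is "720p-class". The divisibility-by-32 requirement is an assumption (typical for Wan-style latent models), not something stated on this page; only the pixel-budget idea comes from the text.

```python
# Hypothetical helper: check whether a resolution is "720p-class" for
# Wan 2.2 5B, i.e. roughly 1280x704 worth of pixels.
# The divisibility-by-32 rule is an assumption, not from the model card.

TARGET_PIXELS = 1280 * 704  # ~900k pixels, the 720p budget

def is_720p_class(width: int, height: int, tolerance: float = 0.15) -> bool:
    if width % 32 or height % 32:
        return False
    pixels = width * height
    return abs(pixels - TARGET_PIXELS) / TARGET_PIXELS <= tolerance

# The resolutions named in the text all qualify:
for w, h in [(1280, 704), (704, 1280), (960, 960)]:
    assert is_720p_class(w, h)

# A 480p-style resolution does not, which matches the artifact reports:
assert not is_720p_class(832, 480)
```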

    In a prompt, you can describe "a 3d animation", "a 2d animation", or "a real video". This is most useful for t2v but can help with i2v as well.

    You can also check the prompts on the example videos for reference.

    Description

    First version

    FAQ

    Comments (28)

    6927513 · Jul 31, 2025 · 2 reactions
    CivitAI

    May I ask what training settings you used? The 5b trains extremely fast for me, but I'm getting pretty glitchy epochs with a known-good dataset. Also, what sampler settings? I noticed the settings that worked for wan 2.1 don't work at all with wan 5b.

    mylo1337
    Author
    Jul 31, 2025 · 1 reaction

    Check your resolutions; wan 2.2 5b does not support 480p. I trained at 700 res, and ~900 might be even better. The 5b uses the new vae, which compresses 4x as much, so training is much easier at the same resolution, but you need to use 720p-like resolutions.
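The "compresses 4x as much" figure can be illustrated with some latent-size arithmetic. The 16x vs 8x spatial downsampling factors for the two vaes are my assumption, not stated in this thread:

```python
# Sketch of why 720p training on the 5b costs about what lower-res
# training cost on wan 2.1: the new vae downsamples 16x spatially
# (assumed) vs 8x for the old one, i.e. 4x fewer latent pixels.

def latent_pixels(width: int, height: int, spatial_factor: int) -> int:
    return (width // spatial_factor) * (height // spatial_factor)

old = latent_pixels(1280, 704, 8)    # wan 2.1-style vae at 720p
new = latent_pixels(1280, 704, 16)   # wan 2.2 5b vae at 720p
print(old // new)  # 4 -> the "compresses 4x as much" figure
```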

    6927513 · Jul 31, 2025

    mylo1337 Thanks. Also, what are your training times? I trained at 512x512 and it takes maybe 2 hours tops on dual 3090s.

    mylo1337
    Author
    Jul 31, 2025

    basedbase An epoch at batch size 1 took around 18 minutes; at batch size 2 it took about the same amount of time, maybe less. I trained it for a total of a little under 24 hours on a single 4090 on runpod.

    6927513 · Jul 31, 2025

    mylo1337 As a side note, if I could physically fit more 3090s into my workstation I would, but 2 is the max even in a super-tower case, lol. It would be great to have 4x 3090s, since that would allow really fast training to test tons of settings. I'll be working on a wan 2.2 27b to wan 2.2 5b distill lora and see if the quality increase is as dramatic as with the wan 14b to 1.3b lora I made.

    iluvlamia · Aug 30, 2025

    @basedbase is it some kind of black magic?

    6927513 · Jul 31, 2025 · 3 reactions
    CivitAI

    I can't seem to get any good results with i2v; everything keeps turning out anatomically morphed.

    mylo1337
    Author
    Jul 31, 2025 · 2 reactions

    Are you sampling at the correct resolution? I'm using swarmui with the lora and 960x960-based resolutions (so 720p). Using a lower resolution causes the issue you described, so that might be the problem.

    Also, make sure your comfy is up to date.

    6927513 · Jul 31, 2025

    mylo1337 Yeah, I tried 960x960 and 1280x720 and everything either has no motion or is anatomically spazzing out. Samplers I've tried: lcm+simple for 30-60 steps, no dice; dpmpp+sgm_uniform for fewer than 30 steps and more than 30 steps, same thing. I even tried all the known combos that worked great with wan 2.1 and they are somehow even worse. No idea what is wrong. Any chance you have a workflow I can compare against to figure this out?

    mylo1337
    Author
    Jul 31, 2025

    basedbase I've tried euler/euler a + beta and unipc + simple, and both worked for me.

    I didn't use a workflow file; I used swarmui (so it used an auto-generated comfy workflow). You can link swarmui to an existing comfy install and use it in swarm if you want to compare.

    6927513 · Jul 31, 2025

    mylo1337 I'm curious, do you have a very large dataset? Mine is 81 videos and it still trains incredibly fast.

    mylo1337
    Author
    Jul 31, 2025

    basedbase about 250 videos iirc

    6927513 · Aug 1, 2025 · 2 reactions
    CivitAI

    To get a good Wan 5b generation, I had to write a script to merge the original sharded model into a single fp32 safetensors file and run inference with that, so either my setup is cursed or wan 5b loses way too much quality when quantized. Even the fp16 was terrible. And even at full fp32 precision, VRAM usage is only 21.6gb.

    mylo1337
    Author
    Aug 1, 2025

    I usually run 8bit; it's not a huge quality difference from 16bit. q4 does lose a bit of quality, though. Again, when I had issues, it was the resolution, not the model weights.

    mylo1337
    Author
    Aug 1, 2025

    If you use swarmui with default settings, it'll work. I use euler a + beta and sigma shift 8, pretty much default otherwise. It's all in the prompting and resolution: if you use a low resolution, the outputs glitch out and get artifacts.

    Also, have you gotten wan 5b working without the lora in the first place? I think your issue is either an outdated comfy, the wrong models, or something similar. For reference, I'm using the official comfy merge: https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/blob/main/split_files/diffusion_models/wan2.2_ti2v_5B_fp16.safetensors.

    I've also uploaded more previews, now with better cowgirl example gens. wan 2.2 models are trained to allow prompting for motion, but that motion will usually be very aggressive. Use phrases like "she moves her hips in rocking motions" instead of "she moves her hips up and down", as the latter usually gets interpreted as her flying up and back down.

    6927513 · Aug 1, 2025

    mylo1337 I've both updated comfy and used the official fp16 at 768x1024 resolution, and most outputs are glitchy or don't have much movement. It does the same with loras I've trained myself, so I'm perplexed, but a lot of people on reddit are having similar issues too.

    Ada321 · Aug 2, 2025

    basedbase 768x1024 is why; 5B does not work outside of 1280x704 or 704x1280.

    6927513 · Aug 2, 2025

    Ada321 In my testing, 768x1024 seems to perform as well as, or only slightly worse than, 704x1024.

    mylo1337
    Author
    Aug 2, 2025

    basedbase Have you tried the default workflow with the lora loader added? Multiple people, including myself, got it working without issues on default settings. I did have issues like yours at one point (i2v getting artifacts), but they were caused by the lower resolution I was using at the time.

    Make sure you're not using any custom nodes that could affect the internal resolution, and make sure the model itself works on your PC before you blame my lora.

    iPiKo · Aug 9, 2025

    basedbase Do you have the script for the fp32?

    Volkin · Aug 1, 2025 · 3 reactions
    CivitAI

    Overall a fine experience with this model using the basic native Wan 2.2 5B workflow provided by comfy. I used natural language in the prompting and 8/10 times I got very good results!

    RedditUser981 · Aug 2, 2025 · 5 reactions
    CivitAI

    Can anybody share a workflow for nsfw 5b wan 2.2? I only have 6gb vram, help me.

    iPiKo · Aug 2, 2025 · 3 reactions
    CivitAI

    Hey, any technical details or a tutorial on how you trained this?

    mylo1337
    Author
    Aug 2, 2025 · 4 reactions

    I use diffusion-pipe for most of my training. I took the config I had used for training wan 2.1 14b for an earlier model, changed it to use one gpu and a batch size of 2, and ran it on a runpod pod.

    For captioning, I captioned everything manually in captiontool (https://civitai.com/articles/16284/captioning-with-captiontool). I made sure the captions were very descriptive, clipped only the part of the video related to the prompt, avoided any cuts in the original video, and limited each captioned clip to a little over 10 seconds.

    In diffusion-pipe, I used the "multiple_overlapping" video clip mode, which trains multiple times per epoch on different sections of the video. So a 10-second clip with 80-frame snippets at 24 fps ends up working as about 3 clips. This adds variety and lets the model start as if the input frame were already in motion.
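The arithmetic in that last step can be sketched as follows. The even-spacing layout is my guess at how overlapping windows might be laid out, not diffusion-pipe's actual "multiple_overlapping" implementation:

```python
# Sketch: how a 10-second clip at 24 fps yields ~3 snippets of 80 frames.
# The even-spacing window layout is an assumption; diffusion-pipe's
# actual "multiple_overlapping" logic may differ.
import math

def snippet_starts(total_frames: int, snippet_len: int) -> list[int]:
    """Evenly spaced start frames so the snippets cover the whole clip."""
    if total_frames <= snippet_len:
        return [0]
    n = math.ceil(total_frames / snippet_len)  # enough windows to cover
    step = (total_frames - snippet_len) / (n - 1)
    return [round(i * step) for i in range(n)]

total = 10 * 24  # 10-second clip at 24 fps = 240 frames
starts = snippet_starts(total, 80)
print(len(starts), starts)  # 3 [0, 80, 160]
```

A slightly longer clip (say 250 frames) would yield 4 windows that genuinely overlap, which is where the extra per-epoch variety comes from.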

    iluvlamia · Aug 30, 2025

    @mylo1337 How many hours did you train, and on what gpu / how much vram?

    mylo1337
    Author
    Aug 30, 2025

    @ifuta v1 was <24 hours on a 4090; v2 was almost 48 hours on a 4090. The cost was similar for both, though: when I trained v1 on runpod, there were no working pods on community cloud, so it had basically double the hourly cost of v2.

    serpentinite · Aug 12, 2025 · 4 reactions
    CivitAI

    Hey master, any plan for the 14B lora? <3

    mylo1337
    Author
    Aug 12, 2025 · 1 reaction

    At some point, yeah. But with my PC it'll take ages to do test gens due to the high memory usage (and the fact that it's 14b x2 on my 3060). I also want to make sure I know the optimal way to train it, and if people start running a single model instead of both, I should train for that one.

    LORA
    Wan Video 2.2 TI2V-5B

    Details

    Downloads
    994
    Platform
    CivitAI
    Platform Status
    Available
    Created
    7/31/2025
    Updated
    4/28/2026
    Deleted
    -