    Furry nsfw wan 2.1 1.3b img2vid (Wan2.1Fun InP) - v2.0 e70
    NSFW

    This lora requires an unofficial Wan model for img2vid at 1.3b parameters.

    Note: This lora is made for Wan 2.1 Fun 1.3b InP 1.0, not 1.1. Using it with 1.1 probably won't work as well as with 1.0.

    This lora is intended for use with https://huggingface.co/alibaba-pai/Wan2.1-Fun-1.3B-InP. Other 1.3b Wan img2vid models might be supported, but only if they use the same weight names; otherwise it will only partially work. Download diffusion_pytorch_model.safetensors and place it in your ComfyUI checkpoints folder. The other model files are the same as the 14b's i2v files, so 14b i2v workflows should work if you switch the model.
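
    If you prefer scripting the download, here's a minimal sketch using huggingface_hub; the ComfyUI install path is an assumption, adjust it to yours:

        import shutil
        from pathlib import Path

        from huggingface_hub import hf_hub_download

        # Fetch the InP transformer weights from the official repo.
        src = hf_hub_download(
            repo_id="alibaba-pai/Wan2.1-Fun-1.3B-InP",
            filename="diffusion_pytorch_model.safetensors",
        )

        # Copy into the ComfyUI checkpoints folder (path is an assumption).
        dst = Path("ComfyUI/models/checkpoints/Wan2.1-Fun-1.3B-InP.safetensors")
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy(src, dst)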

    I've also reuploaded it to civitai now, https://civarchive.com/models/1450534?modelVersionId=1640053

    The 1.3b model isn't bad for nsfw content; people are likely just training it wrong. This lora was trained on a large variety of content and can output a large variety of content. Both furry and realistic content are supported.

    Human characters

    While this lora was made for anthro furry characters, a significant amount of human content was included in the last few epochs of v2 to make the motions, including physics, look more realistic. Human content was tagged with "realistic" at the end; furry content was tagged with "furry animation" at the start.

    Prompting guide

    Theoretically, most natural-language prompts should work as well as tag prompts, as I varied them throughout the dataset. All videos were hand-cut and captioned by me, using a tool to make it more convenient; I might consider uploading the tool once it's more convenient to use. Not providing a prompt usually leads to very little movement.

    "the woman" and "her" are interchangable, same with "the man" and "he"

    Trained prompt structure (do not copy directly, the parts in brackets are just examples): furry animation, {character description} is {action description}, {additional descriptions}, [realistic|the scene is depicted with a detailed 2d drawing|the scene is depicted in 3d]

    Character description example (it usually doesn't need to be that specific, since it's i2v): an anthro furry fox woman. For human characters you can usually just put a woman.

    The action description describes the position. A few currently working options are: cowgirl position, reverse cowgirl position, doggystyle position, missionary position, teasing with her tongue, [a woman] uses her breasts to stroke a man's penis. There are probably a few more.

    Additional descriptions consist of perspective (pov is going to work the best), speed, depth, pulling out (doesn't work well currently), and cumshots (also don't work very well).

    Perspective is written as natural language; pov was mostly tagged as "viewed from a first-person pov perspective". Since it's i2v you don't need to worry much about this, and just tagging pov should also work.

    Speed is described in natural language, and the exact words used do make a difference: {speed} [thrusting|riding|sucking].

    Depth is described similarly to speed, just with depth words instead.

    Movement of the woman can be prompted with: she moves up and down as she rides his cock.

    Movement of the man can be prompted with: he thrusts into her pussy, and similar. Speed can be included here as well; I've noticed it still works.

    Additionally, you can add things like the woman's ass jiggles with each thrust. I can't really put a full list here.
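
    To make the structure above concrete, here is a minimal sketch that assembles a prompt from the pieces described in this guide; the helper and its parameters are hypothetical, not part of any actual tool:

        def build_prompt(character, action, extras=(), style="realistic", furry=False):
            # Trained structure: furry animation, {character description} is
            # {action description}, {additional descriptions}, {style tag}.
            parts = []
            if furry:
                parts.append("furry animation")
            parts.append(f"{character} is {action}")
            parts.extend(extras)
            parts.append(style)
            return ", ".join(parts)

        # Example built from the tags documented above.
        print(build_prompt(
            "an anthro furry fox woman",
            "riding a man in the cowgirl position",
            extras=("viewed from a first-person pov perspective", "fast riding"),
            style="the scene is depicted in 3d",
            furry=True,
        ))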

    Version readmes

    v3 readme

    v3's release is not quite a new lora. It's actually a new rank 128 lora merged at 30% onto v2, then extracted as a rank 128 lora again. The v3 lora I trained was not very impressive on its own; it might be better at 2d content. After merging, I'm noticing it is more consistent and often higher quality, with more motion than v2 e70 alone. For the preview images, barely any cherrypicking was involved.
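
    For anyone curious, here is a minimal sketch of one way such a merge-then-extract could be done with torch and safetensors; the lora_A/lora_B key naming and the exact blend are assumptions, not the author's actual script:

        import torch
        from safetensors.torch import load_file, save_file

        v2 = load_file("v2_e70.safetensors")
        v3 = load_file("v3.safetensors")

        merged = {}
        # Assumes PEFT-style pairs: "<module>.lora_A.weight" / "<module>.lora_B.weight",
        # and ignores alpha scaling for simplicity.
        modules = [k[:-len(".lora_A.weight")] for k in v2 if k.endswith(".lora_A.weight")]
        for m in modules:
            # The full-rank delta of each lora is up @ down (B @ A).
            d2 = v2[m + ".lora_B.weight"].float() @ v2[m + ".lora_A.weight"].float()
            d3 = v3[m + ".lora_B.weight"].float() @ v3[m + ".lora_A.weight"].float()
            delta = d2 + 0.3 * d3  # merge the new lora at 30% on top of v2

            # Extract a rank-128 lora again with a truncated SVD.
            U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
            r = min(128, S.numel())
            merged[m + ".lora_B.weight"] = (U[:, :r] * S[:r].sqrt()).contiguous()
            merged[m + ".lora_A.weight"] = (S[:r, None].sqrt() * Vh[:r]).contiguous()

        save_file(merged, "v3_r128_merged.safetensors")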

    v2 readme

    The model has been re-trained from scratch, with a few notable changes. The img2vid outputs should look more fitting in nearly every case, and there should be much more motion.

    Changes from v1's training:

    • Base model: While v1 was trained on the default Wan t2v 1.3b, the new model is trained on the actual Wan Fun 1.3b InP, which is the model this is intended to be used with.

      • This was achieved by simply providing the missing information in diffusion-pipe; it's technically already supported, it just needs to be activated. This PR enables that.

      • This not only helps the model use movements properly, it also improves consistency with img2vid.

    • The lora's rank has been increased from 32 to 64

    • The dataset has had a few changes

      • The videos have been 16fps from the beginning

      • The training resolution has been dropped from 400 to 256 (as a tradeoff for memory usage) (upped to 480 for e70, as this seemingly improves motion)

      • The training frame count buckets have been improved, from v1's [1, 24] to v2's [1, 16, 24, 32, 40]. This allowed for training on longer videos with more context info (see the sketch after this list).

    • The v2 model was trained at a higher learning rate than v1; I might consider a value in between the old and current one.
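
    To illustrate why the extra buckets matter, here's a minimal sketch of how frame bucketing generally works (the general idea, not diffusion-pipe's actual code): each clip trains at the largest bucket that fits its length, so wider bucket coverage wastes fewer frames.

        def assign_bucket(n_frames, buckets):
            # A clip is trained at the largest bucket length that still fits.
            return max(b for b in buckets if b <= n_frames)

        v1_buckets = [1, 24]
        v2_buckets = [1, 16, 24, 32, 40]
        for clip_len in (20, 40, 70):
            print(clip_len, assign_bucket(clip_len, v1_buckets), assign_bucket(clip_len, v2_buckets))
        # 20 frames: 1 vs 16; 40 frames: 24 vs 40; 70 frames: 24 vs 40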

    At only 12 epochs, the model has more consistent motion than v1 at 40 epochs!

    The training dataset has contained human data since the switch to 480 res. This helps with movements and physics, and it also reduces artifacts like random cutoffs. There are still some "stretch" artifacts in some situations.

    v1 readme

    A model that should be better at animating furry porn; that's pretty much it. It's not good at txt2vid, so I don't recommend that; maybe this could be improved by training on images as well.

    This is mostly a proof-of-concept to demonstrate that a lora can be made for Wan 2.1 Fun 1.3b Inp, and I think it shows that this is indeed the case.

    Btw, generating short videos (<1.5 sec) with img2vid at a slightly lower resolution lets you generate a video in about a minute on an rtx 3060. Doing the same with the 14b model takes me more than 10 minutes. The 1.3b deserves more love.

    Usage

    Most importantly, use Wan 2.1 Fun 1.3b InP with img2vid. Regular txt2vid is not going to give very good results, due to the lora not being high-rank enough, or even trained enough; while some concepts will be visible, the outputs will not be very good quality.

    When testing, I noticed that just prompting naturally usually yields the best results. However, there are a few things that have been tagged a few times in the dataset, listed below (example prompts follow the list).

    Note that neither speeds nor depths are going to have much impact, likely due to some issues described in the training section.

    • Positions

      • The model was trained on cowgirl, reverse cowgirl, missionary, blowjob, deepthroat, and some teasing as well

    • Perspective

      • Mainly "viewed from a first-person pov perspective", "viewed from the side". Other descriptions should hopefully work.

    • Speeds

      • Speeds are written like "[speed] thrusting" or "[speed] sucking"

      • Available speeds are: "slow", "moderate speed", "fast" and "very fast"

    • Depths

      • Depths are written like "[depth] thrusts" or "[depth] sucks"

      • Available depths are: "shallow", "moderate", "deep" and "balls deep"

    • Features

      • Jiggling breasts (seems to be pretty noticeable in generations)

      • Jiggling ass
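
    Putting the documented tags together, full prompts in the trained structure might look like the following; these are composed from the tags above, not copied from the dataset:

        furry animation, an anthro furry fox woman is riding a man in the reverse cowgirl position, viewed from a first-person pov perspective, fast riding, deep thrusts, her ass jiggles with each thrust, the scene is depicted in 3d

        a woman is giving a blowjob, viewed from the side, moderate speed sucking, deep sucks, realistic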

    This lora has been tested with images generated with Novafurry and Willy's Noob Realism, as shown in the preview videos. It should work on outputs generated from whatever model, though.

    Training info

    This model is a LoRA painstakingly trained on a single rtx 3060 for a total of 40 epochs on a dataset of about 45 manually tagged clips of nsfw furry content.

    The first ~36 epochs were trained with varying framerates (assuming diffusion-pipe doesn't normalize that). I then re-encoded the dataset to 16fps and trained 4 more epochs; this seems to have made the motion a little better. Overall, I'm still not happy.
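
    For reference, re-encoding a folder of clips to 16fps is straightforward with ffmpeg; a minimal sketch, with placeholder paths:

        import pathlib
        import subprocess

        for src in pathlib.Path("dataset").glob("*.mp4"):
            dst = src.with_name(src.stem + "_16fps.mp4")
            # ffmpeg's fps filter drops or duplicates frames to hit the target rate.
            subprocess.run(
                ["ffmpeg", "-i", str(src), "-filter:v", "fps=16", str(dst)],
                check=True,
            )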

    The dataset was scaled to resolutions with pixel counts similar to a 400x400 frame, at 24 frames. This still used too much vram, so I used a block swap of 10; with that, I was able to train at about 2 epochs per hour.
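
    That kind of scaling picks a factor so each clip keeps its aspect ratio while its pixel area lands near the target; a minimal sketch (the snapping to multiples of 16 is an assumption, trainers differ):

        def scale_to_area(width, height, target=400, multiple=16):
            # Scale so width*height is roughly target*target, keeping aspect ratio.
            s = (target * target / (width * height)) ** 0.5
            def snap(x):
                return max(multiple, round(x * s / multiple) * multiple)
            return snap(width), snap(height)

        print(scale_to_area(1920, 1080))  # a 16:9 clip -> (528, 304)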

    I used diffusion-pipe for the training; since I don't have a budget for anything, I trained locally.

    The model seems underfitted for txt2vid, and the lora rank is also only 32. If I were to train it again, there are a few things I would do differently, namely:

    • I would start by training on images, so the model can get a better understanding of what anthros look like

    • I would retag the dataset, going over each entry multiple times instead of just once, since I feel like I might have missed some things

    • I would use a higher rank for the lora, as I believe 32 might be a bit low for such a broad concept

    • I would make sure the dataset is already at the correct framerate, as I noticed there was not much movement except with some less commonly used tags, which might be caused by high-fps videos being effectively in slow motion

    While this was trained on Wan 2.1 txt2vid 1.3b, it is intended for img2vid using https://huggingface.co/alibaba-pai/Wan2.1-Fun-1.3B-InP. I have noticed that no additional training is needed, and Wan 2.1 txt2vid 1.3b loras will work properly on Wan2.1 Fun 1.3B InP. I hope this information helps others in the future.

    I am overall not happy with how this turned out, but I will likely retrain this model from scratch in the future, once I can put some money into a cloud gpu provider or similar to train faster without it preventing me from doing other things.

    Yap yap yap, go try the model or something

    Description

    Trained on some real content, with the resolution increased to 480. The result is more controllable motion and more variety.

    Lower the cfg if you get artifacts like weird added shapes on the face or elsewhere. 5-7 seems good, though 7 can sometimes produce slight artifacts.


    Comments (22)

    Mitch_Connor_420_69 · Apr 8, 2025

    I'm not into furry at all but I'm still impressed by the results that you've shown in your preview images. I'm sure that people who are into furry are very happy

    mylo1337 (Author) · Apr 8, 2025

    In theory it should work well on humans as well

    AI_2_addicted · Apr 8, 2025

    FANTASTIC WORK!!! Can you do one for the standard version of WAN 1.3B T2V, this time focusing on real people (pussy/penis/general nsfw)? A version with this great quality would be amazing for producing nsfw content for most people with modest computers, since the standard version of WAN 1.3B is very fast and very good.
    mylo1337 (Author) · Apr 8, 2025

    For humans alone, yeah, that could be possible; a furry t2v lora would take ages to properly train, but humans are very doable.

    On top of that, you can try this lora on humans as well: the last version has a significant portion of human videos, which I added for more natural motion and better understanding. The previous versions already worked on humans, but now especially, you could try it.

    AI_2_addicted · Apr 8, 2025

    @mylo1337 Great to know! The standard Wan 1.3B T2V really deserves loras with the quality you make, dedicated to real people (men and women, penis, pussy, general porn).

    Volkin · Apr 8, 2025

    Clearly one of the best nsfw loras for a lightweight model with the capability to do many things. Fast, efficient and with great quality!

    6927513 · Apr 8, 2025

    Even training this one is efficient. To put it into perspective, training the 14b i2v takes a sustained 12GB of VRAM and 90GB of system RAM, and 127GB of RAM during the caching stage. That's on single_middle with up to 32-frame buckets. Not to mention the 14B averages 89 seconds/it vs 3.5 seconds/it for the 1.3B.

    Volkin · Apr 8, 2025

    @basedbase Yeah. I didn't know training the 14B needed that much system RAM. I only did some experimental training with images on an RTX 6000 ADA, but I'll probably try video training as well on the same card, equipped with 128GB+ of RAM. But yes, the 1.3B is worth all the training it can get for I2V and T2V use cases.

    6927513 · Apr 8, 2025

    @Volkin If I were to try to train the i2v 14b with multiple_overlapping and my current dataset, it would take 2 weeks. That's with 128GB DDR5 and a 4070 S. Pretty intensive.

    Volkin · Apr 8, 2025

    @basedbase Haha, yes. I had some ideas, but judging from what you've said it's not going to fly on my 5080 + 64GB DDR5 even if I go grab a 128GB kit, and the same goes for using a 6000 ADA on the cloud. The card is cheap to rent but not very powerful in processing power; it's below a 4090. The H100 PCIe, on the other hand, is roughly 20% faster than a 4090, so I guess put several of them in parallel or something.

    Don't know, but I'll experiment and see how it does.

    6927513 · Apr 8, 2025

    @Volkin My lora has improved a ton using his settings. Going to continue training with a higher res and report back.

    Volkin · Apr 8, 2025

    @basedbase Thank you :) I am learning a lot from him as well :) He's done a fantastic job :)

    6927513 · Apr 8, 2025

    @Volkin He really does; he has more knowledge than this entire discord server I'm in that's about lora training.

    Volkin · Apr 8, 2025

    @basedbase 100% agreed!

    6927513 · Apr 8, 2025

    @Volkin Just posted a generation from epoch 70 of my lora on my profile. Swapped to a batch size of 4; significant improvement across the board at just an extra 10 epochs. The lora is still training, so I hope it continues to get better.

    6927513 · Apr 9, 2025

    @Volkin mylo is goated; epoch 100 seems to produce better animations than my 14b lora. Rerunning training from scratch with the 60 new vids, swapping back to a batch size of 1 from 4, and changing the lr to 6e-5, since the 1e-5 rate may be too low.

    Volkin · Apr 9, 2025

    @basedbase Good luck :)

    QualityControl · Apr 8, 2025

    First furries in the military, now furries driving video gen innovation. What's next? Furries on the moon?

    6927513 · Apr 9, 2025

    potentially

    QualityControl · Apr 9, 2025

    @basedbase to infinity and beyond furball

    6927513 · Apr 9, 2025

    @QualityControl Based reply

    FullMetal1111 · Apr 9, 2025

    When quality meets quantity

    Details

    Type: LORA
    Base model: Wan Video 1.3B t2v
    Downloads: 555
    Platform: CivitAI
    Platform status: Deleted
    Created: 4/8/2025
    Updated: 4/22/2026
    Deleted: 4/16/2026

    Files

    furry_general_nsfw_v2_e70.safetensors