CivArchive
    Furry nsfw wan 2.1 1.3b img2vid (Wan2.1Fun InP) - v3.0 merge 30% v3 70% v2
    NSFW

    This lora requires an unofficial Wan model for img2vid at 1.3B parameters.

    Note: This lora is made for wan 2.1 fun 1.3b inp 1.0, not 1.1. Using it with 1.1 probably won't work as well as with 1.0.

    This lora is intended for use with https://huggingface.co/alibaba-pai/Wan2.1-Fun-1.3B-InP. Other 1.3b wan img2vid models might be supported, but only if they use the same weight names; otherwise it will only partially work. Download diffusion_pytorch_model.safetensors and place it in your comfyui checkpoints folder. The other model files are the same as the 14b's i2v files, so 14b i2v workflows should work if you switch the model.

    I've also reuploaded it to civitai now, https://civarchive.com/models/1450534?modelVersionId=1640053

    The 1.3b model isn't bad for nsfw content; people are likely training it wrong. This lora was trained on a large variety of content and can output a large variety of content. Both furry and realistic content are supported.

    Human characters

    While this lora was made for anthro furry characters, the last few epochs of v2 included a significant amount of human content to make the motions (including physics) look more realistic. Human content was tagged with "realistic" at the end; furry content was tagged with "furry animation" at the start.

    Prompting guide

    Theoretically, most human language prompts should work as well as tag prompts, as I varied them throughout the dataset. All videos were human-cut and captioned by me, using a tool to make it more convenient; I might consider uploading the tool once it's more convenient to use. Not providing a prompt usually leads to very little movement.

    "the woman" and "her" are interchangeable, same with "the man" and "he".

    Trained prompt structure (do not copy directly, the bracketed parts are just examples): furry animation, {character description} is {action description}, {additional descriptions}, [realistic|the scene is depicted with a detailed 2d drawing|the scene is depicted in 3d]
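
    As a toy illustration, the trained structure above can be assembled programmatically. The function and parameter names here are made up for the example and are not part of any released tool:

```python
def build_prompt(character, action, extras=(), style="realistic", furry=True):
    """Assemble a prompt following the trained structure:
    [furry animation,] {character} is {action}, {extras...}, {style}
    """
    parts = ["furry animation"] if furry else []
    parts.append(f"{character} is {action}")
    parts.extend(extras)
    parts.append(style)
    return ", ".join(parts)

# A furry i2v prompt with perspective and speed descriptions:
print(build_prompt("an anthro furry fox woman", "riding in cowgirl position",
                   extras=["viewed from a first-person pov perspective", "fast riding"]))
```

    For human content you would pass furry=False, which drops the "furry animation" prefix and keeps the trailing "realistic" tag, matching how the dataset was captioned.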

    Character description example (it usually doesn't need to be that specific, since it's i2v): an anthro furry fox woman. For human characters you can usually just put "a woman".

    Action description describes the position. A few currently working options are: cowgirl position, reverse cowgirl position, doggystyle position, missionary position, teasing with her tongue, [a woman] uses her breasts to stroke a man's penis. There are probably a few more.

    Additional descriptions consist of perspective (pov is going to work the best), speed, depth, pulling out (doesn't work well currently), and cumshots (also don't work very well).

    Perspective is written as natural language. POV was mostly tagged as "viewed from a first-person pov perspective". Since it's i2v you don't need to worry much about this; just tagging "pov" should work too.

    Speed is described in natural language, and the words used do make a difference: {speed} [thrusting|riding|sucking].

    Depth is described similarly to speed, just with depth terms instead.

    Movement of the woman can be prompted with: she moves up and down as she rides his cock.

    Movement of the man can be prompted with: he thrusts into her pussy, and similar. Speed can be included here as well; I've noticed it still works.

    Additionally, you can add things like: the woman's ass jiggles with each thrust. I can't really put a full list here.

    Version readmes

    v3 readme

    v3's release is not quite a new lora: it's actually a new rank-128 lora merged at 30% onto v2, then extracted as a rank-128 lora again. The v3 lora I trained was not very impressive on its own, though it might be better at 2d content. After merging, I'm noticing it is more consistent, and often higher quality, with more motion than v2 e70 alone. For the preview images, barely any cherrypicking was involved.

    v2 readme

    The model has been re-trained from scratch, with a few notable changes. The img2vid results should look more fitting in nearly every result, and there should be much more motion.

    Changes from v1's training:

    • Base model: While v1 was trained on the default Wan t2v 1.3b, the new model is trained on the actual Wan Fun 1.3B InP, which is the model this is intended to be used with.

      • This was achieved by simply providing the missing information to diffusion-pipe; it's technically already supported, it just needs to be activated. This PR enables that.

      • This not only helps the model properly use movements, it also improves consistency with img2vid.

    • The lora's rank has been increased from 32 to 64

    • The dataset has had a few changes

      • The videos have been 16fps from the beginning

      • The training resolution has been dropped from 400 to 256 (as a tradeoff for memory usage) (upped to 480 for e70, as this seemingly improves motion)

      • The training frame count buckets have been improved, from v1's [1, 24] to v2's [1, 16, 24, 32, 40]. This allowed for training on longer videos with more context info.

    • The v2 model was trained at a higher learning rate than v1; I might consider a value in between the old and current ones.
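
    Frame-count buckets like v2's are typically applied by snapping each training clip down to the largest bucket it can fill. A minimal sketch of that idea (this is an assumption about the bucketing behavior, not verified against diffusion-pipe's code):

```python
def pick_bucket(frame_count, buckets):
    """Return the largest bucket that does not exceed the clip's frame count."""
    eligible = [b for b in sorted(buckets) if b <= frame_count]
    if not eligible:
        raise ValueError(f"clip too short: {frame_count} frames")
    return eligible[-1]

v2_buckets = [1, 16, 24, 32, 40]
print(pick_bucket(37, v2_buckets))  # → 32: a 37-frame clip trains as a 32-frame sample
print(pick_bucket(37, [1, 24]))     # → 24: v1's buckets discarded far more context
```

    Under this scheme, v2's denser bucket list wastes fewer frames per clip and lets longer clips contribute longer training samples.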

    At only 12 epochs, the model has more consistent motion than v1 at 40 epochs!

    The training dataset contains human data since the switch to 480 res. This helps with movements and physics, and it also reduces artifacts like random cutoffs. There are still some "stretch" artifacts in some situations.

    v1 readme

    A model that should be better at animating furry porn; that's pretty much it. It's not good at txt2vid, so I don't recommend that. Maybe this could be improved by training on images as well.

    This is mostly a proof-of-concept to demonstrate that a lora can be made for Wan 2.1 Fun 1.3b Inp, and I think it shows that this is indeed the case.

    Btw, generating short videos (<1.5 sec) with img2vid at a slightly lower resolution lets you generate a video in about a minute on an rtx 3060. Doing the same with the 14b model takes me more than 10 minutes. The 1.3b deserves more love.

    Usage

    Most importantly, use Wan 2.1 Fun 1.3b Inp with img2vid. Regular txt2vid is not going to give very good results, due to the lora not being high rank enough, or even trained enough: while some concepts will be visible, it will not produce very good quality outputs.

    When testing, I noticed that just prompting naturally usually yields the best results; however, there are a few things that have been tagged a few times in the dataset.

    Note that neither speeds nor depths are going to have much impact, likely due to some issues described in the training section.

    • Positions

      • The model was trained on cowgirl, reverse cowgirl, missionary, blowjob, deepthroat, and some teasing as well

    • Perspective

      • Mainly "viewed from a first-person pov perspective", "viewed from the side". Other descriptions should hopefully work.

    • Speeds

      • Speeds are written like "[speed] thrusting" or "[speed] sucking"

      • Available speeds are: "slow", "moderate speed", "fast" and "very fast"

    • Depths

      • Depths are written like "[depth] thrusts" or "[depth] sucks"

      • Available depths are: "shallow", "moderate", "deep" and "balls deep"

    • Features

      • Jiggling breasts (seems to be pretty noticeable in generations)

      • Jiggling ass

    This lora has been tested with images generated with Novafurry and Willy's Noob Realism, as shown in the preview videos. It should work on outputs from any model, though.

    Training info

    This model is a LoRA painstakingly trained on a single rtx 3060 for a total of 40 epochs on a dataset of about 45 manually tagged clips of nsfw furry content.

    The first ~36 epochs were trained with varying framerates (assuming diffusion-pipe doesn't fix that). I then re-encoded the dataset to 16fps and trained 4 more epochs, which seems to have made the motion a little better. Overall, I'm still not happy.

    The dataset was scaled to resolutions with pixel counts similar to 400×400, at 24 frames. This still used too much vram, so I used a block swap of 10; I was able to train at about 2 epochs per hour.
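
    Scaling to a target pixel count rather than fixed dimensions can be sketched like this; the rounding-to-16 step and the exact formula are assumptions for illustration, not diffusion-pipe's actual implementation:

```python
import math

def scale_to_area(width, height, target=400, multiple=16):
    """Scale (width, height) so the pixel count lands near target*target,
    preserving aspect ratio and rounding each side to a multiple of 16."""
    factor = math.sqrt((target * target) / (width * height))
    w = max(multiple, round(width * factor / multiple) * multiple)
    h = max(multiple, round(height * factor / multiple) * multiple)
    return w, h

print(scale_to_area(1920, 1080))  # → (528, 304), about 160k pixels like 400x400
```

    The point of area-based scaling is that wide and tall clips end up with similar memory cost per frame, regardless of aspect ratio.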

    I used diffusion-pipe for the training; since I don't have a budget for anything, I trained locally.

    The model seems underfitted for txt2vid, and the lora rank is also only 32. If I were to train it again, there are a few things I would do differently, namely:

    • I would start by training on images, so the model can get a better understanding of what anthros look like

    • I would retag the dataset, going over each entry multiple times instead of just once, since I feel like I might have missed some things

    • I would use a higher rank for the lora, as I believe 32 might be a bit low for such a broad concept

    • I would make sure the dataset is already at the correct framerate. I noticed there was not much movement except with some less commonly used tags, which might be caused by high-fps videos being effectively in slow motion.

    While this was trained on Wan 2.1 txt2vid 1.3b, it is intended for img2vid using https://huggingface.co/alibaba-pai/Wan2.1-Fun-1.3B-InP. I have noticed no additional training is needed, and Wan 2.1 txt2vid 1.3b loras will work properly on Wan2.1 Fun 1.3B InP. I hope this information helps others in the future.

    I am overall not happy with how this turned out, but I will likely retrain this model from scratch in the future, when I can put some money into a cloud gpu provider or similar, to train faster without it preventing me from doing other things.

    Yap yap yap, go try the model or something

    Description

    A merge using v2 e70 at 70% and a newly trained lora at 30%. Should be an improvement over v2, and more consistent and dynamic.

    FAQ

    Comments (21)

    7748358 · Apr 18, 2025
    CivitAI

    Super impressive work! The speed and quality are very nice. However, I only managed to get one good result at the beginning (can be seen below) and all other results after that have been pretty broken for some weird reason :(

    Not sure if I am writing the prompts wrong, maybe. Would be cool if you could include the prompts you used on your images so there are some more examples :) and maybe your workflow if you're on comfy?

    mylo1337
    Author
    Apr 19, 2025 · 1 reaction

    There are some tips for prompting in the model description; 3d is still more consistent. The example videos were prompted with natural language loosely based on the example prompts. Including the position, thrusting, etc. is also a good idea.

    nropai · May 6, 2025

    @mylo1337 great work! I've been getting some great results from this, but also some random garbled stuff (non furry). I appreciate the tips for prompting, but a couple of full sample prompts (based on your vids) would help even further in showing how you build these together.

    Depurture · Apr 19, 2025 · 1 reaction
    CivitAI

    You can always trust furries to deliver the goods!

    Kytra · Apr 19, 2025 · 1 reaction

    I'm pretty sure the goods are being delivered TO the furries here :)

    LatteLeopard · Apr 19, 2025
    CivitAI

    I use the 14B model (FP16, tried Q8 a few times) and if I provide a photorealistic furry image I still get results like what you show.

    So my question is.... is this like a guidance thing? Does it help with maintaining good anthro anatomy?

    Or is this for 1.3B because it has trouble with the complexity of anthros?

    mylo1337
    Author
    Apr 19, 2025

    Mostly for getting the motions right, without breaking the anatomy as well (although that's far from perfect). The 1.3b img2vid tends to warp things more than the 14b, I wanted something that I could run locally without it taking 10 minutes for a 1.5 second video, which is the case for the 14b for me.

    6927513 · Apr 19, 2025

    the 14b and 1.3b struggle with anthro especially if its not photorealistic

    LatteLeopard · Apr 20, 2025

    @mylo1337 Oh wow, I didn't know 1.3b ran faster, but I guess that makes sense. I think it takes me about 10 minutes for a video to go through generation -> upscale -> interpolation.

    I set up a scene in the morning or afternoon, then go to work, the gym, or whatever else I'm doing. When I get home I check to see the best ones and prune the shitty ones. And btw, it takes this long on a Linux datacenter server with an RTX 4090.

    Couldn't imagine what it's like waiting for this on anything less than 24gb of ram, so yeah, I see why you might opt for 1.3b.

    Honestly, for most of us making porn with i2v, maybe the 14b model is unnecessary, especially seeing the videos you've posted. They're on par with what I was getting. If you upscaled + interpolated, it would look identical to the 14b quality I was getting.

    LatteLeopard · Apr 20, 2025

    @basedbase Maybe a bit; the worst I've seen is it gets like 3d-ish.

    6927513 · Apr 20, 2025

    @LatteLeopard Yea, I can generate in 2 minutes using an rtx 2070, and in 30 seconds on a 3090, on the 1.3b.

    6927513 · Apr 20, 2025

    @LatteLeopard I'm currently also working on a furry nsfw 1.3b lora; it works very well for standard 2d. Seeing best results from combining 2 loras at different strengths. Need to merge them into a single file, so hopefully it will be ready to upload soon. Took 130k steps and 5.6 days to train the current one on a 3090.

    6927513 · Apr 20, 2025 · 1 reaction
    CivitAI

    How were you able to merge your loras? I have been testing with one of my loras at 0.7 combined with the other at 0.3 and I get very good consistent results, but I can't seem to find out how to merge them at those strengths into one file.

    mylo1337
    Author
    Apr 21, 2025 · 2 reactions

    In comfyui, load your wan base model, apply loras at the desired strength, subtract the base model from the model with the loras applied, then extract and save lora (I think this is an experimental/beta node, so enable those in the settings). You might get a warning in console about a missing weight, and there isn't a progress bar. But your GPU usage should indicate that it's busy. Once complete, there should be a lora in your output folder.
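
    Numerically, the subtract-and-extract workflow described above amounts to a weighted sum of the loras' weight deltas, which the extract step then re-factorizes into a low-rank lora. A toy sketch of just the weighted-sum step, with plain lists standing in for tensors (all names here are made up):

```python
def merged_delta(lora_a, lora_b, s_a=0.7, s_b=0.3):
    """Per-element weighted sum of two loras' weight deltas: s_a*A + s_b*B.
    Applying both loras at these strengths and subtracting the base model
    leaves exactly this combined delta."""
    return {name: [s_a * a + s_b * b for a, b in zip(lora_a[name], lora_b[name])]
            for name in lora_a}

# Two toy "loras" patching the same weight:
print(merged_delta({"w": [1.0, 0.0]}, {"w": [0.0, 1.0]}))  # → {'w': [0.7, 0.3]}
```
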

    Oarebt · Apr 21, 2025
    CivitAI

    I don't see a lot of 2D posts here, but it does work extremely well for 2D images; it gives a lot of great motion without much effort.

    If you are aiming to continue improving this LORA for 2D as well: the open-mouth movement for 2D images almost always looks terrible for me. (Maybe it is my prompt that needs improvement?)

    A simple workaround, if others have this problem at the moment, is to add "he/she is mute", and the character will almost never open its mouth.

    6927513 · Apr 26, 2025
    CivitAI

    Just a heads up they deleted all of your videos since it had no metadata.

    mylo1337
    Author
    Apr 27, 2025

    I'll add metadata tomorrow I guess. They're just hidden, not deleted.

    TurboCoomer · Apr 28, 2025
    CivitAI

    How long did it take to train on your 3060? And you also said that t2v 1.3b models are also compatible with fun, are you sure about that? Without conversion or anything?

    mylo1337
    Author
    Apr 28, 2025

    About 24 hours until I switched to runpod, iirc.
    Loras, not models, are compatible. However, quality-wise it won't be nearly as good as a lora trained natively on the target model variant. Fun InP has a few additional layers for the image conditioning, but the base model weights have the same names. And the Wan Fun InP model used the official Wan model as a starting point, so most of that understanding is also still there.

    TurboCoomer · Apr 28, 2025

    @mylo1337 thanks for your answer! Tried a few 1.3b loras and yes, they do work with fun. Sadly there's not much available, and I guess yours is the only one viable for sex. Gonna try it again with the new 1.1 fun model.

    rift1000 · Jan 26, 2026
    CivitAI

    Works well with anime and humans.

    LORA
    Wan Video 1.3B t2v

    Details

    Downloads
    889
    Platform
    CivitAI
    Platform Status
    Deleted
    Created
    4/18/2025
    Updated
    4/22/2026
    Deleted
    4/16/2026

    Files

    furry_nsfw_ultimate_70-30.safetensors