I’ve been experimenting with video models, and so far it’s going well. I wanted to start with something challenging since we all know hands are a big deal for vision models—so if I can get something decent out of it, I’ll know less complex concepts will be much easier.
Most of my image-to-video trials involved randomly selecting generated images from the front page and animating them using standard I2V workflows. I’m using 20–30 steps, a cfg of 7, and an 8 step shift for videos ranging from 32 to 64+ frames.
The only trigger word is: "jerk off instruction"; the rest is up to your prompting. I recommend focusing on the action rather than the scene since this is I2V (at least that works for me).
Here’s an example prompt I used in one of these generations—feel free to alter it heavily:
jerk off instruction. a woman who appears to be in a state of arousal. she moves her fist up and down slowly on the right side. Her eyes are closed and her mouth is slightly open, suggesting that she is experiencing pleasure. aesthetic video with slow movements.
I’m still fine-tuning the training process, so feel free to share your generations. This will help me gather more feedback to improve further. I didn’t have a large dataset to begin with and trained on relatively low settings if anyone’s curious.
Since this is an NSFW-capable model, please use it responsibly. No real person faces were used in the training.
Comments (6)
I like it! I think it's a matter of captioning, maybe?
'moving her fist up and down'. For example, if her fist is not in the right position, it would appear that she is 'not lending money' (if you know what i mean). But it's nice
I tested the same single-person photo 3 times. When the hand was already in the picture, the value strength was 0.6/0.8/1.0, of which the value 0.6 was most consistent with the hand shaking up and down. The value 1.0 was most like the "knocking on the door" action. Tested a photo of two people. When both hands were already in the picture, the value 0.6 was better than 0.8. Tested a single person photo. When there was no hand in the picture, the finished product with a value of 0.6 was good...
These test results are for reference.
good concept, can you do fake air blow job , mimics the motions of performing oral sex using her hand and tongue,
I can give you a new direction to optimize. Search for the Chinese "捣蒜舞" (copy and paste) on TikTok. I'm not sure what the English label is. The background music is "Hoàng Read - The Magic Bomb (Questions I get asked) [Official Audio]". Or you can search "The Magic Bomb 越南鼓卡点舞" on youtube. You will get a lot of materials that meet the optimization of this model. Looking forward to your optimization!!!
works poorly for me, your mileage may vary
my girls never obey properly despite cfg and lora strength adjustments
Would love a 2.2 version
Details
Files
joi_i2v_480p.safetensors
Mirrors
157_joi.safetensors
打手枪_480p.safetensors