Still, FP8 + the Lightning 8-step LoRA is recommended. If you don't like the Fluxy look, use a DPM++ sampler with the Karras scheduler (this requires more steps and a higher CFG).
End-of-Life
I guess I've learned enough about Qwen-Image, so further testing feels redundant. This repo will not receive any further uploads or attention. I captioned all images and released them as a sign of thanks for testing the beta releases. I hope it helps you, as it was a great resource for me. Good luck!
6,537 captioned images sourced from CivitAI.
Prompts used for generating images (useless for Qwen-Image).
Captions in vulgar and profane style are in the Captions folder.
Caption Example:
This is a digital illustration showing a fucking intense and explicit scene in a movie theater. A muscular dude with short brown hair is sitting behind a blue-haired chick, who's wearing a blue dress and white sandals, and he's fucking her doggy style. His cock is huge and it's ejaculating a fucking lot, with cum dripping down her pussy and onto the seat. He's got one hand over her mouth, and she's looking surprised with wide eyes. The background shows other dudes sitting in red theater seats, looking bored or distracted. The camera angle is straight-on, focusing on the fucking action. The dude's hand is gripping her tight.Prompt Example:
score_9, score_8_up, score_7_up, absurd_res, hi_res, anime_source, (big man sitting on chair in movie_theater, cute girl on lap:1.1), open pants, hug, anal, pussy_juice, stealth_sex, cum,sundress, smug, surprised, exhausted, rolling_eyes, covering_mouth, detailed face, intricate details, hyperdetailed, very aesthetic, motion_lines, <lora:NAI Smooth Boys Style SDXL_LoRA_20r_20e_8i_nr32_a16_Pony Diffusion V6 XL:0.5> <lora:Concept Art Twilight Style SDXL_LoRA_Pony Diffusion V6 XL:0.7>META4 - Helm [Plz read]
Qwen-Image doesn't respond well to Booru tags.
This is in line with the other BETA releases, which try to figure out how to deal with anime (low detail) and realism (high detail). Qwen-Image is not as specialized for anime as Illustrious, so many anime actions need to be handled via LoRA. That said, if you're crazy enough, you can do it via prompt only.
META-4 will improve NSFW details to some degree (Still in BETA, not perfect, but better than what Qwen presents).
Helm needs to be used with META-4 (Strength 0.6) + Helm (Strength 1). Alternatively, merge it with META-4 using your own settings, following the code provided in the article (link in the META-4 description).
Trained on 139 randomly picked images (very limited) with no moderation, for testing purposes. Therefore, it won't satisfy an anime enthusiast right away.
7 epochs, 1000 steps, LR 0.0003 (to see if META-4 can act as a refiner).
A dataset with 2 caption variations (Tags, Vulgar) is provided in case you're interested.
If you make one, please ping me.
Dataset source: https://civarchive.com/models/1215490/helm-nikke-sdxl-lora-illustrious-or-3-outfits
Last BETA Releases:
META-4
Please read the article related to META-4
https://civarchive.com/articles/18798/qwen-image-nsfw-lora-notes
This version is a linear merge with tuned weights from four releases, each focused on a specific aspect of the training. While it is still far from perfect, it can be useful in some cases.
DO NOT MERGE version 0.4 with the other releases. It has overfitting issues: the model has learned its training data too closely, so it reproduces that data instead of generalizing.
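For readers unfamiliar with the term, a linear merge is a weighted sum of the weight changes each LoRA contributes. The sketch below is a toy illustration, not the code from the article: it assumes each LoRA has already been expanded into per-layer deltas (for a LoRA layer, the product of its two low-rank matrices times its scale), stored here as flat lists of floats. The layer names and the weights 0.5 and 0.25 are made up. Note that summing the raw low-rank factors directly is not the same operation as summing the deltas.

```python
from typing import Dict, List

Delta = List[float]  # one layer's flattened weight delta (hypothetical layout)


def linear_merge(loras: List[Dict[str, Delta]], weights: List[float]) -> Dict[str, Delta]:
    """Weighted sum of per-layer deltas; a layer missing from a LoRA counts as zero."""
    if len(loras) != len(weights):
        raise ValueError("need exactly one weight per LoRA")
    merged: Dict[str, Delta] = {}
    for lora, w in zip(loras, weights):
        for key, delta in lora.items():
            acc = merged.setdefault(key, [0.0] * len(delta))
            if len(acc) != len(delta):
                raise ValueError(f"shape mismatch for {key}")
            for i, value in enumerate(delta):
                acc[i] += w * value
    return merged


# Toy example: two releases with illustrative merge weights.
release_a = {"layer.0": [1.0, 2.0]}
release_b = {"layer.0": [3.0, 4.0], "layer.1": [10.0]}
merged = linear_merge([release_a, release_b], [0.5, 0.25])
print(merged)
```

With real checkpoints the same sum would be done on tensors loaded from the LoRA files; tuning the per-release weights is the part that takes experimentation.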
v0.4 BETA
I experimented with the learning rate to determine exactly where overfitting will occur.
There is a better skin tone, but signs of overfitting, as well as bad or deformed genitalia, will occur more often than in v0.3.
v0.3 BETA
Experimented with a more friendly and maintainable prompting style. Use one or a combination of them:
Descriptive Style: "A photo-realistic shoot from above featuring a woman in a provocative pose on a bed..."
SDXL Tag-Based Style: "1girl, long hair, breasts, looking at viewer, open mouth."
Segmentation Style:
Sex Acts: Penetration, vaginal intercourse.
Sexual Positions: Missionary position.
Male Genitalia: Large, erect, dark-skinned, circumcised, with visible veins.
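The segmentation style is regular enough to generate from structured data, which keeps long prompts maintainable. This is a minimal sketch, assuming one "Section: item, item." line per section; the function name and the empty "Lighting" section are hypothetical, and the section contents are placeholders.

```python
from typing import Dict, List


def build_segmented_prompt(sections: Dict[str, List[str]]) -> str:
    """Render one 'Section: item, item.' line per non-empty section, in insertion order."""
    lines = []
    for name, items in sections.items():
        cleaned = [item.strip() for item in items if item.strip()]
        if cleaned:
            lines.append(f"{name}: {', '.join(cleaned)}.")
    return "\n".join(lines)


prompt = build_segmented_prompt({
    "Camera": ["low angle", "wide shot"],
    "Lighting": [],  # empty sections are skipped
    "Overall Body Types": ["Female: curvy", "male: muscular"],
})
print(prompt)
```

Keeping the sections in a dictionary makes it easy to swap a single segment between generations without rewriting the rest of the prompt.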
This BETA is all about prompting and testing the results. I've removed anime images from the dataset to save time and resources and to speed up the process.
Next, I'll focus on the details and finding a way to eliminate the current issues with anatomy.
v0.2 BETA
Experimented to find a sweet spot for more detailed genitalia
Used Qwen-Image captioning style (This means you need a detailed description of what you want).
The focus was on experimentation rather than quality, hence the BETA.
New auto-generated realistic images were used. Extreme sizes were spotted, but I didn't filter them out.
Pro: Better output compared to BETA 1.
Con: After a few tests, it's clear that writing a wall of text for every prompt is not maintainable. However, Qwen-Image is detail-hungry: if the prompt lacks detail, the base model takes over, and for NSFW content we don't want the base model's influence.
Next: I'll try mixing Danbooru tags with descriptive captioning, focusing on vulgar slang, and using a better dataset.
v0.1 BETA
This LoRA is primarily trained on Civitai images for experimenting with Qwen-Image LoRA training. 80% of the dataset consists of anime-based images, while the remaining images are semi-realistic, which will likely dominate the output. (mostly vertical sizes)
Using FP8 with the 8-step Lightning LoRA generates acceptable results. All images in the showcase are the best from two batches.
Based on the tests I've conducted, the results are promising. This indicates that we don't have the same level of censorship as Flux.
Prompt Guide: I used Joy Caption with Stable Diffusion-style captioning. Example:
[Update: Upon further testing, it turned out that using the SD style for captioning was a bad idea. I will try a different approach in the next beta.]
"""
nsfw, digital painting, close-up, girl with green eyes, black hair in two buns, red halter top, large breasts, hand grabbing her right breast, nipple exposed, gold necklace, light skin, subtle blush, camera angle from below, looking up, soft lighting, realistic style, detailed shading, hand on breast, suggestive, hand touching breast, breast grab, hand on nipple, upper body, focused on face and breasts, red halter top, bouncy hair, soft texture, high detail, hand on nipple, realistic shading, realistic style, soft lighting, subtle blush, looking up, gold necklace, realistic eyes, halter top, realistic breasts, realistic skin, realistic lighting, hand on nipple, detailed shading, high detail, soft texture
"""
Description
Linear merge with adjusted weights
Comments (9)
The best NSFW lora we have at CivitAI atm.
Unfortunately I don't think it works for anime that well, it makes everything more realistic and shiny.
Thanks! It's a work in progress. Those BETA versions were just try-outs. For anime, look at these prompts: https://civitai.com/posts/21581010. I have already trained 2 LoRA models for Helm, and the results are promising, but I haven't shared them yet. You can use the META4 version, set the LoRA strength to 0.53 and try a few Qwen-image friendly prompts. META4 responds very well to this style of prompt:
**
Camera: low angle, wide shot.
Sex Acts: Kiss, breast pressing against chest, erect penis between woman's thighs.
Male Genitalia: Size: large, shape: average, state: erect.
Female Genitalia: Size: small, state: aroused.
Breasts: Size: large, shape: round, nipple details: erect.
Thighs and Buttocks: Size: slim, shape: smooth, involvement: none.
Overall Body Types: Female: curvy, male: muscular.
**
I am working on an NSFW hentai LoRA at the moment. I'm going to train it on roughly 1k+ images; if it turns out fine, I might upload it.
@trickcrafterites561 Great! It's a very interesting journey. No matter what the outcome is, you'll have fun and headaches. I already tried nearly 1k images in the first and second LoRA, but it’s still not good enough. Aim for 1.5k+: Qwen-Image has 20 billion parameters that need to be unlearned and relearned, or else you risk deformed organs, some of which won't show up at all. However, I found that training small specialized LoRAs and then merging them together is the most effective way. Otherwise, your GPU runs for days without producing anything useful. I wish you good luck!
@sweetmax797 wow, that's so interesting. Btw, what do you mean by merging LoRAs? Like taking images with different styles and mixing them into one dataset? Also, I'm at the point where I write all the captions for the images, and I try to be as detailed as possible, but obviously that takes a lot of time. How do you caption your images? Is there a faster way? I tried an auto-caption workflow in ComfyUI, but the captions are too basic and not suitable for NSFW images, sadly :( so I'm doing it all by hand.
@trickcrafterites561 Yes, Qwen-Image responds very well if you guide it like an artist. Start by setting the tone: realism, cyberpunk, illustration, anime, etc. Then describe the background from top to bottom, followed by the foreground from top to bottom. It is much better than Photoshop or other AI text-to-image tools.
I use Joy Caption in ComfyUI, but the power of Joy Caption lies in its system prompt. If you provide a good system prompt, it gives you what you want. However, if the dataset is large, you still need to go through it by searching for words you don't want in the captions, so some cleanup is still necessary.
I used a local API for cleaning up, specifically Dolphin-Mistral uncensored, along with Ollama and Python code to automate the process. I trained three small models yesterday and today. If you see the last images I posted today, you'll notice that the anatomy is quite perfect compared to other beta releases. I used 130 images just to see if changing the captioning style would make a difference. In other images, 'rectum' wouldn't show up.
I tried different learning rates and datasets, but it didn't help. However, I used words like 'hole,' 'asshole,' 'rectum,' and 'anus' in different places in the captions for the same image, and it fixed the issue. The lesson learned was that targeting street terms for body parts is more effective; it seems altering those weights is much easier. As for merging, look at the description for META4.
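As a rough idea of what an Ollama-based cleanup pass could look like, here is a minimal sketch that sends one caption at a time to a local Ollama server through its `/api/generate` endpoint. It is not the script used for these releases: the model tag `dolphin-mistral`, the instruction text, and the function names are assumptions, and the default port 11434 is taken for granted.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "dolphin-mistral"  # assumed model tag; use whatever `ollama list` shows

# Hypothetical instruction; adapt it to the words you want removed.
INSTRUCTION = (
    "Rewrite the image caption below. Remove any mention of watermarks and "
    "any lead-in such as 'this is an image of'. Return only the caption.\n\n"
)


def build_request(caption: str, model: str = MODEL) -> dict:
    """Build the JSON body for a single non-streaming generate call."""
    return {"model": model, "prompt": INSTRUCTION + caption, "stream": False}


def clean_caption(caption: str, url: str = OLLAMA_URL, timeout: float = 120.0) -> str:
    """Send one caption to the local server and return the rewritten text."""
    body = json.dumps(build_request(caption)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"].strip()
```

Calling `clean_caption(text)` for every caption file in a folder is enough for a small dataset; it only works while the Ollama server is running locally.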
@sweetmax797 hey I just read your comment and wondered what did you use Dolphin-Mistral for, what do you mean by cleaning ? I just tried and apparently it's not even good for describing images, I guess it's not of the "Vision" type :/ So far I don't know any LLM better than JoyCaption for NSFW images. Every other LLMs are snowflakes lol
About your remark on using street words in captions, yeah, it kinda makes sense to teach the model words that were probably not in its training haha
@Tetsuoo Hey, skip to the joy caption part:
https://civitai.com/articles/18998/basic-guide-to-qwen-image-lora-training
Well, 'clean-up' in that context refers to removing unwanted keywords, such as watermarks. Sometimes, vision models refer to a small object in the lower right of the image as a watermark. Phrases like 'this is an image of...' or 'she is performing oral sex' should also be removed. The model already understands concepts like day and night, so there’s no need to mention them. In the case of NSFW content, focusing on the lower body is more important than the environment.
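Part of that clean-up can be done without an LLM. The sketch below, with illustrative patterns only, drops any sentence that mentions a watermark and strips a "this is an image of" style lead-in; the function name and both regular expressions are assumptions to be extended after searching your own captions.

```python
import re

# Illustrative patterns only; extend them for the phrases found in your dataset.
LEAD_IN = re.compile(
    r"^\s*this is (?:an?|the) (?:image|photo|picture|illustration) (?:of|showing)\s+",
    re.IGNORECASE,
)
WATERMARK_SENTENCE = re.compile(
    r"[^.!?]*\bwatermark\b[^.!?]*[.!?]\s*", re.IGNORECASE
)


def strip_boilerplate(caption: str) -> str:
    """Remove watermark sentences and a generic lead-in, then tidy whitespace."""
    text = WATERMARK_SENTENCE.sub(" ", caption)
    text = LEAD_IN.sub("", text)
    text = re.sub(r"\s+", " ", text).strip()
    # Re-capitalize, since removing the lead-in can leave a lowercase start.
    return text[:1].upper() + text[1:]


example = (
    "This is an image of a woman on a beach. "
    "There is a watermark in the lower right corner. "
    "She wears a red dress."
)
print(strip_boilerplate(example))
```

A regex pass like this is fast and predictable, so it is a reasonable first filter before reviewing the remaining captions by hand.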
@sweetmax797 ok the cleaning was pretty obvious, I guess I should sleep more at night. Another priceless comment here, "In the case of NSFW content, focusing on the lower body is more important than the environment", that made my day hahaha




