Beer and coffee keep my motivation up 🍻
Please read this description fully before you take this model for a spin. ↓
This is an alpha version of my "forever" project. A proof of concept for now, Project Men is a Stable Diffusion model with a male focus capable of both realistic photography and anime/illustrations.
512x768 aspect ratio bucketing.
The data set contains 70,000 images: high-fidelity photography, Blu-Ray film stills, anime, and illustrations.
Captioning done with GPT-Vision (when it doesn't balk) followed by high certainty booru tags.
If filtering prevented captioning, booru tags were used exclusively. In such cases, the image either retained its original tags from danbooru or e621 (with "anime" prepended when relevant), or I applied SmilingWolf's booru interrogators, also prepending "anime".
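To make the fallback order above easier to follow, here is a minimal sketch in Python. It is not my actual captioning script: the function name, the argument names, and the `is_illustrated` flag (my stand-in for "when relevant") are all hypothetical, and I'm assuming the high-certainty tags appended after a GPT-Vision caption come from the interrogator.

```python
from typing import Optional, Sequence


def build_caption(
    vision_caption: Optional[str],
    original_tags: Sequence[str],
    interrogator_tags: Sequence[str],
    is_illustrated: bool,
) -> str:
    """Pick a caption source using the fallback order described above.

    All names here are illustrative assumptions, not the real pipeline.
    """
    if vision_caption:
        # Natural-language caption first, then high-certainty booru tags.
        return ", ".join([vision_caption.strip(), *interrogator_tags])

    # Captioning was blocked by filtering: use booru tags exclusively.
    # Prefer the image's original danbooru/e621 tags, else interrogator tags.
    tags = list(original_tags) if original_tags else list(interrogator_tags)
    if is_illustrated and "anime" not in tags:
        tags.insert(0, "anime")
    return ", ".join(tags)
```

For example, `build_caption(None, ["1boy", "beard"], [], True)` yields a tag-only caption that starts with "anime".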
While there is NSFW data included, it wasn't the focus of this version. What is there exists almost exclusively as booru-tagged captions. As future versions progress and I refine the dataset, I aim to caption everything using natural language.
I advise caution and testing when trying out different LoRAs on this model. I make NO guarantees, as it is usually a complete mystery what they were trained on. On that note, I'm also skeptical of negative embeddings and don't recommend using them unless you train one for this model specifically.
There are no quality or aesthetic tags at this stage, but I hesitantly recommend starting your prompt with "high quality", followed by what you'd like to see, and ending with "detailed, realistic, fine textural details". This seems to work better for photography than for anime or illustrated subjects. Feel free to explore quality modifiers and let me know what you find. Just make sure to keep your description at the front and add tags slowly to ensure they achieve the desired effect.
Recommended Negative Prompt: "low quality, bad, ugly, simple background, artist name, signature, watermark, username, copyright name, text, web address, url, speech bubble, censored, bar censor, mosaic, faceless"
To elaborate, "low quality" has a measurable impact even without explicit quality tags in the training. The rest are Danbooru tags for generally undesirable concepts, though their impact can vary. For example, it's challenging for a model to consistently learn what a "watermark" is due to its many variations in text, position, opacity, etc. Still, 1.x models tend to perform better with some form of negative prompt. Feel free to suggest your own!
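If you build prompts in a script, the recommended structure can be sketched like this. The helper name `build_prompt` and the constant names are hypothetical, and the sample description is only an illustration; the prefix, suffix, and negative prompt are the ones recommended above.

```python
# Tags from the recommended negative prompt, kept as a list for easy editing.
NEGATIVE_TAGS = [
    "low quality", "bad", "ugly", "simple background", "artist name",
    "signature", "watermark", "username", "copyright name", "text",
    "web address", "url", "speech bubble", "censored", "bar censor",
    "mosaic", "faceless",
]
NEGATIVE_PROMPT = ", ".join(NEGATIVE_TAGS)

QUALITY_PREFIX = "high quality"
QUALITY_SUFFIX = "detailed, realistic, fine textural details"


def build_prompt(description: str, extra_tags=()) -> str:
    """Prefix, then the description, then any extra tags, then the suffix."""
    parts = [QUALITY_PREFIX, description.strip(), *extra_tags, QUALITY_SUFFIX]
    # Skip empty pieces so a blank description does not leave stray commas.
    return ", ".join(p for p in parts if p)


prompt = build_prompt("photo of a man hiking in the mountains")
print(prompt)
print(NEGATIVE_PROMPT)
```

Keeping the description right after the prefix and adding one entry at a time to `extra_tags` makes it easier to see which tag changed the result.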
The distinction between "anime" and "illustration" is unclear at the moment, as I work to find better interrogators/vision models for style details. For now, almost everything illustrated is labeled as "anime". This raises the issue of defining styles, so I've included artist tags such as "anime, ????? (dopq)" or "anime, artisticjinsky". Some styles work better than others, and I make no guarantees. I believe this will improve with more data and quality tags. There are exceptions where the artist's data was SFW, allowing proper captioning, such as "an illustration by Syd Mead". I aim to fully caption more data this way when possible.
The goal is to transition from using booru tags as primary captions to natural language captioning, creating a model capable of many different styles and mediums. My natural language captions are constructed as phrases or short sentences in a comma-delimited list, with the most important aspects placed first, typically those relating to the direct subject of the image. As the data set evolves, original booru tags and high-certainty tags found by interrogators will be retained, but placed at the end of the caption so they carry less weight.
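As a rough illustration of that caption layout (phrases first, tags last), here is a small Python sketch. The function name and the sample phrases are made up for the example, and dropping duplicate entries is my own assumption rather than something the current data set is guaranteed to do.

```python
from typing import Iterable


def assemble_caption(phrases: Iterable[str], booru_tags: Iterable[str]) -> str:
    """Join natural-language phrases first, then booru tags, comma-delimited."""
    seen = set()
    ordered = []
    for item in [*phrases, *booru_tags]:
        text = item.strip()
        key = text.lower()
        # Assumption: skip blanks and case-insensitive duplicates.
        if key and key not in seen:
            seen.add(key)
            ordered.append(text)
    return ", ".join(ordered)


caption = assemble_caption(
    ["a bearded man leaning on a motorcycle", "golden hour lighting"],
    ["1boy", "beard", "1boy"],
)
print(caption)
```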
While captioning for the next version is in progress, because of compounding unfortunate life events I don't have a timeline for this project. We'll see how it goes.
If you use this model as part of a mix or host it on a generation service, please mention and link back to this page (especially if you're making money off of it).