Your beloved tail~ Ready for a full NAI3 experience? (Actually, even better.)
Full-scale finetune of Pony Diffusion 6 with a dataset of 1.8M anime pictures:
Unmatched (in open source) knowledge missing from the original pony and other models
8k+ artist styles (wildcard), plus a few general styles
Thousands of characters simply by prompt
Full color palette, full brightness range (example 1, example 2), great base aesthetics
No annoying watermarks like everywhere else
Unique angles, foreshortening, full-body wide shots, or extreme close-ups without any issues; pretty backgrounds as an added bonus
From the cutest and loveliest things to the deepest and darkest fantasies
Best performance with tail concepts for your fox/cat/dog/dragon/... waifus/husbandos
Well, this finetune has an amount of training that would be enough to make a base anime model. Despite that, existing (anime) knowledge has not been lost; it has only gotten better. A careful approach, especially to TE training, and a lot of high-quality natural-text captions (about 600k, mainly made with Claude 3 Opus / Claude 3.5 Sonnet) significantly improve prompt control and understanding. "Feels like a new base, not pony" (c).
And yes, unlike the majority of PD derivatives, which are just reskins or lobotomized merges, not a single lora was harmed (or merged) here. You can add your tweaker loras if needed, merge in the difference of another favourite checkpoint, or whatever; it works just like a good pony-compatible base.
v0.5.0 Changelog
A new training run from the PD base with a large dataset, using some new approaches to pretraining, the main training, and refining
Lots of new data
After some black magic in training, you can now get completely black or completely white pictures without breaking compatibility with existing tools, loras, etc. Actually a very interesting experience (example)
Better and more stable base styles, less "burning" for artist styles
Fixes, improvements, ...
(Dataset cut-off: beginning of July; requests after it are pending and not forgotten)
Features and prompting:
Well, first of all: the TE knows a lot. It will try to make whatever you prompt, without ignoring parts of it like you may be used to. No guardrails, no safeguards, no lobotomy. Shit in, shit out.
Schizo-prompts from mixes, where you have to boost tag weights and add extra ones to get at least some response (something like (sunny day, rainbow, ethereal hair, transparent skin, huge breasts:1.9)), will not work. You will get something insane, creepy, or unexpected.
At the same time, if you just copy tags from a booru picture without the manipulations mentioned above, or describe it normally with a combination of tags and natural text, it will most likely turn out great across a very wide range. Stick to original booru tags for the best results. The deepest and darkest fantasies may require some rerolling; popular things are very stable.
Basic:
Same as for all SDXL models: ~1 megapixel for txt2img, any aspect ratio with resolutions that are multiples of 64 (1024x1024, 1152x896, 1216x832, ...). Euler a and CFG 4-9 (6-7 is best). Highres fix: any GAN/DAT upscaler, x1.5-1.6, denoise 0.5; upscaling works best with a single-tile resolution of no more than 3 MP. Highres fix and further upscaling will significantly improve quality, details, eyes, hands, feet, etc.
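As a quick illustration of the resolution rule above, here is a small Python sketch (not part of any official tool) that enumerates txt2img sizes near one megapixel whose sides are multiples of 64; the tolerance and aspect-ratio cap are arbitrary assumptions:

```python
# Hypothetical helper: list SDXL-friendly txt2img resolutions.
# Target, tolerance, and max aspect ratio are illustrative assumptions.
def sdxl_resolutions(target_px=1024 * 1024, tolerance=0.15, max_ratio=2.0):
    """Return (width, height) pairs within `tolerance` of `target_px`
    where both sides are multiples of 64."""
    sizes = []
    for w in range(512, 2049, 64):
        for h in range(512, 2049, 64):
            ratio = max(w, h) / min(w, h)
            if abs(w * h - target_px) / target_px <= tolerance and ratio <= max_ratio:
                sizes.append((w, h))
    return sizes

resolutions = sdxl_resolutions()
# The presets mentioned in the text satisfy the constraint:
assert (1024, 1024) in resolutions and (1216, 832) in resolutions
```

Anything from this list should bucket cleanly; going far above ~1 MP at the txt2img stage is what highres fix is for.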
Set Emphasis: No norm in the settings of your generation tool if you are getting strange blobs or distortion.
If LCM/PCM accelerators are applied, use the Euler/Euler a samplers; DDIM gives a lot of mess and abominations.
Clip Skip 1 unless you are using loras that have problems with it.
Quality classification:
Only 4 quality tags:
masterpiece, best quality for the positive prompt,
low quality, worst quality for the negative prompt.
Avoid using score_x, source_x, etc. like in original pony.
In most cases they just make things worse: they add noise and mess, break bodies and fingers, change styles, and bring back the urine-like yellow-green filter.
Originally, that was definitely not the best implementation of quality tagging; it included some training flaws and required tons of tokens. It became clear that it's better to introduce new tags instead of fixing the original ones. At this point they would only bring back old triggers without serious improvements.
Negative prompt:
(worst quality, low quality:1.1), error, bad hands, watermark, distorted. Adjust according to your preferences.
Do not put tags like greyscale, monochrome, or yellow background in the negative prompt. You will just get burned images; there is no need to fix washed-out colors or a "yellow filter" here like you may be used to. 3d in negatives is also a bad choice in most cases.
To improve backgrounds, add simple background, blurry background, abstract background to the negative prompt, but do not forget to remove it if you are prompting something with a simple background.
Artist styles:
Used with "by "; multiple artists give very interesting results and can be controlled with prompt weights.
by ARTISTNAME1, [by ARTISTNAME2, (by ARTISTNAME3:0.8), ...] or/and
[by ARTISTNAME1|by ARTISTNAME2|by ARTISTNAME3|...]. Works best at the very beginning of the prompt. Can be used as a wildcard (beware: there is a flaw in the sd-dynamic-prompts extension that sometimes wrecks results when used with a batch size greater than 1). For most artists, highres fix/upscaling improves quality a lot.
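The two mixing patterns above can also be generated programmatically; a minimal sketch, assuming placeholder artist names (wildcard tooling such as sd-dynamic-prompts does this for you):

```python
import random

# Placeholder artist names; substitute real tags from the wildcard list.
ARTISTS = ["artistname1", "artistname2", "artistname3"]

def weighted_mix(artists, weights):
    """Comma-separated 'by X' terms; non-1.0 weights use (by X:w) syntax."""
    parts = []
    for name, w in zip(artists, weights):
        parts.append(f"by {name}" if w == 1.0 else f"(by {name}:{w})")
    return ", ".join(parts)

def alternating_mix(artists):
    """Per-step alternation syntax: [by X|by Y|...]."""
    return "[" + "|".join(f"by {a}" for a in artists) + "]"

prompt_head = weighted_mix(ARTISTS, [1.0, 1.0, 0.8])
# -> "by artistname1, by artistname2, (by artistname3:0.8)"
random_pair = alternating_mix(random.sample(ARTISTS, 2))
```

Either string is then placed at the very start of the prompt, per the advice above.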
General styles:
2.5d, bold line, smooth shading, flat colors, minimalistic, cgi, digital painting, ink style, oil style, pastel style can be used in combinations (with artists too), with weights, in both positive and negative prompts.
Characters:
Use the full name tag, same as on boorus, with proper formatting, like "karin_(blue_archive)" -> "karin \(blue_archive\)"; use skin tags for better reproduction, like "karin \(bunny\) \(blue_archive\)". This extension might be very useful.
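The escaping rule can be sketched as a tiny helper (an illustration, not an official tool; whether underscores should also become spaces depends on your frontend):

```python
# Escape parentheses in booru tags so they are not parsed as
# attention/weight syntax; optionally convert underscores to spaces.
def escape_booru_tag(tag: str, underscores_to_spaces: bool = False) -> str:
    if underscores_to_spaces:
        tag = tag.replace("_", " ")
    return tag.replace("(", r"\(").replace(")", r"\)")

assert escape_booru_tag("karin_(blue_archive)") == r"karin_\(blue_archive\)"
```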
Most characters are known by name, but it's better if you also prompt their main features, like:
karin \(blue_archive\), karin \(bunny\) \(blue_archive\), dark-skinned female, purple halo, ponytail, yellow eyes, playboy bunny, fishnet pantyhose, gloves
Natural text:
Use it in combination with booru tags; it works great. Or use only natural text after the style and quality tags. Or use just booru tags and forget about it; it's all up to you.
And yes, it's still based on pony, so it will be worse at IRL concepts, references, or some complex expressions compared to other checkpoints based on vanilla SDXL. Check out Tofu, my new model that can manage such things.
Lots of Tail/Ears-related concepts:
tail censor, holding own tail, hugging own tail, holding another's tail, tail grab, tail raised, tail down, ears down, hand on own ear, tail around own leg, tail around penis, tail through clothes, tail under clothes, lifted by tail, tail biting, ... (booru meanings, not e621) and many others via natural text. Some reproduce perfectly, some require rerolling. Unfortunately, in 0.5.0 some may work worse, but others look better. It also now performs better with all kinds of tails, not only fluffy kemonomimi ones.
Brightness/contrast:
You can just prompt what you want with tags or natural text and it should work: dark night, dusk, bright sun, etc. Black/white background works, but it often does not give exact (0,0,0) or (255,255,255) values like it should. Part of this is related to prompts; just check which pictures are tagged that way. Using phrases like "cute girl in front of a completely black background" fixes it. Anyway, you shouldn't run into any issues in general use; it works just like NAI3, often even better.
Known issues
Well, unfortunately, there are a few:
Some artist styles don't work as they should.
(The reason for this is not entirely clear, because in another model with the same dataset they work fine. It is probably related to conflicts with PD's 1-token hashes or problems with the original TE. It can be fixed in the future anyway; please report if you find artists that don't have a decent effect.)
Some concepts require more training (a few tail-related ones, some rare ones like "dogeza", memes)
Watermarks can sometimes still be found. Mostly this comes from the pony base, but some may be from the dataset
Ciloranko is actually an opossum LMAO (an error in one of the cherry-picked dataset entries)
To be discovered, still WIP
Requests for artists/characters in future models are open. If you find an artist/character/concept that performs weakly or inaccurately, or has a strong watermark, please report it; they will be added explicitly. Follow for new versions.
License:
Pony viral; check the original. Feel free to use it in your merges, finetunes, etc. Just please leave a link.
Future plans:
Well, a new dataset 2.5 times bigger, with better balancing and classification, is ready, but any mistake or flaw will cost A LOT. Fixes for the current version may come quite soon, but before the next big training run I'm going to collect more feedback and test some new things. If you have advice, or would like to share your experience, tools, or training methods, you are very welcome.
I'm thinking about adding some furries to the dataset. It may be beneficial for anatomy, poses, and concepts, but it's not that easy because of the different tagging system and... the wide aesthetic range. If you have ideas on how to deal with it, suggestions for good-looking/interesting furry artists, or datasets you can share, please PM.
Training with natural-text tagging (in combination with booru tags) looks very promising even for SDXL, and new large models come with it out of the box. Current local VLMs do not have decent performance: Cog and Idefics3 are nice but strongly SFW, JoyCaption hallucinates and is almost uncontrollable via prompt, LLaVA is just dumb, and others have similar problems. As for commercial ones: Claude is extremely expensive, Gemini has strong censorship, and GPT-4o is quite stupid for such a task.
So there is a small chance that someday you will see a multimodal LLM finetuned on SFW/NSFW anime pictures from the dataset; it should help a lot. Oh yes, here is a preliminary version and showcase. Flux: promising, very smart, GPU-heavy, and brainwashed even for boobs. I've performed some training where "uncensoring" and a little knowledge of anime concepts were achieved, but it doesn't look good enough; write if you are interested in it. The main issues here are the training tools (actively developing; hopefully proper full T5 training will arrive soon) and the roughly 5-7x higher GPU-time requirements, so it's probably better to wait for a while.
For any suggestions or requests, join the Discord server.
Thanks:
Artists who wish to remain anonymous - sharing private works; Soviet Cat - GPU sponsoring; Sv1. - LLM access, captioning, code; K. - training code; Bakariso - datasets, testing, advice, insights; NeuroSenko - donations, testing, code; T., [] - datasets, testing, advice; dga, Fi., ello - donations; and other fellow brothers who helped. Love you so much ❤️.
And of course everyone who gave feedback and made requests; it's really valuable.
Donations
AI is my hobby; I'm spending my own money on it and not begging for donations. If you want to support me: share my models, leave feedback, make a cute picture with a kemonomimi girl. And of course, support the original artists.
However, your money will accelerate further training and research.
(Just keep in mind that I can waste it on alcohol or cosplay girls)
BTC: bc1qwv83ggq8rvv07uk6dv4njs0j3yygj3aax4wg6c
ETH/USDT(e): 0x04C8a749F49aE8a56CB84cF0C99CD9E92eDB17db
If you can offer GPU time (A100+), please PM.
Comments (31)
BASED
Fantastic model. From the start it is giving completely unique angles, objects, and poses. Can't wait to make some more gens tomorrow. Bravo.
Hello, could you share more details about your training methodology and dataset (especially the natural language captions)? And about this part: "Filtering, classification and some special magic"?
Well, basically it's the default kohya_ss finetune script: typical LR, AdamW, cosine schedule. At different stages of training, some parts of the dataset change, with saving or resetting of the accumulated gradients.
For the natural language captions, booru tags are pruned (like [skirt, pleated skirt, short skirt, red skirt] -> short pleated red skirt) and then natural language prompts are appended. Of course not the raw outputs, which often exceed 600 tokens; they are preprocessed with an LLM to shorten them and keep only the meaningful text.
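The pruning step described here could look roughly like the following sketch (a simplification: it assumes the last word of a tag is the head noun and keeps modifiers in arrival order; the author's actual pipeline is not public):

```python
# Collapse booru tags sharing a head noun into one compact phrase,
# e.g. [skirt, pleated skirt, short skirt, red skirt] -> "pleated short red skirt".
def prune_tags(tags):
    groups = {}                      # head noun -> ordered unique modifiers
    for tag in tags:
        *mods, head = tag.split()    # assume the last word is the noun
        bucket = groups.setdefault(head, [])
        for mod in mods:
            if mod not in bucket:
                bucket.append(mod)
    return [" ".join(mods + [head]) for head, mods in groups.items()]
```

Real booru tags need extra care (multi-word heads like "pleated skirt" as a noun phrase, conflicting modifiers), which is presumably where the LLM post-processing comes in.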
Classification and filtering: raw pictures are classified with a system of about 11 BEiT/ConvNeXt models. It includes basic rough/fine aesthetic ranking, re-estimation, correction of wrong results, style evaluation, some special triggers for messy pictures, and so on.
It's quite an uneasy task to make a vision transformer provide an accurate evaluation of anime picture "quality" across different styles. You can try popular aesthetic models on Hugging Face and see that they can call a great masterpiece "unaesthetic/ugly/worst" and some generic old artworks "aesthetic/best quality". But with a system (like rough softmax values from 3 models averaged, then passed to a corresponding more specialised model that estimates high/low-rated images, combined with assessments against various quality criteria, and so on) you can get decent accuracy.
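The routing idea ("rough softmax values from 3 models averaged, then a more specialised model") can be sketched abstractly; the refiner functions below are hypothetical stand-ins for the specialised classifiers, not the author's models:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    exps = [math.exp(x - max(xs)) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_quality(logits_per_model, refine_high, refine_low, threshold=0.5):
    """Average P(high quality) over the rough models, then route the image
    to a specialised estimator for high- or low-rated pictures."""
    probs = [softmax(l) for l in logits_per_model]
    avg_high = sum(p[1] for p in probs) / len(probs)
    return refine_high(avg_high) if avg_high >= threshold else refine_low(avg_high)

# Hypothetical refiners that just rescale the rough score into a final rank.
score = ensemble_quality(
    [[0.2, 1.5], [0.1, 1.2], [-0.3, 0.9]],   # (low, high) logits from 3 models
    refine_high=lambda p: 0.5 + 0.5 * p,
    refine_low=lambda p: 0.5 * p,
)
```

The real system replaces the lambdas with dedicated BEiT/ConvNeXt heads and adds per-criterion assessments on top, but the control flow is the same: rough ensemble verdict first, specialised refinement second.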
So, pictures for training are picked and captioned according to the obtained ranking.
Pretty nice work!
Did a LoRA extract like https://civitai.com/models/312010/4th-tail-lora-extract (haven't uploaded it since I guess you will) and it works fine most of the time. I have to use >1.0 strength on autismmix, for example, but it's a lot better than 0.3!
damn another model with no askzy, fang etc artstyle
Great model, and the only Pony finetune I've tested that substantially improves upon the base model. But I'm a bit disappointed by the tail concepts despite them being advertised as a headline feature. Tailjobs e.g. are extremely difficult to get, the example picture that has one is basically a fluke. I used the exact same prompt and out of 100+ generated images only one showed an actual tailjob. Multiple tails are also very common and putting "multiple tails" in the negative prompt doesn't really help. 🙁
Thank you. Yes, complex concepts require improvements and new approaches, basically because there is a very limited number of high-quality pictures with them. Improving them will be one of the goals in the next updates.
As for tailjob stability: it also depends on the tail type, can conflict with other parts of the prompt, and so on. With Kikyou, for example, it's quite easy to get, and her double tails are a real cheat that covers the extra ones. But with a fluffy fox tail it's almost impossible except in some cases. As for multiple tails, in general I haven't seen them frequently when not prompted; could you please catbox an example of a case where you get them often?
As always, great job! A tremendous improvement over the 0.3 version.
Great model! Are you still working on the 0.5.0 version, or is 0.4.0 the last one?
The next one will probably be 0.4.5 and will come in a few weeks; it is training right now. You know, with the release of SD3 and the new announcements, it's quite pointless to invest in costly hardware to speed up training, so it takes some time.
I don't know how to make a request properly, so I'm leaving a comment: could the artist ro_g_(oowack) be added in the next version? I really like their style, and it's sad that it is not on the list.
Do you have plans to fine-tune SD3?
Definitely yes. Actually, I already have a hybrid natural text + booru tags dataset for this, but first it's better to check it and wait for the training tools to mature.
@Minthybasis True, I don't think there is a need to hurry with SD3.
Obligatory "1girl, lying on grass" test
Guys, this is awesome! Are you planning to add even more "special" artists like MUK, Aki99, etc. in the next version?
Sure, please make a request if you are interested in someone specific. Btw, special artists' performance will likely be improved in the next versions, thanks to gold access.
@Minthybasis Thanks! These are some I checked that are not in the current artist dataset list; I can add more later:
1=2 ,
bubukka ,
aaaa_(quad-a) ,
abubu ,
agwing86 ,
anan_shokudow ,
aoi_nagisa_(metalder) ,
basukechi ,
musouzuki,
blue_borscht ,
comodomodo ,
dana_(ocana_dana) ,
ebora ,
egami ,
emanonta ,
gemba_(dlfms75) ,
higashiyama_shou ,
higegepon ,
ikaheigen ,
innerkey_(kgfw5338) ,
irotsuya ,
kanbaki ,
kurotsuki_(luowei99) ,
maniacbox ,
matsunome ,
monori_rogue ,
mottogatto ,
murasaki_akiyama ,
naoki_(shibu_asa_ryo) ,
opossumachine ,
ozka ,
pettan_(zeez4743) ,
player193 ,
ponpu_(pumpkinsinclair) ,
sakuran_(ameto) ,
smoog ,
tearontaron ,
tenako_(mugu77) ,
trente ,
v8 ,
yamada_(gotyui)
@Rating_Agent Nice, I will try to add their styles. Not in the next release, since most of the training is already done (there are only traces of 4 of them in the dataset), but in the models after it.
@Minthybasis Also, for a model this big you should create a Discord server! Maybe the community can help you with collecting datasets and new ideas, and it's a fast way to give people news about the latest updates.
@Minthybasis Is there any news about a new model? Or about the Discord channel?
@Rating_Agent Oh yes, there is. As for the Discord channel, I'll consider it, but managing one requires additional time, which is already in short supply...
I would like to add some artist requests as well, if it's OK:
female focus:
alkemanubis, hoshino fuuta, tianliang duohe fangdongye,
male focus:
thebrushking, lunaflame, manmosu marimo, fairwind, yupa,
The male-focus artists have way more art on e621 compared to other similar boorus, for whatever reason. Also, thank you for sharing with the community!
@Minthybasis Thanks for the new version! Will there be more artists in future versions?
@Rating_Agent Of course! The ones listed above, as well as others, have already been processed and added to the new dataset; they will be in the next update.
@Minthybasis Awesome! Can't wait!
Hey, awesome model, as always. I've noticed that demon/dragon tails come out wrong most of the time (especially demon tails): not attached to the body, looking like spaghetti, etc. Any ideas on possible fixes?
Oh, looks like they might be underrepresented compared to cat/fox/dog/etc. tails. Thanks for pointing it out; this definitely needs to be fixed, I will investigate.
@Minthybasis Thanks a lot, I will drop some cash as soon as the next paycheck arrives ;)
@Minthybasis Hey, could you please add "long hair between eyes" to the model (https://safebooru.donmai.us/posts/7430267?q=kearsarge_%28azur_lane%29)? Also, is there any way to distinguish the different types of parted bangs? I have trouble prompting and getting parted bangs like these (https://safebooru.donmai.us/posts/7516008?q=ushiwakamaru_%28fate%29+ / https://danbooru.donmai.us/posts/7731407?q=alsace_%28azur_lane%29+). I have had some luck using "parted bangs, blunt bangs", but it's hit or miss.