Your beloved tail~ Ready for a full NAI3 experience? (Actually even better)
Full-scale finetune of Pony Diffusion 6 with a dataset of 1.8M anime pictures:
Unmatched (in open source) knowledge missing from the original Pony and other models
8k+ artist styles (wildcard), a few general styles
Thousands of characters simply by prompt
Full color palette, full brightness range (example 1, example 2), great base aesthetics
No annoying watermarks like everywhere else
Unique angles, foreshortening, full-body wide shots or extreme closeups without any issues, pretty backgrounds as an added bonus
From cutest and lovely things to deepest and darkest fantasies
Best performance with tail concepts for your fox/cat/dog/dragon/... waifus/husbandos
Well, this finetune received an amount of training that would be enough to make a base anime model. Despite that, existing (anime) knowledge has not been lost but has only become better. A careful approach, especially to TE training, and a lot of high-quality natural-text captions (about 600k, mainly made with Claude 3 Opus / Claude 3.5 Sonnet) significantly improve prompt control and understanding. "Feels like a new base, not pony" (c).
And yes, unlike the majority of PD derivatives, which are just reskins or lobotomized merges, not a single lora was harmed or merged. You can add your tweaks if needed, merge in the difference of another favourite checkpoint, or whatever; it works just like a good Pony-compatible base.
v0.5.0 Changelog
A new training run from the PD base with a large dataset, using some new approaches across pretraining, the main training and refining
Lots of new data
After some black magic in training, you can now get completely black or completely white pictures without breaking compatibility with existing tools, loras, etc. Actually a very interesting experience (example)
Better and more stable base styles, less "burning" for artists
Fixes, improvements, ...
(Dataset cut-off: beginning of July; requests after that date are pending and not forgotten)
Features and prompting:
Well, first of all: the TE knows a lot. It will try to make whatever you prompt, without the ignoring you may be used to. No guardrails, no safeguards, no lobotomy. Shit in, shit out.
Schizo-prompts from mixes, where you have to boost tag weights and add extra ones to get at least some response (something like (sunny day, rainbow, ethereal hair, transparent skin, huge breasts:1.9)), will not work. You will get something insane, creepy or unexpected.
At the same time, if you just copy tags from a booru picture without the manipulations mentioned above, or describe it normally with a combination of tags and natural text, most likely it will be great across a very wide range. Stick to original booru tags to get the best results. The deepest and darkest fantasies may require some rolling; popular things are very stable.
Basic:
Same as for all SDXL: ~1 megapixel for txt2img, any AR with resolution a multiple of 64 (1024x1024, 1152x, 1216x832, ...). Euler_a and CFG 4..9 (6-7 is best). Highres fix: anyGAN/DAT, x1.5-1.6, denoise 0.5; upscaling works best with a single tile resolution of no more than 3 MP. Highres fix and further upscaling will significantly improve quality, details, eyes, hands, feet, etc.
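The resolution rule above can be sketched as a quick check. This helper (my own, not from the model card, with an illustrative area tolerance) enumerates resolutions where both sides are multiples of 64 and the total area stays near 1 megapixel:

```python
# Hypothetical helper: list SDXL-friendly txt2img resolutions,
# i.e. both sides a multiple of 64 and area close to ~1 megapixel.
def sdxl_resolutions(target_mp=1.0, tol=0.08, step=64, lo=640, hi=1664):
    target = target_mp * 1024 * 1024
    return [
        (w, h)
        for w in range(lo, hi + 1, step)
        for h in range(lo, hi + 1, step)
        if abs(w * h - target) <= tol * target
    ]

res = sdxl_resolutions()
print((1024, 1024) in res, (1216, 832) in res)  # True True
```

Any pair it returns (1024x1024, 1216x832, 832x1216, ...) should be safe for txt2img before highres fix.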
Set Emphasis: No norm in the settings of your generation tool if you are getting strange blobs or distortion.
If LCM/PCM accelerators are applied, use the Euler/Euler a samplers; DDIM gives a lot of mess and abominations.
Clip Skip 1 unless you are using loras that have problems with it.
Quality classification:
Only 4 quality tags:
masterpiece, best quality for positive;
low quality, worst quality for negative.
Avoid using score_x, source_x, etc. like in the original Pony.
In most cases they just make things worse: they add noise and mess, break bodies and fingers, change styles and bring back the urine yellow-green filter.
Originally that was definitely not the best implementation of quality tagging, with some training flaws, and it required tons of tokens. It became clear that it's better to introduce new tags instead of fixing the original ones. At this point they only bring back old triggers without serious improvements.
Negative prompt:
(worst quality, low quality:1.1), error, bad hands, watermark, distorted; adjust according to your preferences.
Do not put tags like greyscale, monochrome, yellow background in the negative. You will just get burned images; there is no need to fix washed-out colors or a "yellow filter" here like you may be used to. 3d in negatives is also a bad choice in most cases.
To improve backgrounds, add to the negative:
simple background, blurry background, abstract background, but do not forget to remove it if you are prompting something simple.
Artist styles:
Used with the "by " prefix; multiple artists give very interesting results and can be controlled with prompt weights.
by ARTISTNAME1, [by ARTISTNAME2, (by ARTISTNAME3:0.8), ...] or/and
[by ARTISTNAME1|by ARTISTNAME2|by ARTISTNAME3|...]. Works best at the very beginning of the prompt. Can be used as a wildcard (beware: there is a flaw in the sd-dynamic-prompts extension that sometimes wrecks results when used with a batch size greater than 1). For the majority of styles, highres fix/upscale improves quality a lot.
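As a sketch, the comma-joined weighted pattern above could be generated like this. The helper and artist names are hypothetical; the weight syntax is the standard A1111 `(tag:weight)` form:

```python
import random

def artist_mix(artists, n=2, last_weight=0.8, seed=None):
    """Join n random artists as 'by X, by Y, (by Z:0.8)'."""
    picks = random.Random(seed).sample(artists, n)
    parts = [f"by {a}" for a in picks[:-1]]
    # Weight the last artist down so it tints rather than dominates.
    parts.append(f"(by {picks[-1]}:{last_weight})")
    return ", ".join(parts)

prompt = artist_mix(["artist_a", "artist_b", "artist_c"], n=3, seed=42)
print(prompt)  # e.g. "by artist_b, by artist_c, (by artist_a:0.8)"
```

The result is meant to go at the very beginning of the positive prompt, per the note above.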
General styles:
2.5d, bold line, smooth shading, flat colors, minimalistic, cgi, digital painting, ink style, oil style, pastel style can be used in combinations (with artists too), with weights, in both positive and negative prompts.
Characters:
Use the full name tag, same as on boorus, with proper formatting, like "karin_(blue_archive)" -> "karin \(blue_archive\)"; use skin tags for better reproduction, like "karin \(bunny\) \(blue_archive\)". This extension might be very useful.
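The escaping rule can be automated. A minimal sketch (the function is my own; it also converts underscores to spaces, which tag-completion extensions commonly do, so the series name comes out spaced):

```python
def booru_to_prompt(tag: str) -> str:
    """'karin_(blue_archive)' -> 'karin \\(blue archive\\)':
    underscores become spaces, and parentheses get escaped so the
    webui does not read them as attention-weight syntax."""
    return tag.replace("_", " ").replace("(", r"\(").replace(")", r"\)")

print(booru_to_prompt("karin_(blue_archive)"))  # karin \(blue archive\)
```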
Most characters are known by the name, but it will be better if you prompt their main features, like:
karin \(blue_archive\), karin \(bunny\) \(blue_archive\), dark-skinned female, purple halo, ponytail, yellow eyes, playboy bunny, fishnet pantyhose, gloves
Natural text:
Use it in combination with booru tags; it works great. Use only natural text after typing styles and quality tags, or use just booru tags and forget about it; it's all up to you.
And yes, it's still based on Pony, so it will be worse at IRL concepts, references or some complex expressions compared to other checkpoints based on vanilla SDXL. Check out Tofu, my new model that can manage such things.
Lots of Tail/Ears-related concepts:
tail censor, holding own tail, hugging own tail, holding another's tail, tail grab, tail raised, tail down, ears down, hand on own ear, tail around own leg, tail around penis, tail through clothes, tail under clothes, lifted by tail, tail biting, ... (booru meaning, not e621) and many others via natural text. Some reproduce perfectly, some require rolling. Unfortunately, in 0.5.0 some may work worse, but others look better. It also now performs better with all kinds of tails, not only fluffy kemonomimis.
Brightness/contrast:
You can just prompt what you want with tags or natural text and it should work, like dark night, dusk, bright sun, etc. Black/white background works, but it often gives not quite 0,0,0 or 255,255,255 like it should. Part of this is related to prompts; just check what pictures are tagged with it. Using phrases like (cute girl in front of completely black background) fixes it. Anyway, you shouldn't run into any issues in general use; it works just like NAI3, often even better.
Known issues
Well, unfortunately, there are some:
Some artist styles don't work as they should.
(The reason for this is not entirely clear, because in another model with the same dataset they work fine. Probably it is something related to conflicts with PD's 1-token hashes or problems with the original TE. It can be fixed in the future anyway; please report if you find artists that don't have a decent effect.)
Some concepts require more training (a few tail-related ones, some rare ones like "dogeza", memes)
Watermarks can sometimes be found. Mostly this is related to the pony base, but some may come from the dataset
Ciloranko is actually an opossum LMAO (an error in one of the cherry-picked dataset entries)
To be discovered, still WIP
Requests for artists/characters in future models are open. If you find an artist/character/concept that performs weakly, is inaccurate or has a strong watermark, please report it; they will be added explicitly. Follow for new versions.
License:
Pony viral; check the original. Feel free to use it in your merges, finetunes, etc., just please leave a link.
Future plans:
Well, a new dataset 2.5 times bigger, with better balancing and classification, is ready, but any mistake or flaw will cost A LOT. Fixes for the current version may come quite soon, but before the next big training I'm going to collect more feedback and test some new things. If you have advice, or would like to share your experience, tools or methods for training, you are very welcome.
I'm thinking about adding some furry content to the dataset. It may be beneficial for anatomy, poses and concepts, but it's not that easy because of the different tagging system and... wide aesthetic range. If you have ideas on how to deal with it, suggestions for good-looking/interesting furry artists, or can share your datasets, please PM.
Training with natural-text captions (in combination with booru tags) looks very promising even for SDXL, and new large models come with it out of the box. Current local VLMs don't have decent performance: Cog and Idefics3 are nice but strongly SFW, JoyCaption hallucinates and is almost uncontrollable with prompts, Llava is just dumb, and others have similar problems. As for commercial ones: Claude is extremely expensive, Gemini has strong censorship, and GPT-4o is quite stupid for such a task.
So there is a small chance that someday you will see a multimodal LLM finetuned with SFW/NSFW anime pictures from the dataset; it should help a lot. Oh yes, here is a preliminary version and showcase. Flux: promising, very smart, GPU-heavy and brainwashed even for boobs. I've performed some training where "uncensoring" and a little knowledge of anime concepts have been achieved, but it doesn't look good enough. Write if you are interested in it. The main issues here are the training tools (actively developing; hope we will get proper full T5 training soon) and the roughly 5-7 times higher GPU time requirements, so it's probably better to wait for a while.
For any suggestions or requests, join the Discord server
Thanks:
Artists wishing to remain anonymous - sharing private works; Soviet Cat - GPU sponsoring; Sv1. - LLM access, captioning, code; K. - training code; Bakariso - datasets, testing, advice, insights; NeuroSenko - donations, testing, code; T.,[] - datasets, testing, advice; dga, Fi., ello - donations; other fellow brothers that helped. Love you so much ❤️.
And of course everyone who gave feedback and made requests; it's really valuable.
Donations
AI is my hobby; I'm spending my own money on it and not begging for donations. If you want to support me: share my models, leave feedback, make a cute picture with a kemonomimi girl. And of course, support the original artists.
However, your money will accelerate further training and research
(Just keep in mind that I may waste it on alcohol or cosplay girls)
BTC: bc1qwv83ggq8rvv07uk6dv4njs0j3yygj3aax4wg6c
ETH/USDT(e): 0x04C8a749F49aE8a56CB84cF0C99CD9E92eDB17db
If you can offer GPU time (A100+) - PM.
Comments (106)
Best model by far, amazing work!
Tested, and it's the best one so far; I can't say it needs improvements (styles have always been a problem with Pony, tbh)
NAIv3 competitor is a wild claim 💀
YEEEEAH BABY, NOW WE TALKING!
Maybe you already do this, but you should consider adding gay stuff to your dataset instead of furries. Furry art styles don't translate well to humans in most cases, and knowing how a female crocodile's anatomy should look isn't exactly helpful lol. I know it's not the same as a base model, but I've tested this with loras and gay loras work really well on women.
Sure! Actually, the new dataset for future versions already has a lot of boys. If you can suggest any series, concepts, specific tags, or maybe other sources of images than boorus, that would be great.
As for furry: such art often contains a lot of important anatomy details in general, both for males and females. Animal traits can be regularized and separated by the model, so it should be only beneficial. But, of course, testing is required.
Amazing model, definitely an upgrade from 4.5, thank you!
It works better for some old artists like avogado6 or bluethebone, but it seems like it forgot huke for some reason. Interestingly, huke works fine on tofu.
Oh yes, not only huke but a number of artists work fine with Tofu but are weak in 4th.
That's a question for me too, because Tofu and 4th Tail share the same dataset with negligible differences. Slightly different phases and hyperparameters also shouldn't lead to this. Maybe it is related to conflicts in the TE, since Pony's is completely wrecked and you have to bake it a lot to make some things work, but in the current version a very "gentle" approach has been used, unlike before. Or something else; needs investigation.
This model is pretty good, but the artist tags don't really reflect properly. I would suggest trying to focus on bringing out styles better. Promising model though
I like the model ditching the Pony score tags, but the artist tags seem a bit underbaked. Even in the instances where I can tell it's attempting to replicate the artist's look, it's just a bit off or not a strong enough change (even when compared to base Pony's native knowledge in some instances)
Can't wait to try this model, but I am sure it outperforms the prior version.
Nice work! I tried to extract a lyco from this model and the effect on artists is weaker than in past versions. For some it has a subtle effect, for others almost none, even when raising the strength. Maybe I'm missing a step?
Actually, there is a chance that an extract from the checkpoint before the final smoothing pass may be more effective. A full weights merge should also be better. Will investigate soon.
Thanks! I did a huge lyco extract now (384 dim) and using specifically artists first in the prompt works a ton better, so many thanks for the model!
Also to add: merging models with add difference, using 4th 0.5 as contrast vs pony (A + (B-C) * M), gives impressive results! Even more than using the extracted lyco itself. Probably doing that on AutismMix would get pretty good results.
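The add-difference formula mentioned here, result = A + (B - C) * M, sketched on plain dicts standing in for model state_dicts; a real merge applies the same arithmetic per weight tensor (e.g. with torch), and the role assignments in the comments follow this comment's usage:

```python
def add_difference(a, b, c, m=1.0):
    """For every key: a[k] + (b[k] - c[k]) * m.
    A = target model, B = the finetune whose changes you want,
    C = the base that B was trained from."""
    return {k: a[k] + (b[k] - c[k]) * m for k in a}

# Toy numbers only; real checkpoints hold tensors, not floats.
merged = add_difference({"w": 1.0}, {"w": 1.5}, {"w": 1.2}, m=0.5)
print(merged)  # {'w': 1.15} (modulo float rounding)
```

With M below 1.0 you blend in only part of the finetune's difference, which is why it can be gentler than applying an extracted lyco at full strength.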
Very promising, though output quality seems to depend heavily on artist tags? Outputs with no artist tags have issues with anatomy and overly bright lighting, but this might just be a skill issue on my part.
Actually, artists may introduce some biases and even improve/degrade anatomy. This is partly due to the direct content of the pictures, but mainly comes from basic shading/forms/patterns that help or hinder. I didn't notice any significant issues during testing with or without them, but it turns out that the checkpoint is used with different style combos or loras in 98% of cases. If you have some strong issues, please share them; will investigate.
The base style is quite specific; some like it, some hate it for the bloom and bright shading. I'll probably make it a bit more flat or smooth in the future.
some popular NAI3 artists have not been integrated into the dataset, like:
rei \(sanbonzakura\), tianliang duohe fangdongye, rella, konya, yoneyama mai, wanke, hoping to see them in the future version!
And the artist "chen bin" doesn't work correctly, even in the sample on MEGA; maybe due to the cracked TE of Pony?
"After some black magic in training, now you can get complete black or complete white pictures without breaking compatibility with existing tools, loras, etc. Actually very interesting experience example"
Is this using zsnr/vpred or something else?
Some combination of ZSNR, pyramid noise and kitsune magic!
Well, a simple hint: zsnr without vpred gives weird results but can form the necessary dependencies and gradients. Noise offset/pyramid noise allow shifting the average brightness but don't have enough range and control. You have to make them become friends, that's all.
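For illustration, pyramid (multi-resolution) noise is commonly implemented roughly like this; the level count and discount below are illustrative defaults, not the values used in this training:

```python
import numpy as np

def pyramid_noise(h, w, levels=4, discount=0.7, seed=0):
    """Gaussian noise plus nearest-upsampled coarse noise at several
    scales, each weighted by discount**level, renormalised at the end.
    The coarse levels let the model shift low-frequency (brightness)
    content that plain per-pixel noise cannot reach."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((h, w))
    for i in range(1, levels):
        ch, cw = max(1, h >> i), max(1, w >> i)
        coarse = rng.standard_normal((ch, cw))
        # Nearest-neighbour upsample via np.kron, then crop to (h, w).
        ry, rx = -(-h // ch), -(-w // cw)  # ceil division
        up = np.kron(coarse, np.ones((ry, rx)))[:h, :w]
        noise += up * discount ** i
    return noise / noise.std()

print(pyramid_noise(64, 64).shape)  # (64, 64)
```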
As for vpred: does it work with a1111/forge for SDXL? If not, no way, and lora/cnet compatibility would probably be limited anyway. Please correct me if I'm wrong; actually a vpred version can be made in 1-2 days if needed.
@Minthybasis Understood!
As for whether vpred works on a1111/forge for SDXL - I do not believe it does. It may or may not work in others such as ComfyUI/SwarmUI however, no idea at all there.
So should I be running this model with Zsnr noise schedule in A1111(Forge)? As for vpred, I remember A1111 could do it with yaml config in 1.5, not sure for SDXL though.
Lots of improvements, but hair_between_eyes, very_long_hair and black_hair show a bias to Manhattan Cafe's hair from Umamusume (adding her tag to negs fixes the recurring long_bangs/long_hair_between_eyes)
@Minthybasis After testing, there is a considerable bias towards Manhattan Cafe's bangs when using hair_between_eyes (long strand of hair that goes down the eyes, I don't know how to describe it), even in short_hair/medium_hair. If you add her to the negative prompt, the long strand of hair is gone, and you get the desired output when tagging short or medium hair, probably due to the lack of tagging for such (long_hair_between_eyes)? It's an "obscure" tag, since tag completion extension does not list it. I already requested it for your next batch, but I wanted to let you know.
@blackfuture82729 The things you mentioned before are not forgotten; I also added some balancing rules to the new dataset and expanded it to cover more hairstyles. Long_hair_between_eyes is popular enough, but in the dataset for 0.5.0 it may be shifted toward some characters.
Well, very general biases in common attributes can be quite difficult to track and solve; hope the measures taken will help.
@Minthybasis Thank you for all you do (mostly for free). No model comes close to yours, please keep them coming. I can't wait for the next iteration!
Tested a few artists, most seem to match. Found a mismatch though, not sure where to report artists with no effect. by puge doesn't give the correct art style according to boorus.
Yes, in 4th it has almost no effect, a small one in Tofu.
Just writing here is enough; will add all his works.
Eh. Seems to massacre hands in exchange for doing things LORA already do. It's like we are regressing.
Well, there are cases where a lora is the best choice, like very rare concepts, characters or special styles you like. But in many others they only bring you down by definition, limiting you and making it more difficult to achieve the desired result due to biases, mutual compatibility and overall performance decrease. Embedded knowledge doesn't have such problems, and it's also very inconvenient to switch everything every gen.
IMO, the best example of regress: schizo mixes with tons of tweakers and alignment which ignore half of your prompt, yet are presented as innovation. There are advantages, like simpler prompting of generic things and idiot-proof design. But most people are so used to it that they don't even realise how much effort and tinkering they put into some simple things, like making a head arch back, lmao.
Stagnation in anticipation of a new leak/miracle; we've seen it for more than a year with sd1.5 (except a few checkpoints). Hope it's different now, since we have a bunch of promising open-source models.
It will always be better when a model can do things without adding a lora. Stacking multiple loras on a model for new concepts will just lower the quality of a checkpoint, because the lora models fight with each other.
does "masterpiece, best quality" nullify the artist tags for y'all as well, or am I just doing something wrong somehow?
No; they can add a "style bias" but usually work fine along with artist styles. You can upload your generated picture with metadata somewhere and post it in a reply for analysis.
Did you remove some artists from the tags/dataset for 0.5.0? I was trying as109, and while 0.4.5 gets the style really well, 0.5.0 just seems to completely ignore it. I'm just wondering, cause you're still listing him in the wildcard
That's strange because as109 worked okay and it appears in test grids. Could you please upload your image where it is prompted but doesn't give decent style?
@Minthybasis Sure. Catboxed, cause of comfy workflows, so you can take a look:
0.4.5 version: https://files.catbox.moe/0hqymj.png
0.5.0 version: https://files.catbox.moe/8sl47g.png (and another seed, which has better anatomy, but also doesn't get the style https://files.catbox.moe/7zyd5b.png )
Same prompt, same setup, but the 0.5.0 version just doesn't get the style.
If you say it worked on test grids, it might be because of the character Lora, but if that's the case, I wonder why it worked with the 0.4.5 version
@jyrrata Looks like it is related to the loras, negative embeddings and prompt. Without the lora (or with another one), when the artist tag is placed first and with a pruned negative, it works okay. Don't know what affects it the most; needs testing. Here are an example from comfy and a grid from a1111; maybe they could help.
Anyway, if you can't or don't want to change those things, at least consider placing the artist tag first and try increasing the tag weight; that should definitely help.
Version 0.5.0 had a more "gentle" approach in training; that's probably why it has a lesser effect in that combination, despite being very distinct without it. It also may be that as109 is not at its best in 0.5.0; will check it.
@Minthybasis Thanks for checking it out. Yeah, it seems to have just been that the artist tag wasn't first. That single change fixed it for me. Honestly, since it worked so well in 0.4.5, I didn't even consider that that could have been the issue. But now I know that 0.5.0 requires it.
https://files.catbox.moe/0lyn3j.png Everything the same as before, just the artist tag first. Seems like the Lora isn't the issue, which is good news for me.
Also, the embeddings probably didn't play much of a role, cause I didn't actually have those embeddings installed atm (I just copied the prompt and forgot to remove them). Thanks for reminding me to take it out of my negative prompt though.
Great work as always, still my favorite model.
It's now much easier to get exactly the image you want compared to the previous version.
I noticed the model has problems when trying to make femboy/josou seme/shota as doms; it always tries to substitute them with girls.
hope to see more artists of this type like:
ankoman, d_coffeer18, iyarin, Rapscallion
It's a really good model, but it doesn't reflect the style I want. I think it would be better to focus on the style of artist.
PS. Healthyman and F.s. style didn't work. And if possible, please add style of nezumin (nezunezu).
Thank you :)
Healthyman works okay on Tofu; you can try there. I assume some artists don't work because of Pony conflicts
@low_channel_1503 Yes. I'm already using that model and already know it.
But I just want to use it in 4th tail :)
OK, I ran multiple tests; it's actually way, way better, BUT it needs more training on some artists, or maybe some suppression, idk. Artists like abmayo, ohisashiburi etc. are bad (maybe it's my negatives; I couldn't find good negatives for them). Also, the anatomy during interactions tends to get bad and some poses are also funky, so it needs special training on poses.
It needs very little extra training, only on artists and some poses. You can do way more with it than NAI, btw; in fact you could take commissions after learning AI art with this model. It's very good. I'm finally shifting to 4th Tail; this model's trainer did what no other could.
Absolutely amazing model; however, it has problems when inpainting with many artist tags
What kind of problem? Some artifacts, or a style other than expected during inpainting?
@Minthybasis Mostly artifacts, but I had a case where the art style became weird on the face after inpainting
@TOF_enjoyer Could you please upload example with metadata on catbox or somewhere else for investigation?
@Minthybasis
of course, this is original https://files.catbox.moe/jg424m.png and this is inpainted https://files.catbox.moe/y7b9yd.png
you can see artifact and wrong shadow and mainly around the head area,
To not waste your time, I will tell you what I did: I make an image, then inpaint the face and other things 5 times using a detailer to mimic A1111's "inpaint mask only". I used this back with 1.5 checkpoints and used to inpaint with strength 0.5-0.7 without problems; with Autism I had to bring it down to 0.4, and in 4th Tail I need 0.2 to not get unexpected results, but then the areas I inpaint barely get enhanced, which makes me wonder why I use that lazy workflow instead of a simple no-inpaint one
the images should have the comfyui workflow if you want to check the spaghetti mess
@TOF_enjoyer Wow, epic workflow! Honestly, I don't see any artifacts, just unlucky/messy inpainting. Yes, denoise 0.2 is too small to make things better; usually I use 0.4-0.5 for upscale and masked inpaint and it looks okay. I'm not familiar with comfy and got a full screen of missing nodes, so it's better to ask: what is the resolution for inpainting, what upscaler (GAN/DAT/math) is used for the mask, what prompts are used for inpainting (same/original), and is there any controlnet? Just to clarify: is this an unfortunate case where styles look different between the initial gen and the upscale, or can it be fixed?
@Minthybasis You can fix missing nodes using ComfyUI Manager extension, it can automatically install all the required custom nodes.
@Minthybasis I created this workflow in my first month of learning comfy; it is honestly outdated, but I'm too lazy to update it. Unlike A1111, where I can set the resolution to 1024x1024 for inpainting, here after the mask is created it enlarges the image until one of its dimensions becomes 1024; the other? God knows, sometimes it becomes a bad number. This was the only way to mimic A1111's "inpaint mask only" 5 months ago. I add Gaussian blur to the mask with a kernel size of 10 and sigma of 10, and the detailer node on top of that adds a mask feather of 30. I divide the prompts into 3 areas: a style area, where I usually only have style and artist tags as well as camera tags; a face area, with any tags that can affect the head; and a rest area with the remaining tags (usually clothes tags and certain NSFW tags). While inpainting the face, the model only sees the style and face area prompts. No ControlNet used. I also turned off the upscaling nodes while generating the example. You can see the hair having incorrect red spots as well as a f7cked up bowtie.
@TOF_enjoyer It is said that for SDXL it's important to have a resolution that is a multiple of 32 on each side, or performance/quality issues may occur. I honestly don't know much about it, but I try to stick to those numbers. The upscaling of the masked area also matters: GAN or DAT gives a clearer and sharper picture, but it can affect the style. Also, if you want to experiment: the best results can be achieved if you keep around 20-30% of unmasked area around the masked object, instead of cropping only the object without its surroundings. There will be fewer unwanted changes like wrong shadows, but a greater resolution and more steps are needed.
Well, not sure if what's written above will be helpful. About the red spots on the hair: got it; this may be related to a colors/brightness/contrast bias for some artist styles, and a controlnet (like anytest) should solve the issue. Simply increasing the resolution for the inpainting task may also help.
@Minthybasis The detailer used to produce artifacts if one side of the dimensions ended up as a weird number (sometimes an odd number), but that should have been fixed a long time ago; I will keep it in mind when I rework my workflow. Never heard of a controlnet like anytest before; will give it a look, thanks
Do pony loras work with this?
In my first tests: yes (much better than Pony loras on SDXL or SDXL loras on Pony), but it's far from perfect (I'd say 7/10); for good results you need to retrain the LoRA directly on this model
They work, but I recommend retraining on this if you are the creator of said lora; it is much better that way
May I ask why you chose base Pony 6 for the finetune instead of AutismMix Confetti? Since the latter has way better anatomy and is more stable overall.
*Non-author random text: because AutismMix Confetti has loras merged inside to get that better anatomy and stability? It's not a good idea to finetune on a model that is not clean (has merged LoRAs inside), and if you need part of AutismMix Confetti you can just merge with the formula: this ckpt + (autismmix confetti - pony)
AutismMix Confetti has good stability and better style than Pony but, to put it simply, it's not the best choice to train on. This applies to almost any mix; exceptions are rare. If 4th had been trained on top of a mix, there would have been almost no traces of the Autism benefits, only weird problems and flaws. However, you can merge the difference (link in suggested resources), or use a more complex and proper merge to get its features.
@Minthybasis Thank you for clarifying.
I'm liking the new version, as well as the merge with confetti. Pretty stable poses, but some artists just do not work properly. An example would be 'by healthyman', which works in Tofu, but not well in 4th tail. Probably something caused by Pony?
Yes. Going to bake them more soon; hope that will solve it. Will also add pictures for the requested/reported artists.
@Minthybasis Good luck
I'm having trouble getting it to work in ComfyUI, only getting noise outputs and nothing more. Tried various samplers with normal and karras schedulers in both my own workflow and the default one (with only changes being added clip skip 1 ("clip set last layer" node set to -1), a separate vae loader node with your vae selected and latent size bumped up to 1024x1024 and various cfg values from 6.5 to 8); different seeds and prompts. Also happens with its merges, but not with the lyco extract on top of vanilla pdxl.
just set clip to -2
use clip skip -2 in comfy
@booboobaabaa @TOF_enjoyer ty, it works. Now I wonder why the author recommends clip skip 1
It can probably be related to the samplers you are using or some unusual loader nodes with an incompatible extra config. Please share your workflow.
@disastinator About clip skip 1: it's for a1111/forge; comfy has different settings.
@Minthybasis Ok, thanks. I was confused because all tutorials say that "clip skip 2 for a1111 means clip set layer -2 in comfy" in regards to vanilla Pony XL, so I just extrapolated that onto your recommendations. Guess I should do some more research. And maybe not choose comfy as my 1st generation software xD
As for workflow, it's here: https://files.catbox.moe/qxmzpx.json
As a personal request, I would like to see this tags added:
- hair_on_horn (Shimanto and Aegir from Azur Lane are great examples)
- long_hair_between_eyes (Manhattan Cafe, Implacable/Kearsarge/L' Audacieux/Aegir from Azur Lane)
- sidelocks_tied_back (Yang Guifei from F/GO, Springfield from Girls' Frontline and Marsellaise from Azur Lane)
I think it works, but parted_hair would be another great addition (Sheffield from Azur Lane comes to mind, though she also has hair_over_one_eye), and curtained_hair (I think Musashi from Kancolle is a great example).
Sure, also girls from Kancolle will be added to the dataset in significant numbers.
@Minthybasis The GOAT of model finetunes! If you need help testing epochs, hmu.
@Minthybasis I forgot about some bangs concepts:
wispy_bangs
choppy_bangs
fanged_bangs
I finally got around to trying out the latest version of your model. As always, it's at the top of my list! From the looks of it, my previous requests weren't included, considering the training start date. Right now, this model can generate almost anything, except for some really niche dark themes or super obscure tags with low popularity. The only thing I personally feel is missing is a solid baseline style, something similar to what's seen in other popular models. From what I've noticed, over half of the people don't really follow artists or care much about them, so they’re not too keen on picking a style from 8000 artists based on image samples. Of course, this is just my take, but I feel like that's why models with 1000 LORA merges focused on stability and style tend to be more popular.
Yes, your requests will be in the next. I'm thinking about base style and the way to implement it without breaking embedded artist styles, hope the solution will be found.
Also, feel free to share your thoughts about dark themes and concepts; if they have enough pictures and aren't completely insane, there's no problem adding them.
@Minthybasis From what I’ve noticed, when there aren’t any artist tags in the prompt, the model seems to apply random styles to the image. I’ve seen the style shift quite often when no artist is specified.
As for darker themes, I was surprised to find that it can’t generate a sex scene combining vaginal or anal with tags like fingering/masturbation at the same time. The model either breaks the image or only picks one. I recently saw a LORA that adds this functionality, but I don’t get why the model doesn’t handle it out of the box.
Also, could you add concepts like fisting, cunnilingus gesture, and maybe fix the 69 position? It works pretty badly for two girls, though it’s a bit more stable with a guy, but it could still be improved.
@Minthybasis I suggest taking a few hundred images per unique artist, making a set like 200 recent images from artist 1, artist 2, etc., then training on the artists separately (after base training, kind of like aesthetic training) to make the model adhere to artists more.
The model does know artists, but it breaks when a) an artist has more than a thousand images, or b) an artist has fewer than 200 images.
Also, most of these 8k artists have very similar styles, which is why I'm suggesting building a dataset of different artists with unique styles. I can probably make a list of them, though of course you'd have to scrape and tag them manually. Well, I'm pretty sure you are more experienced in training models than anyone else I know, but I've seen this approach in https://huggingface.co/spaces/gustproof/style-similarity
His dataset is a year old, so quite outdated, but it contains 10k artists with 30 images each: https://huggingface.co/datasets/gustproof/danbooru-artists-10k/tree/main?not-for-all-audiences=true
This would make the model stable and capture artist styles properly while reducing the number of images at the same time. Grouping like cel shading etc. is required as well; the problem is the dataset.
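The per-artist capping idea above can be sketched in a few lines. This is only an illustration of the suggestion, not the model's actual data pipeline; the function name and thresholds mirror the numbers the comment mentions:

```python
import random
from collections import defaultdict

def cap_per_artist(images, max_per_artist=200, min_per_artist=5, seed=0):
    """Balance a dataset per artist: cap prolific artists at
    `max_per_artist` images and drop artists with too few images
    to learn a stable style. `images` is an iterable of
    (artist, image_id) pairs."""
    by_artist = defaultdict(list)
    for artist, image_id in images:
        by_artist[artist].append(image_id)
    rng = random.Random(seed)  # fixed seed for reproducible subsets
    dataset = []
    for artist, ids in by_artist.items():
        if len(ids) < min_per_artist:
            continue  # too few images for a stable style
        if len(ids) > max_per_artist:
            ids = rng.sample(ids, max_per_artist)  # keep a random subset
        dataset.extend((artist, i) for i in ids)
    return dataset
```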
@GromForever Oh, style shifts again, but kind of understandable for this state, very interesting. Looks like it's not the dataset size but other things that matter here. Okay, got it.
The problematic combinations you mentioned are probably related to stability and the amount of training for them. Well, I can't claim that for sure, especially since every train is experimental in some ways (not a corporation with dedicated people and a GPU cluster to run background research and release only the best results), but it's very likely the next train will solve some of it.
As for concepts - sure.
@EBIX Thanks for the suggestion. Actually, the current algo is a bit more complex: it tries to judge according to release date, user scores, classifier scores, the balance between sfw/nsfw (if applicable) and so on. But if an artist has only a very few images, all of them except complete trash go into training. Ofc the best way is a manually cherry-picked dataset; some of those are already being implemented, but for most artists it's automatic picking and balancing.
As for "second stage" training - that kind of approach has actually been used for 0.4.5: well-known artists were pruned and then only the "best" images were used. You are right, I should use it more for style improvements.
Very interesting thing for styles! Will dig in more, it looks worth exploring, thank you for pointing it out. If you have a list of distinct and unique styles, it would be great to look at.
Will find some time and make a discord server this week, where it could be discussed.
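For context, the automatic picking described above (release date, user scores, classifier scores) could look roughly like this. The field names, weights, and cap behavior are pure assumptions for illustration, not the actual training code:

```python
def auto_pick(images, cap):
    """Hypothetical sketch of automatic per-artist selection:
    score each image by recency, user score and classifier score,
    then keep the top `cap`. If the artist has only a few images,
    keep everything that isn't flagged as trash.
    `images` is a list of dicts; all keys and weights are made up."""
    if len(images) <= cap:
        return [img for img in images if not img.get("is_trash")]
    def score(img):
        return (0.2 * img["recency"]        # newer images slightly preferred
                + 0.4 * img["user_score"]   # community rating
                + 0.4 * img["clf_score"])   # aesthetic classifier
    return sorted(images, key=score, reverse=True)[:cap]
```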
Thank you very much for the update! In fact, many do not realize what this model is for. Perhaps from an aesthetic point of view it is not at the level of the best models with huge detail, but it does not need that: it is a huge combination of different styles and artists, and thanks to the huge selection of images, training models and LoRAs on it comes with much smaller losses than on the standard popular pony models. This model is simply a diamond among open models, one that allows you to train with much greater accuracy, diversity and understanding of anatomy. Yes, afterwards you most often need to fine-tune the model, but that is not a problem for those who have been in this field for a long time. In general, I wanted to say a huge thank you for such an invaluable contribution to the development of open-access models; you do more for the community than you can imagine!
Oh yeah, I almost forgot about the request)
by mikimoto haruhiko
by kajishima masaki
by saotome nanda
by hakumai gen
Pretty good model(7/10),
But LoRA sometimes works and sometimes doesn't.
Everything is fine except for that.
Thank you! Could you please name or share the loras that perform badly? Not sure that LoRA compatibility issues can be solved easily, but it's at least worth checking.
I recommend training new loras with this model as a base; although it is trained on top of pony, the network has changed enough to break some pony loras.
How do I find out the names of the artists this model responds to?
Is the only way to do this to enter every artist you have in mind?
I wish I could find a list of them on some website..
It's in the first section of the model description...
https://mega.nz/folder/Yb5yyTDR#ZDuklCQXfw-Rd-dsJtcoOQ comparison images
https://files.catbox.moe/zeac9p.txt the wildcard file with all the artist names
Thanks, I didn't realize there was a link at all.
@DraconicDragon Sorry, I was plugging it into the translation tool and didn't realize there was a link.
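As a side note, the wildcard file linked above is just one artist name per line, so sampling a style outside a wildcard-capable UI is trivial. A minimal sketch (the function name is made up):

```python
import random

def pick_artist(wildcard_path, rng=random):
    """Pick one random artist from a wildcard .txt file
    (one artist name per line, blank lines ignored)."""
    with open(wildcard_path, encoding="utf-8") as f:
        artists = [line.strip() for line in f if line.strip()]
    return rng.choice(artists)
```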
I created Discord server where you can ask, comment and get the latest news.
>Invite invalid
The link is dead. Along with the server, because too many folks from 2ch swarmed in?
@Sarudakedonanika fixed
I barely have to trick this model to do the wilder stuff I want it to. 10/10
What's the best way to get an 'anime screencap' style? Even with prompts like 'anime coloring' or 'anime screencap,' I keep getting results that are too detailed. Even in inpainting it doesn't match the simplicity of the background: it places everything right, but it's too detailed.
Anime screencap as a base general style will be added in the next version. However, if you are just looking for flatter, less detailed pictures, add flat colors and/or minimalistic to the prompt.
Playing with artist styles is also a way to get there, though given the quantity it might be quite difficult. Maybe in the future I will add a grouping of artist styles by general aspects or common features.
What are your plans for the next version?
WIP; there will be at least one more version for the 4th tail. Most of the new dataset and new training code is completed; just a few things left to finish, plus the latents conversion. In the best-case scenario there will be 3 models: Illustrious and pony (4th tail) finetunes and a major update for tofu.
@Minthybasis Any hints what styles will be included in next version?
@Rating_Agent Over 22k artist styles, a few general ones and about ten related to booru tags. Detailed list later.
@Minthybasis omg, 22k new or 22k in total? Either way it's incredible!
@Rating_Agent In total, maybe more. At least a lot of new, unique and interesting styles are expected, as well as improvements to existing ones.
Are you guys planning to make the next version pony too, or maybe move to Illustrious? I find it captures anime styles so well, and the fact that they didn't censor any tags (pony censored a lot) is really nice to hear.
Well, since pony v7 is announced to be released in like 1-2 months, I'll probably wait for the new one. I'm just a lone enthusiast, not a team with funding, so the budget is limited. Improving the Illustrious-based model or doing some research with 3.5 Medium looks like a better use of it, currently.
Ah, btw, have you seen my new Illustrious finetune? https://civitai.com/models/950531
@Minthybasis Wow, haven't seen it! Thanks for making it!