trained with max 128 network dimension. ~10 steps/repeats per image. the machine is fed original-resolution files instead of compressed social media images. in some cases, even the grainy, rough brush texture is fully learned.
Description
1 repeat, 100 epochs, saving every 20 epochs, totalling 5 saved checkpoints. cosine with restarts, 5 cycles in total, synchronized with each save. 1e-4 unet learning rate, text encoder not trained (learning rate 0).
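with 100 epochs and 5 cycles, each cosine cycle spans 20 epochs, so every restart lands on a save point. a minimal sketch of that schedule, counted in epochs rather than steps. the constant and function names are made up for illustration, and the exact formula inside the trainer is an assumption (no warmup, decay to zero at the end of each cycle):

```python
import math

# values taken from the settings above
BASE_LR = 1e-4        # unet learning rate
TOTAL_EPOCHS = 100
NUM_CYCLES = 5
SAVE_EVERY = 20

def lr_at_epoch(epoch: float) -> float:
    """Cosine-with-hard-restarts learning rate at a (possibly fractional) epoch.

    Assumption: no warmup, and each cycle decays from BASE_LR toward 0.
    """
    cycle_len = TOTAL_EPOCHS / NUM_CYCLES      # 20 epochs per cycle
    pos = (epoch % cycle_len) / cycle_len      # position inside the cycle, 0..1
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * pos))

# epochs at which a checkpoint is written
save_epochs = list(range(SAVE_EVERY, TOTAL_EPOCHS + 1, SAVE_EVERY))

if __name__ == "__main__":
    for e in (0, 10, 19.9, 20, 30):
        print(f"epoch {e:>5}: lr = {lr_at_epoch(e):.2e}")
    print("checkpoints at epochs:", save_epochs)
```

each checkpoint is therefore saved at the low point of a cycle, right before the learning rate jumps back up to 1e-4.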
256/256 network/convolution dimension and alpha.
trained on illustrious 2.0 base model.
random crop enabled, original-resolution artwork, tagged with wd14 swin v2 tagger v3. tagging threshold ~0.1 to 0.25.
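the tagging threshold decides which tagger predictions end up in the caption: a tag is kept when its confidence is at or above the threshold. a rough sketch of that filter. the scores below are invented example values and `select_tags` is a hypothetical helper, not part of the tagger itself:

```python
from typing import Dict, List

def select_tags(scores: Dict[str, float], threshold: float = 0.25) -> List[str]:
    """Keep tags whose confidence is >= threshold, highest confidence first."""
    kept = [(tag, score) for tag, score in scores.items() if score >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [tag for tag, _ in kept]

if __name__ == "__main__":
    # invented confidences for one image, for illustration only
    example_scores = {"1girl": 0.98, "portrait": 0.62, "freckles": 0.18, "hat": 0.04}
    print(select_tags(example_scores, threshold=0.25))
    print(select_tags(example_scores, threshold=0.10))
```

a lower threshold such as 0.1 lets in more low-confidence tags, so captions get longer and more detailed at the cost of some wrong tags.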
the key ingredient is an extra set of manually cropped images. each artwork is cropped to eye -> face -> portrait -> upper body, in that order of priority, depending on the available resolution (how large the image is).
every piece of artwork in the dataset went through this cropping process. having full-resolution data on eyes and faces secures the most important aspect of the dataset: besides the overall artstyle, eyes and faces are the soul of a semi-photorealistic anime artstyle. bodies are all about anatomy and they all look, or should look, similar, but the figurative stylization of eyes and faces is what sets an artstyle apart from the crowd.
it also helps tremendously in image upscaling.
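the crops were made by hand, so there is no script behind them, but the priority rule can be sketched roughly like this. everything here is an assumption for illustration: the `Box` type, the `usable_crops` helper, and especially the 512 px minimum side, which is not a value from the actual dataset:

```python
from typing import List, NamedTuple

class Box(NamedTuple):
    """Pixel size of a hand-picked crop region (hypothetical structure)."""
    name: str
    width: int
    height: int

# priority order described above, tightest crop first
PRIORITY = ["eye", "face", "portrait", "upper body"]

# assumed cutoff: a crop is only worth keeping if it still has enough pixels
MIN_SIDE = 512

def usable_crops(boxes: List[Box], min_side: int = MIN_SIDE) -> List[str]:
    """Return the crop levels large enough to keep, in priority order."""
    big_enough = {b.name for b in boxes if min(b.width, b.height) >= min_side}
    return [name for name in PRIORITY if name in big_enough]

if __name__ == "__main__":
    regions = [
        Box("upper body", 2000, 2400),
        Box("face", 900, 900),
        Box("eye", 300, 200),   # too small at this source resolution
    ]
    print(usable_crops(regions))
```

on a large source image every level passes the cutoff, so the dataset gets eye, face, portrait, and upper-body crops; on a small one only the wider crops survive.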
euler_A_cfg++
cfg 1.0~3.0