Please read through the entire description (might need to be expanded) and the version change notes as they cover a lot of information about basic use cases and limitations. The description is always focused on the latest version. Thank you!
This is an SD 1.5 LoRA for the character Lillie / Lilie from Pokemon. It works for models based on SD 1.x only, it will not work for models based on SD 2.x or SDXL!
Example images were picked from base resolution txt2img results + the FreeU extension (https://github.com/ljleb/sd-webui-freeu) enabled with default settings to further improve results (see last example images for a comparison of FreeU disabled vs enabled). They were then re-created at 2x resolutions using txt2img with moderate Hi-res Fix settings (upscaler Lanczos, denoising strength 0.4). This improves faces and other details that are almost impossible to get correctly and consistently at base resolutions (limitation of the technology) while still giving a realistic impression of what it looks like. Base resolution results will have more distorted faces and less detail.
Please see the version change notes for the training and example image generation models as well as the used weights as they might change between versions. Remember that you might need to adjust weights to best suit your use case!
Remember to add the tag lillie \(pokemon\) (with backslashes intact) to your positive prompt.
The training set contained Lillie's two primary signature looks, the "braids look" from the start of the game and the "ponytail look" from the rest of the game. If you want to focus on one of these looks, you will need to put combinations of the following tags in your positive or negative prompts:
Braids look:
dress
duffel bag
hat
kneehighs
slippers
twin braids
Ponytail look:
ankle socks
backpack
mary janes
ponytail
shirt
skirt
You may also be able to mix these looks but I have not tried it. I might look into separating these into different trigger phrases in the future, not sure yet.
As the training set contained a few images with floating hair, you might need to add floating hair to the negative prompts if not desired.
Known Limitations / Problems:
The shoes, socks and bags were not present in a lot of images and not a focus for the training so they sadly will not be consistent at all.
Eyes and hair had different colors (blue and green) in the training data, better to specify explicitly. And maybe try adding multicolored eyes in the negative prompts. Still might come out a bit weird, I need to improve the tagging.
The blue and white parts of the clothing, especially the dress, might get mixed up. No idea how to fix this yet. It might help to put e.g. white dress in the positive prompts.
The hat really loves to appear everywhere, so put hat in the negative prompts, maybe even with additional weight.
Portraits are currently a bit of a weak point, with limited expressions and a fair bit of color bleed. For certain portrait prompts, you will sadly have to get very lucky, especially if they also try to remove the hat. I might be able to improve on this in the future by including a better selection of portrait training images.
Description
There are detailed changes below the next paragraph, you might need to expand this version changes box!
I'm uploading this "WIP version" of v6 as I'm not sure when / if I'll have the time / motivation to properly finish it up. It is roughly equivalent to v5 with the changes to the training process from v7 of my Dawn LoRA. It already fixes quite a few issues with v5 and should usually perform better. But it is not at the same level of improvement yet. If I'm not doing a "proper" v6 with major dataset changes, I might simply rename this version in the future.
Most of the changes (look at LoRA metadata for more details):
Updated to much newer Kohya scripts version (>3 months newer)
Updated training images
Switched to keep original-ish aspect ratios (slightly cropped and resized to compatible SD 1.5 resolutions) where it makes sense instead of forcing squares
Added regularization images to actually make trigger tag work correctly
1 regularization image per training image
Generated by the base model at the same resolution, with the same tags (minus the activation tag)
Currently at half loss during training as they had a bit too much influence otherwise
With this, the LoRA now reverts back much closer to base model knowledge without the activation tag (which is correct!)
Might care a bit more about the positioning of the activation tag now as it was trained with "keep tokens" to keep the activation token at the front when shuffling
Normalized repeats to 1 (only using different values if ever in need of balancing datasets) and learning rates back to defaults
Compensated with different epoch settings
Reverted back to training at 128 dims and a resize down to 32
Results were better across the board and the resizing also removes a bit of noise as a bonus
Used NO dynamic resize method as results for [email protected] and sv_ratio@20 did not seem too different from or better than a simple resizing
Added training warmup of 10%
Not sure if this had much impact, might remove again in the future
Added "scale weight norms" with value 1 during training
Supposedly helps against overfitting and might make LoRAs more compatible with others
Used FreeU extension for example image generation to further improve results
Recommended weight: 1
Training model: Anything V3
Example image generation model: AbyssOrangeMix2 - Hardcore