TRIGGER WORD: "shieldlora"
A LoRA inspired by the style used for Season 1 of "The Rising of the Shield Hero" Anime.
This is a style LoRA, meaning it learns to mimic the visual style of the anime above. It was trained on a curated dataset of 1,000 automatically AI-captioned frames from the original material, mixed with frames from other anime. Drawing inspiration from other shows shifts minor aspects of the style, minimally but meaningfully, away from the source, as protection against copyright infringement.
The training dataset was curated to maximize frame variation: frames were embedded with CLIP (a contrastively trained vision-language model), and the 1,000 frames that maximize variety across the dataset were selected.
This means that this LoRA can do the following:
Sprites for Visual Novels / Video Games.
Illustrations inspired by the style of the Shield Hero Anime
Fanart - Note that characters need to be trained separately.
Background images
Examples of where I'd use this model (some legitimate uses of mine):
Visual Novel asset creation: generating on a pure white background, then removing the background in post.
Generating a custom style for a video game / VN that won't get flagged as "AI made" by the community, thanks to its unique look. For this I suggest mixing multiple styles until you reach something with its own personality, so people won't complain it looks like slop.
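The white-background workflow above can be sketched with a simple thresholding pass. This is an illustrative NumPy snippet, not the author's actual post-processing; `white_to_alpha` and the threshold value are assumptions, and a real pipeline would likely use a dedicated matting or background-removal tool for cleaner edges.

```python
import numpy as np

def white_to_alpha(rgb: np.ndarray, threshold: int = 245) -> np.ndarray:
    """Convert an (H, W, 3) uint8 image to (H, W, 4) RGBA,
    making near-white background pixels fully transparent.
    `threshold` is a hypothetical cutoff; tune it per image."""
    # A pixel counts as background when every channel exceeds the threshold.
    background = np.all(rgb >= threshold, axis=-1)
    alpha = np.where(background, 0, 255).astype(np.uint8)
    return np.dstack([rgb, alpha])

# Tiny example: a 2x2 image with near-white and colored pixels.
img = np.array([[[255, 255, 255], [200, 30, 30]],
                [[250, 250, 250], [10, 10, 10]]], dtype=np.uint8)
out = white_to_alpha(img)
print(out[..., 3])  # alpha channel is 0 only for the near-white pixels
```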
Because it is based on Flux 2 Klein 9B Base, this model produces a wide diversity of outputs and can understand complex prompts too.
If you want to create characters in a style reminiscent of the Shield Hero anime, you can use the LoRA below.
Training Pipeline:
Curated the frames from episodes by using a scene segmentation model to split the episodes into scenes. The first and last frames of each scene are taken as images.
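The first/last-frame extraction step can be sketched as follows. The author used a dedicated scene segmentation model; this hedged stand-in detects cuts from raw frame differences instead, and the function name and `threshold` value are assumptions for illustration only.

```python
import numpy as np

def scene_cut_frames(frames, threshold=30.0):
    """Return the indices of the first and last frame of each scene.
    A cut is declared when the mean absolute pixel difference between
    consecutive frames exceeds `threshold` (a crude heuristic standing
    in for a real scene-segmentation model)."""
    cuts = [0]
    for i in range(1, len(frames)):
        diff = np.abs(frames[i].astype(np.int16)
                      - frames[i - 1].astype(np.int16)).mean()
        if diff > threshold:
            cuts.append(i)
    boundaries = cuts + [len(frames)]
    picks = []
    # Keep the first and last frame index of every scene.
    for start, end in zip(boundaries[:-1], boundaries[1:]):
        picks.append(start)
        if end - 1 != start:
            picks.append(end - 1)
    return sorted(set(picks))
```

Running this over a decoded episode yields the small set of representative frames that the later CLIP-based filtering step operates on.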
The frames are then filtered by embedding them with CLIP and sampling to maximize image diversity. The intuition is that this leads to better generalization. I used 1K images.
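One common way to "sample for maximum diversity" over embeddings is greedy farthest-point sampling: repeatedly pick the image least similar to everything chosen so far. The author's custom code may use a different method; this is a minimal sketch under that assumption.

```python
import numpy as np

def farthest_point_sampling(embeddings: np.ndarray, k: int) -> list:
    """Greedily select k row indices from an (N, D) embedding matrix,
    each time taking the point with the lowest cosine similarity to
    the already-chosen set. Illustrative only."""
    # Normalize rows so a dot product equals cosine similarity.
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    chosen = [0]            # start from an arbitrary point
    min_sim = x @ x[0]      # best similarity of each point to the chosen set
    for _ in range(k - 1):
        nxt = int(np.argmin(min_sim))   # most distant remaining point
        chosen.append(nxt)
        min_sim = np.maximum(min_sim, x @ x[nxt])
    return chosen
```

With CLIP embeddings of all candidate frames as input and k = 1000, this returns the indices of a maximally varied subset.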
I use a VLM finetuned for captioning to generate 10 captions per image, none of which mention the style of the anime, or that it's anime at all. This helps the model not sink attention into what it shouldn't, or overwrite previously learned styles. The intuition behind 10 captions per image is that it exposes the model to many different ways of describing the same content, making it more resilient and generalizable.
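One way a trainer can consume multiple captions per image is to draw one uniformly at random each training step, so every phrasing is seen over the epochs. This is a hypothetical sketch; the actual caption handling depends on the trainer used.

```python
import random

def pick_caption(captions_per_image: dict, image: str, rng=random) -> str:
    """At each training step, sample one of the image's captions uniformly.
    `captions_per_image` maps an image path to its list of captions.
    Illustrative only; real trainers implement their own caption logic."""
    return rng.choice(captions_per_image[image])
```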
Finetuned as a LoRA (32 Rank) on Flux 2 Klein 9B - Base.
Tested on ComfyUI with LoRA strength recommended between 0.6 and 1.

Comments (2)
This looks really good - some questions about the training:
- which VLM did you use for the captioning?
- would you mind updating some of the example images with the prompts used, or giving some example captions?
- Did you use a tool like OneTrainer to do the sampling by clip embedding or custom training code?
1. Joycaption Beta One.
2. Yes, I'll upload some more image examples soon.
3. The training itself was done using Ostris' AI-toolkit. The preprocessing, including sampling the most varied images, was custom code.