EDIT: update comfy-ui custom node: https://registry.comfy.org/publishers/ttddrr11/nodes/animadualkvstyle
an experiment to make it easier to overfit a style into the anima model without text conditioning
the architecture and code were wholeheartedly vibe-coded with an LLM. it basically mimics IP-Adapter by adding another KV path in the DiT blocks, but instead of using an image encoder for conditioning, the KV is learned directly, together with a lightweight bottleneck projection layer that extracts the style features
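To make the idea concrete, here is a minimal dependency-free sketch of an extra learned KV path added next to a block's normal attention. It is not the code from the zip: the class name `StyleKV`, the zero-initialised value projection, the `scale` factor, and the mapping of `--num_style_tokens` to the token count and `--network_dim` to the bottleneck width are all assumptions made for illustration, and plain Python lists stand in for PyTorch tensors.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def attention(q, k, v):
    """Scaled dot-product attention: softmax(q k^T / sqrt(d)) v."""
    d = len(q[0])
    out = []
    for q_row in q:
        scores = [sum(x * y for x, y in zip(q_row, k_row)) / math.sqrt(d) for k_row in k]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w / total * v_row[j] for w, v_row in zip(weights, v))
                    for j in range(len(v[0]))])
    return out


def random_matrix(rows, cols, rng, std=0.02):
    """Small Gaussian init, standing in for learned parameters."""
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


class StyleKV:
    """Hypothetical style branch: learned tokens -> bottleneck -> extra K and V."""

    def __init__(self, dim, num_style_tokens=8, bottleneck_dim=64, seed=0):
        rng = random.Random(seed)
        # Learned directly, no image encoder involved (assumed shapes).
        self.tokens = random_matrix(num_style_tokens, bottleneck_dim, rng)
        self.to_k = random_matrix(bottleneck_dim, dim, rng)
        # Assumption: zero-init V so the branch starts as a no-op.
        self.to_v = [[0.0] * dim for _ in range(bottleneck_dim)]
        self.scale = 1.0

    def __call__(self, q, base_k, base_v):
        base = attention(q, base_k, base_v)       # the block's existing path
        style_k = matmul(self.tokens, self.to_k)  # (num_style_tokens, dim)
        style_v = matmul(self.tokens, self.to_v)
        style = attention(q, style_k, style_v)    # the added KV path
        return [[b + self.scale * s for b, s in zip(b_row, s_row)]
                for b_row, s_row in zip(base, style)]


if __name__ == "__main__":
    rng = random.Random(1)
    q = random_matrix(4, 16, rng, std=1.0)
    base_k = random_matrix(5, 16, rng, std=1.0)
    base_v = random_matrix(5, 16, rng, std=1.0)
    out = StyleKV(dim=16)(q, base_k, base_v)
    print(len(out), len(out[0]))
```

In this sketch the base model's weights are never touched: only the style tokens and the two projections would be trained, which is one way to read the low parameter count mentioned further down.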
code based on commit 068bcd7 of kohya-ss/sd-scripts repo on github
dataset description of each model:
miyajima_reiji_style_dual_kv-000009: manga cover images, taken from danbooru (low quality, 14 images)
itachi_3dt_style_dual_kv-000012: a mix of art found on danbooru and coloring pages from manga (mix of low and average quality, 14 images)
dramus_style_dual_kv-000008: edited and cropped manga images (average quality, 298 images), colored covers (high quality, 2 images)
all images were resized to 1024x1024 anyway, and even when training on bad-quality images the model doesn't output jpeg artifacts, so image quality doesn't seem to matter much (maybe due to the low parameter count?)
the model can overfit a style quite easily, however it still affects prompt activation and guidance (see the examples, all generated with seed 42). With the Prodigy optimizer the models overfit faster than when training with AdamW8bit.
Trained with this command (Windows cmd):
accelerate launch --num_cpu_threads_per_process 1 anima_train_custom_style.py ^
--pretrained_model_name_or_path="models/diffusion_models/anima-base-v1.0.safetensors" ^
--qwen3="models/text_encoders/qwen_3_06b_base.safetensors" ^
--vae="models/vae/qwen_image_vae.safetensors" ^
--dataset_config="datasets/{DATASET}.toml" ^
--output_dir="output" ^
--output_name="{OUTPUT_NAME}" ^
--save_model_as=safetensors ^
--num_style_tokens=8 ^
--network_dim=64 ^
--learning_rate=1.0 ^
--optimizer_type="Prodigy" ^
--attn_mode="flash" ^
--gradient_checkpointing ^
--lr_scheduler="cosine" ^
--timestep_sampling="sigmoid" ^
--sigmoid_scale=1.0 ^
--sample_prompts="datasets/{SAMPLE_PROMPTS}.txt" ^
--sample_every_n_epochs=1 ^
--max_train_epochs=20 ^
--save_every_n_epochs=1 ^
--mixed_precision="bf16" ^
--cache_latents ^
--cache_latents_to_disk ^
--vae_chunk_size=64 ^
--vae_disable_cache ^
--max_data_loader_n_workers=4
with `gradient checkpointing` enabled, it can be trained overnight on a 5060 laptop with 8GB VRAM
DATASET.toml used:
[general]
caption_extension = ".txt"
shuffle_caption = false
flip_aug = false
color_aug = false
[[datasets]]
resolution = 1024
batch_size = 1
enable_bucket = true
bucket_reso_steps = 16
[[datasets.subsets]]
image_dir = "[DATASET_PATH]"
num_repeats = 20 # for low image count, turn up repeats to bash the model until overfit
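For a rough idea of how long a run with this config is, the step count can be estimated as below. This is a sketch under assumptions: that one epoch is images × num_repeats / batch_size steps (a single bucket, since everything is 1024x1024), that the `-0000NN` suffix on the released files is the epoch number, and that the 14-image datasets used `num_repeats = 20` as shown above. The helper names are made up for illustration.

```python
import math


def steps_per_epoch(num_images, num_repeats, batch_size=1):
    """Optimizer steps in one epoch, assuming a single resolution bucket."""
    return math.ceil(num_images * num_repeats / batch_size)


def steps_at_epoch(num_images, num_repeats, epoch, batch_size=1):
    """Total steps seen by the checkpoint saved at the end of `epoch`."""
    return steps_per_epoch(num_images, num_repeats, batch_size) * epoch


if __name__ == "__main__":
    # 14 images * 20 repeats / batch size 1 = 280 steps per epoch
    print(steps_per_epoch(14, 20))
    # checkpoint from epoch 9 (e.g. a "-000009" file): 280 * 9 = 2520 steps
    print(steps_at_epoch(14, 20, epoch=9))
    # full run with --max_train_epochs=20: 280 * 20 = 5600 steps
    print(steps_at_epoch(14, 20, epoch=20))
```

The repeat count used for the 300-image dataset is not stated, so no estimate is given for that one.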
sample prompts used (multiple subjects at different distances):
masterpiece, best quality, 2girls, portrait, close-up, school uniform, serafuku, one girl smiling at viewer, other girl looking away shyly, long hair and short hair, detailed faces, classroom background --w 1024 --h 1024 --d 42
masterpiece, best quality, 2girls, medium shot, standing together, casual clothes, one girl playing acoustic guitar, other girl singing with microphone, music club room, instruments around, dynamic pose --w 1024 --h 1024 --d 42
masterpiece, best quality, 2girls, full body, walking side by side, summer festival, yukata, one girl holding cotton candy, other girl with fan, night market lights, lanterns, crowd in distance --w 1024 --h 1024 --d 42
masterpiece, best quality, 2girls, wide shot, rooftop at sunset, one girl sitting on edge, other girl standing leaning on railing, school uniforms, city skyline background, warm lighting --w 1024 --h 1024 --d 42
masterpiece, best quality, 2girls, long shot, faraway view, fantasy meadow, one girl in mage robe casting spell, other girl in knight armor defending, cherry blossoms floating, dramatic sky, epic scene --w 1024 --h 1024 --d 42
Inference using:
python anima_minimal_inference_style_custom.py ^
--dit "models/diffusion_models/anima-base-v1.0.safetensors" ^
--vae "models/vae/qwen_image_vae.safetensors" ^
--text_encoder "models/text_encoders/qwen_3_06b_base.safetensors" ^
--style_weights "output/[STYLE_WEIGHT].safetensors" ^
--from_file "datasets/[EVAL_PROMPTS].txt" ^
--attn_mode=flash ^
--save_path "[SAVE_PATH]"
Description
dramus, for training code see main branch
Comments (4)
hello, i tried to use the dramus style but it doesn't work, maybe it needs trigger words?
it needs the custom inference script [anima_minimal_inference_style_custom.py] and [anima_train_custom_style.py] found in the zip (this also requires cloning and setting up kohya-ss/sd-scripts). for a normal out-of-the-box lora i think you should check out this model instead https://civitai.red/models/1313932/dramus-abukano
@123qu i see, thanks
@GrayLK okay i vibe-coded a custom comfy node that can load the lora now, you can check it here
https://registry.comfy.org/publishers/ttddrr11/nodes/animadualkvstyle
(or github: https://github.com/TheDucker1/animadualkvstyle)
with example workflow here
https://raw.githubusercontent.com/TheDucker1/animadualkvstyle/refs/heads/main/example.json
it should be compatible with anima-turbo too, though turbo makes the prompt more rigid
