Dual-Resolution Character LoRA: 512 vs 1024 Training Comparison

Dual-Resolution Character LoRA: 512 vs 1024 Training Comparison - v1.0_512x

NSFW

Woman_512 & Woman_1024 LoRA Models - A Resolution Comparison Experiment

Model Description

This is an experimental release of two LoRA models trained on the WAN 2.2 base model with the same dataset, the same captions, and the same training settings. The only variable changed was the training resolution.

Woman_512: Low-noise model trained at 512x512 resolution

Woman_1024: Low-noise model trained at 1024x1024 resolution
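To put the two training resolutions in perspective, the arithmetic below compares pixel counts and a rough transformer token count per image. This is only a sketch: the 8x spatial VAE downsampling and the 2x2 patch size are assumptions about the WAN architecture, not values stated on this page, and the helper names are made up for illustration.

```python
def pixels(side: int) -> int:
    """Number of pixels in a square image of the given side length."""
    return side * side


def latent_tokens(side: int, vae_stride: int = 8, patch: int = 2) -> int:
    """Rough token count for one square frame.

    vae_stride and patch are ASSUMED values, not taken from this page.
    """
    cells_per_side = side // (vae_stride * patch)
    return cells_per_side * cells_per_side


for side in (512, 1024):
    print(f"{side}x{side}: {pixels(side)} pixels, ~{latent_tokens(side)} tokens")

# Ratio of work per image between the two settings (by pixel count).
ratio = pixels(1024) // pixels(512)
print(f"1024 training sees {ratio}x the pixels of 512 training")
```

Under these assumptions, 1024x1024 carries four times the pixels and tokens of 512x512, which is at least in line with the 3–4× training-time difference the author mentions in the comments.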

Purpose of This Experiment

The goal is to provide a direct comparison of how training resolution affects LoRA performance on the WAN 2.2 model.

Training Data Details:

Source Image: A single AI-generated image of a fictional woman character

Dataset Creation: The single image was diversified into 24 variations using Qwen Image Edit, creating different poses, expressions, angles, and backgrounds

Captions: Automatically generated using JoyCaption

Character: A completely fictional woman - not based on any real person

Trigger Word: No special trigger word is needed. Simply include "woman" in your prompt.

Description

Woman_512: Trained at 512x512 resolution.

Comments (22)

rickets_xxx · Dec 19, 2025

Notice a difference? Wanted to do some 2048x2048, but on an H100 it's a bit slow.

goldennyks76

Author

Dec 19, 2025

A higher training resolution generally preserves more detail and captures small features better, but if the dataset is well prepared, higher values are generally not necessary. I don't know if 2048 training is possible, but if you absolutely want to train at a high resolution, even 1536 will give sufficiently good results.

cooked_like_a_goose · Dec 19, 2025

There is no reason to train any LoRA for Wan 2.2 at 2048x2048. If you are training a face, 256 is perfectly fine.

rickets_xxx · Dec 20, 2025

@shake_em_on_down What if you ONLY want to create images with the Wan model(s)? Just curious. (That's really all I do... mostly.) Thanks for the replies!

cooked_like_a_goose · Dec 20, 2025

@rickets_xxx - when it comes to faces... I have never treated the data or training process any different for Wan 2.2, and all of my LoRAs do t2i just fine. The LoRA is learning the math of that specific face and it can generate videos or stills equally well.

AlArt84 · Dec 19, 2025 · 1 reaction

Interesting idea.
Have you noticed any benefit or degradation when using the LoRAs together on one image?
Looking at your images in the galleries, the 1024 seems to focus a lot more on the face and general body, including the outfit, while the 512 seems to have a broader scope when it comes to incorporating the character in a scene.

Would probably need more testing. But it looks like an interesting experiment.

goldennyks76

Author

Dec 19, 2025 · 1 reaction

I agree with what you said. I think these differences also show that as the resolution increases in LoRA training, the quality of the dataset and captions matters even more.

The dataset I used wasn't perfect; there were many repeated poses and similar angles, and the captions were used as generated, without editing.

By the way, I also tried using both LoRAs together in a single image; it produces a result that is exactly in the middle of the two, but I can't say whether it has any advantages or disadvantages.
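One possible reading of the "in the middle" result: LoRA updates are low-rank deltas added to the base weights, so two LoRAs combine linearly in weight space. The toy sketch below uses tiny made-up matrices (none of the numbers come from the actual models) to show that applying both deltas at half strength gives exactly the midpoint of the two single-LoRA weight matrices.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_delta(B, A, strength=1.0):
    """Low-rank update strength * (B @ A)."""
    return [[strength * v for v in row] for row in matmul(B, A)]


def add(*mats):
    """Element-wise sum of equally shaped matrices."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*mats)]


# Toy values, purely illustrative: a 2x2 base weight and two rank-1 LoRAs.
W0 = [[1.0, 0.0], [0.0, 1.0]]
B_512, A_512 = [[1.0], [0.0]], [[2.0, 0.0]]
B_1024, A_1024 = [[0.0], [1.0]], [[0.0, 4.0]]

only_512 = add(W0, lora_delta(B_512, A_512))
only_1024 = add(W0, lora_delta(B_1024, A_1024))

# Both LoRAs at half strength each.
blended = add(W0, lora_delta(B_512, A_512, 0.5), lora_delta(B_1024, A_1024, 0.5))

# Midpoint of the two single-LoRA weight matrices.
midpoint = [[(p + q) / 2 for p, q in zip(r1, r2)] for r1, r2 in zip(only_512, only_1024)]

print(blended == midpoint)
```

Note the caveats: with both LoRAs at full strength the deltas add rather than average, and the generated image is a nonlinear function of the weights, so a visual midpoint is plausible but not guaranteed by this arithmetic.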

playtime_ai_ · Dec 19, 2025 · 2 reactions

You have these listed as Wan 2.2 models... Are they high noise or low noise models?

Jellai · Dec 19, 2025 · 3 reactions

Thank you so much for doing this experiment. I think people already know that high noise can be trained well on low resolution videos, but many people still thought you needed to pair that with high resolution training for low noise.

It's great to get some confirmation on a difficult visual to train, specifically doing the mesh top, but also trying to see if you can get coherent nipples under that with high resolution training. It seems from this that the extra resolution helps a little in some of the images, as there isn't a single image in low resolution that has coherent nipples, but there are coherent nipples in about a third to half of the high resolution version images. Overall, I wouldn't judge it as worth the extra resolution in training, unless the very point was the mesh top itself.

But even if the point were the mesh top itself, you could have some nipple closeups and probably get that back with low resolution.

lechuck777 · Dec 19, 2025

IMHO he didn't train the clothing style directly, nor the nipples. The difference between them is, I think, only a seed thing. A different LoRA means different noise in the same scenario. Maybe a different seed would produce better nipples with the 512 LoRA too.

Jellai · Dec 19, 2025

@lechuck777 Well, a lot of different images were made, and none show coherent nipples. But yeah, I guess anything is possible with a seed somewhere out there. We can say maybe about anything, but I am going off of what is shown, since it's shown as a test. I'm making observations based on this test. A test that is put up to show how little is lost with lower resolution. And it's surprising how little is lost. It's a great test that convinced me to stick with low resolution.

And yeah, if he trained it specifically as a clothing lora, he could make up for the resolution difference with closeups, but I'm not sure why you're saying he didn't train it as such, when I acknowledged that in my original post, along with how to mitigate the issues if you were to approach it that way.

goldennyks76

Author

Dec 19, 2025 · 1 reaction

@lechuck777 This isn’t a clothing LoRA; it’s a character LoRA. I’m not that experienced in this area, but the captions covered the background environment, stage layout, lighting, camera angle, composition, and the subject’s general posture or movements, and included no specific details about the clothing or the character. The goal was for the LoRA to learn both the character and the clothing.

I’m not sure how correct this approach was; maybe I should have added tags for the character’s clothing. However, that would have required using the clothing trigger tag every time I wrote a prompt.

There are no close-ups in the dataset to capture micro-details. As @Jellai mentioned, adding close-ups could produce better results at 512 and reduce training time by 3–4×.

However, since the dataset was created using Qwen Image Edit, any attempt to edit images for close-ups disrupted consistency or altered them entirely. I tried generating close-up images, but they were too distorted and deviated too much from the main images, so I didn’t include them in the training data.

By the way, I don't think the effect of using different LoRA files is that significant. The images were generated with completely identical settings, and the results are nearly the same. Try changing the seed in any model and see how much the image changes; there’s no comparable change here. There might be an effect, but it's not like changing the seed.

Some things generally don't change at 512: distortion in the chest and nipples, inconsistent clothing, the face not resembling the original in close-ups, and the model not learning the character's nose ring at all. You can download it and try it yourself.

There are no such issues at 1024 resolution, but I think that if close-up images that can solve these issues are added, there's no reason not to train at 512.
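One way to separate the seed effect from the LoRA effect debated in this thread is to render every seed with both LoRAs and compare within a seed and across seeds. The sketch below only builds such a job list; it calls no generation API, and the prompt and seed values are hypothetical placeholders.

```python
from itertools import product

PROMPT = "woman standing in a city street at night"  # hypothetical prompt
SEEDS = [1, 2, 3, 4]                                  # hypothetical seeds
LORAS = ["Woman_512", "Woman_1024"]                   # the two models on this page


def build_jobs(prompt, seeds, loras):
    """One render job per (seed, LoRA) pair, all other settings held fixed."""
    return [{"prompt": prompt, "seed": s, "lora": l} for s, l in product(seeds, loras)]


def group_by_seed(jobs):
    """Map each seed to the LoRAs rendered with it, for side-by-side review."""
    groups = {}
    for job in jobs:
        groups.setdefault(job["seed"], []).append(job["lora"])
    return groups


jobs = build_jobs(PROMPT, SEEDS, LORAS)
for seed, loras in group_by_seed(jobs).items():
    print(f"seed {seed}: compare {' vs '.join(loras)}")
```

If images differ more between seeds (same LoRA) than between LoRAs (same seed), the seed explanation gains weight; if a defect such as the missing nose ring persists across all seeds for one LoRA only, the resolution explanation does.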

GlowingGuardianGirl · Dec 19, 2025 · 5 reactions

Did you do the same at the same resolution but with different ranks? The base model can handle details, and lowering the rank can drastically reduce file size. It could be great to compare ranks at the same resolution.
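For a sense of how rank drives file size: a LoRA on a linear layer of shape d_out x d_in stores two matrices, A (rank x d_in) and B (d_out x rank), so its parameter count grows linearly with rank. The sketch below works this out for a single layer; the 5120 hidden size and 16-bit storage are assumptions for illustration, not values stated on this page.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in one LoRA pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def size_mib(n_params: int, bytes_per_param: int = 2) -> float:
    """Storage in MiB, assuming 16-bit weights by default."""
    return n_params * bytes_per_param / 2**20


HIDDEN = 5120  # ASSUMED layer width, for illustration only

for rank in (8, 16, 32, 64):
    n = lora_params(HIDDEN, HIDDEN, rank)
    print(f"rank {rank:>2}: {n:>7} params per layer, {size_mib(n):.4f} MiB")
```

Halving the rank halves the per-layer size, and the total file scales the same way across however many layers the LoRA targets.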

7180347 · Dec 19, 2025

What rank is the sweet spot for you guys so far for facial likeness?

cooked_like_a_goose · Dec 19, 2025

@olivereads38255 - my methods make people angry, so take it with a grain of salt. I use dual-mode in musubi-tuner and train a single file. I use 16/16 and train at 256,256 and produce commercial quality LoRAs. Facial likeness is a LOT easier than most people know. I share everything so hit me up if you have questions.

7180347 · Dec 20, 2025

@I_dont_wanna_grow_up That's insane. Cue Anakin power meme lol

Could it work on my tiny 12GB? Please lmk if there's a tut or a TOML file I can tweak!

cooked_like_a_goose · Dec 20, 2025

@olivereads38255 - you can train facial likeness on a 12GB card in a couple of hours with my methods. My demo/test rig is 32GB DDR4 and a 3060 12GB. This is an info dump:

https://old.reddit.com/r/StableDiffusion/comments/1pnljek/looking_for_wan_22_single_file_lora_training/nuaoq34/

If the dump is not your style, we can chat one on one, or you can ask questions here and I will do my best to help.

7180347 · Dec 20, 2025

@shake_em_on_down Wow, awesome! I'll give it a shot this weekend. Thank you.

lechuck777 · Dec 19, 2025 · 1 reaction

I don't see any difference. The only difference is in the overall rendered picture, but it's not a quality thing. It's more like using a different seed for the same prompt + LoRA.

cooked_like_a_goose · Dec 19, 2025

I would like to replicate your results. Can you share your training data?

96857 · Dec 20, 2025

Could you share the dataset creation workflow? I've been trying on and off for a while but never hit a rate of consistency where I can trust it.

goldennyks76

Author

Dec 20, 2025 · 1 reaction

It's not perfect in terms of consistency, but for the model I use Phr00t/Qwen-Image-Edit-Rapid-AIO, along with this LoRA: https://civitai.com/models/2182923/qwen-edit-versatile-photo-poses?modelVersionId=2457989

LORA

Wan Video 2.2 T2V-A14B