PixArt Sigma XL 2 MS: 2k, 1024, and 512 full finetunes on custom captions.
INSTRUCTIONS: Place the .safetensors file where the original model would go and select bunline.
Favorite sampling settings (a hedged diffusers sketch of the 2k settings follows below):
512/1024 models: dpm++2s_a, simple scheduler, 24 steps, CFG 3.1 to 4.2, or sometimes higher
2k model: euler, sgm_uniform scheduler, 48 steps, CFG 3.5 to 5, or sometimes higher
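For anyone sampling outside ComfyUI, here is a minimal sketch of the 2k settings using diffusers. Everything in it is an assumption rather than this release's official workflow: the checkpoint path is a placeholder, the weights are assumed to already be in diffusers layout (checkpoints from the official trainer may need conversion first), and "euler"/"sgm_uniform" are ComfyUI names approximated here with EulerDiscreteScheduler and trailing timestep spacing.

```python
# Hedged inference sketch (assumes a recent diffusers with PixArt-Sigma support).
import torch
from diffusers import PixArtSigmaPipeline, EulerDiscreteScheduler
from safetensors.torch import load_file

# Load the official 2k base for the configs, VAE, and T5 text encoder.
pipe = PixArtSigmaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-Sigma-XL-2-2K-MS",
    torch_dtype=torch.float16,
).to("cuda")

# Swap in the finetuned transformer weights (placeholder path; the keys must
# already match the diffusers layout for this to work as-is).
pipe.transformer.load_state_dict(load_file("pixart_sigma_2k_finetune.safetensors"))

# Rough stand-in for ComfyUI's euler sampler with sgm_uniform scheduling.
pipe.scheduler = EulerDiscreteScheduler.from_config(
    pipe.scheduler.config, timestep_spacing="trailing"
)

image = pipe(
    "a photo of a lighthouse at dusk",
    num_inference_steps=48,  # 2k settings from above
    guidance_scale=3.5,
).images[0]
image.save("out.png")
```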
Comments
Hello, I'm impressed by your model. Could I ask for details about the training data, as well as how you labeled it?
Hello, and thank you. The 2k model saw 5 epochs at lr = 1e-8 on 30k JPG photos captioned by BAAI/Bunny-v1_1-Llama-3-8B-V. The 1024 model trains much faster and saw 6 epochs of 60k images at the same lr (its dataset combined with the 2k dataset). I've only used the official trainer so far and found the default CAME optimizer works best. All on a single 4090 24GB. I'm active in the PixArt Discord, along with many talented trainers: https://discord.gg/rde6eaE5Ta
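For context on the captioning step, it looks roughly like the trust_remote_code example on the BAAI/Bunny-v1_1-Llama-3-8B-V model card. A hedged sketch adapted from that example; the captioning prompt and image path below are illustrative assumptions, not what was actually used for this dataset:

```python
# Hedged captioning sketch, adapted from the Bunny model card's example.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from PIL import Image

model = AutoModelForCausalLM.from_pretrained(
    "BAAI/Bunny-v1_1-Llama-3-8B-V",
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "BAAI/Bunny-v1_1-Llama-3-8B-V", trust_remote_code=True
)

prompt = "Describe this photo in thorough detail."  # illustrative prompt
text = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's "
    f"questions. USER: <image>\n{prompt} ASSISTANT:"
)

# Bunny splices the image in at a special -200 placeholder token.
chunks = [tokenizer(chunk).input_ids for chunk in text.split("<image>")]
input_ids = torch.tensor(
    chunks[0] + [-200] + chunks[1][1:], dtype=torch.long
).unsqueeze(0).to(model.device)

image = Image.open("photo_0001.jpg")  # one image from the dataset
image_tensor = model.process_images([image], model.config).to(
    dtype=model.dtype, device=model.device
)

output_ids = model.generate(
    input_ids, images=image_tensor, max_new_tokens=300, use_cache=True
)[0]
caption = tokenizer.decode(output_ids[input_ids.shape[1]:], skip_special_tokens=True)
print(caption.strip())
```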
@yayaman Thank you for your feedback. I'm trying to train on 1k 1920x1080 landscape images but it doesn't converge, so it seems I need many more images. Can I ask whether you used data you filtered yourself or an existing dataset? Thank you very much.
@nguyentiendat1531999953 Others have had good results with under 1k images. I filtered and captioned everything myself, at many aspect ratios. Be sure to keep the LR low. I'm using batch size 12 for the 1024 model and have gradient accumulation disabled because it gave poor results.
@nguyentiendat1531999953 One more thing to try: set "deterministic_validation = True" in the config with a fairly small validation step interval, and you get a fixed view of how the image changes at one seed. That makes it easier to see whether things are getting better or worse.
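For anyone hunting for these knobs: they live in the official PixArt-sigma trainer's Python config files. A hedged fragment under that assumption — the field names follow the repo's config style, but treat the exact keys and values as version-dependent illustrations, not a verbatim working config:

```python
# Fragment of a PixArt-sigma training config (Python-style config file).
train_batch_size = 12            # batch size discussed above (1024 model)
gradient_accumulation_steps = 1  # accumulation effectively disabled
num_epochs = 6

# Default CAME optimizer with the very low LR discussed above.
optimizer = dict(type='CAMEWrapper', lr=1e-8, weight_decay=0.0,
                 betas=(0.9, 0.999, 0.9999))

# Fixed-seed validation at a short interval, to watch one seed evolve.
deterministic_validation = True
eval_sampling_steps = 250  # placeholder; set this fairly small
```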
Oh! And my captions are almost always 300 or more tokens (clipped to 300).
@yayaman How do I disable gradient accumulation, by setting it to 1? And which lr did you mention, 1e-8?
@yayaman And I have another question: since the captions are up to 300 tokens long, how can users type prompts that long? Have you thought about using ChatGPT or another LLM to generate prompts?
@nguyentiendat1531999953 Yes, 1 accumulation step and lr 1e-8 for 1024 v0.8. I'm currently training at 5e-9 to test.
> how can users type prompts that long? Have you thought about using ChatGPT or another LLM to generate prompts?
I generate the training captions with an automated VLM. Users can type as short a prompt as they like; the rest is filled with padding tokens.
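Concretely, the 300-token budget is handled by the tokenizer rather than the user. A minimal sketch of the pad/truncate behavior, assuming PixArt-Sigma's T5 tokenizer in the standard diffusers repo layout (the repo id and subfolder here are assumptions):

```python
# Sketch: short prompts are padded and long captions clipped to the same
# 300-token budget that PixArt-Sigma's T5 text encoder expects.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained(
    "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS", subfolder="tokenizer"
)

short_prompt = "a lighthouse at dusk"
long_caption = "a detailed 300+ token VLM caption ..."  # imagine the full text

for text in (short_prompt, long_caption):
    enc = tok(text, max_length=300, padding="max_length",
              truncation=True, return_tensors="pt")
    print(enc.input_ids.shape)  # always (1, 300): padded or clipped
```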
@yayaman Thank you very much