CivArchive
    Aozora-XL Vpred - v0.15 (Alpha) [Vpred]
    NSFW

    Aozora-XL: A V-Prediction SDXL Model

    Aozora-XL is a v-prediction model based on NoobAI v-pred, fine-tuned for improved stability and coherence. It uses a custom training script that allows full/partial fine-tuning on a 12GB consumer GPU, such as an RTX 3060. The training script is available on GitHub at Aozora_SDXL_Training for community use.

    • Never merged

    • No internally merged LoRAs


    Version 0.15 Updates

    This version builds on 0.1 by addressing specific issues in the v-prediction setup. It was trained on the v0.1 base to restore vibrant colors and reduce the slight whitewash effect present in earlier releases. Additional fine-tuning focused on fixing common v-prediction problems, such as inconsistencies in scene composition and detail rendering. It used a dataset of ~50,000 images consisting of visual novel and anime content with deep colors, trained for 5 epochs. Settings included:

    - Base Model: Aozora V0.1

    - Max Train Steps: 250000

    - Gradient Accumulation Steps: 64

    - Mixed Precision: bfloat16

    - UNET Learning Rate: 8e-07

    - LR Scheduler: Cosine with 10% warmup

    - Features: Min-SNR Gamma (corrected variant, gamma 5.0), Zero Terminal SNR, IP Noise Gamma (0.1), Residual Shifting, Conditional Dropout (prob 0.1)

    These changes result in better color fidelity and more reliable outputs across various prompts.
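The step count and learning-rate settings above fit together with some simple arithmetic. The following is a minimal sketch, not the author's training code. It assumes a batch size of 1 (so one train step consumes one image), that the optimizer updates once per 64 accumulated steps, and that the schedule is a linear warmup followed by cosine decay to zero measured in optimizer updates. The helper name `cosine_with_warmup` is hypothetical.

```python
import math

DATASET_SIZE = 50_000      # approximate image count from the text
EPOCHS = 5
GRAD_ACCUM_STEPS = 64
PEAK_LR = 8e-07            # UNET learning rate from the settings list
WARMUP_FRACTION = 0.10     # "Cosine with 10% warmup"

# Assumption: batch size 1, so one train step consumes one image.
micro_steps = DATASET_SIZE * EPOCHS                   # matches Max Train Steps
optimizer_updates = micro_steps // GRAD_ACCUM_STEPS   # full accumulation windows


def cosine_with_warmup(step: int, total_steps: int) -> float:
    """Hypothetical helper: linear warmup to PEAK_LR, then cosine decay to zero."""
    warmup_steps = int(total_steps * WARMUP_FRACTION)
    if step < warmup_steps:
        return PEAK_LR * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(micro_steps, optimizer_updates)
for s in (0, 389, 2000, optimizer_updates - 1):
    print(s, f"{cosine_with_warmup(s, optimizer_updates):.3e}")
```

Under these assumptions, 50,000 images over 5 epochs gives exactly the 250,000 train steps listed, which correspond to roughly 3,900 optimizer updates.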

    - Note: All preview images were generated without any detailers or enhancers to show base capabilities.


    Version 0.1 Overview

    The initial release (v0.1 alpha) was a proof-of-concept, trained for 10 epochs on a dataset of ~18,500 images (50% ZZZ characters up to version 2.0, 50% top-rated Danbooru images). It maintains traits from the base model (NoobAI-XL/NAI-XL V-Pred 1.0) while showing gains in stability due to the training approach.


    Project Goals

    - Provide a GUI-based training script to enable SDXL fine-tuning on consumer hardware.

    - Continue developing Aozora-XL into a stable, controllable model through ongoing training on diverse datasets.


    Training Method

    The method optimizes efficiency by training ~92% of the UNet. It includes adaptive Min-SNR gamma weighting for v-prediction stability and custom learning rate schedules.
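The page does not spell out what "adaptive" or the "corrected variant" of Min-SNR weighting means, so the sketch below shows only the commonly used fixed-gamma formulation, as an assumption about the general idea rather than the author's implementation. For v-prediction the clipped SNR is divided by SNR + 1 instead of SNR, which keeps the weight finite when a zero-terminal-SNR schedule produces a timestep with SNR exactly 0. The function name `min_snr_weight` is hypothetical.

```python
def min_snr_weight(snr: float, gamma: float = 5.0, v_prediction: bool = True) -> float:
    """Loss weight for one timestep, given its signal-to-noise ratio."""
    clipped = min(snr, gamma)
    if v_prediction:
        # v-prediction form: stays finite even when SNR is exactly zero.
        return clipped / (snr + 1.0)
    # epsilon-prediction form: undefined at SNR == 0.
    return clipped / snr


for snr in (0.0, 0.5, 5.0, 50.0):
    print(f"SNR={snr:>5}: weight={min_snr_weight(snr):.4f}")
```

With gamma 5.0, low-noise timesteps (high SNR) are weighted down, so they do not dominate the loss.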

    Training Specs:

    - Hardware: 1x NVIDIA RTX 3060 12GB (VRAM usage: ~11.8 GB)

    - Optimizer: Adafactor

    - Batch Size: 1 with 64 Gradient Accumulation Steps

    - UNet Params Trained: 2.3B
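A batch size of 1 with 64 gradient accumulation steps means gradients from 64 single-image passes are summed before one optimizer update, which approximates a batch of 64 while holding only one sample in memory at a time. The toy example below illustrates that equivalence on a one-parameter model. It uses plain gradient descent as a stand-in, not Adafactor, and all names are illustrative.

```python
import random

ACCUM_STEPS = 64  # from the training specs; batch size is 1


def sample_grad(w: float, x: float, y: float) -> float:
    """Gradient of the squared error (w*x - y)**2 with respect to w."""
    return 2.0 * (w * x - y) * x


random.seed(0)
data = [(x, 3.0 * x) for x in (random.uniform(-1, 1) for _ in range(ACCUM_STEPS))]

w = 0.0
accumulated = 0.0
for x, y in data:                      # 64 micro-batches of one sample each
    accumulated += sample_grad(w, x, y) / ACCUM_STEPS

# Reference: gradient of the mean loss over all 64 samples at once.
full_batch = sum(sample_grad(w, x, y) for x, y in data) / ACCUM_STEPS

lr = 0.1
w -= lr * accumulated                  # one update per 64 micro-batches
print(abs(accumulated - full_batch) < 1e-12, w)
```

The accumulated gradient matches the full-batch gradient up to floating-point rounding. The cost is wall-clock time, since the 64 passes run sequentially.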


    Recommended Settings

    - Positive Prompt: very awa, masterpiece, best quality

    - Negative Prompt: Optional; try (worst quality, low quality) if needed

    - Sampler: DPM++ 3M SDE GPU or Euler (Euler for line art, SDE for details like hands/feet)

    - Scheduler: SGM Uniform or Normal

    - Steps: 25-35

    - CFG Scale: 3-5 (works well at low values)

    - Resolution: 1024x1024 or similar (up to 1152x1152)

    - Hires. Fix: Use with upscalers like RealESRGAN at ~0.35 denoise

    Experiment with these settings, as results from v-prediction models can vary by system.


    License

    This model follows the license of its base, NoobAI-XL. Review and comply with those terms.

    Description

    • This training cycle consisted of 5 epochs on a 50,000-image dataset, with the specific goal of reintroducing a richer color palette and optimizing compositions for widescreen formats.

    Comments (11)

    And233
    Aug 18, 2025· 1 reaction
    CivitAI

    Was 0.15 still trained on your 3060? How long did it take this time?

    Hysocs
    Author
    Aug 18, 2025· 1 reaction

    This was more of a test to experiment with some v_prediction training parameters from newer research papers. The run took about 93.75 hours, which is 3 days, 21 hours, and 45 minutes.

    With my latest code updates, I've managed to reduce a lot of the instability. I also added data prefetching, which I think is a very underused feature. This has helped get my time per training step down to the 1.30-1.35 s range. VRAM usage is about 11.3GB with some unnecessary layers removed, or 12.5GB for the entire U-Net. I'm no longer training the text encoders in this script, as Adafactor was causing them to blow up. I'll have to revisit that part later.

    I decided to release this as 0.15 and not 0.2 because it doesn't dramatically improve things and may be worse for things aside from the colors.

    Most of my time currently goes into coding the script itself, so even though it's been 2 months, a lot of that time was coding and the rest was training and building a dataset.
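As a rough consistency check on the figures in this reply, the 93.75-hour run time lines up with the listed 250,000 train steps at the upper end of the quoted step time. This sketch assumes the 1.35 s figure applies to each train step (not each optimizer update) and that the run covered all 250,000 steps. Neither assumption is confirmed in the thread.

```python
TRAIN_STEPS = 250_000        # "Max Train Steps" from the settings list
SECONDS_PER_STEP = 1.35      # upper end of the quoted 1.30-1.35 s range

total_seconds = round(TRAIN_STEPS * SECONDS_PER_STEP)
total_hours = total_seconds / 3600
days, rem = divmod(total_seconds, 86_400)
hours, rem = divmod(rem, 3600)
minutes = rem // 60
print(f"{total_hours:.2f} h = {days} d {hours} h {minutes} min")
# prints "93.75 h = 3 d 21 h 45 min"
```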

    Q_7
    Aug 19, 2025

    Hysocs Is the script published anywhere?

    Hysocs
    Author
    Aug 19, 2025

    Q_7 It's in the description, but maybe you missed it. You can find it here:
    https://github.com/Hysocs/Aozora_SDXL_Training

    illyaeater
    Aug 20, 2025
    CivitAI

    Thanks for the script. Time to train on my tamagotchi

    Hysocs
    Author
    Aug 21, 2025· 1 reaction

    I wish I could get the VRAM usage even lower, but training with anything more performant causes so much quality loss that it's not worth it. To achieve this we are stuck with Adafactor, and for some reason it doesn't like this, so it crashes sometimes. My script is in limbo right now, and I left out some features, so I need to add those back.

    I didn't want to release the code to the public, as it's not a complete solution and evolves daily as I read more papers and test more things, but I decided releasing it is better than gatekeeping and advice about renting expensive GPUs to train.

    And233
    Aug 22, 2025
    CivitAI

    Does this script work well for character training? I wonder if you could add some of this year's new anime / game character material, or use rouwei's training dataset in the next version.

    Hysocs
    Author
    Aug 22, 2025

    The script trains just as well as any other method without text encoders, as long as you can fit all the layers. If you're working with 12GB of VRAM, you can definitely train it on new characters by selecting specific layers to train, such as the input and middle blocks of the UNet. I'll do a quick base vs. fine-tune test on a character the model doesn't know and post it here in a few hours as proof of actual UNet learning. For generations without all this work, I recommend using a LoRA. Since the model was never merged or had any LoRAs secretly added like other models, you can use any LoRA with high accuracy.
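Selecting specific layers to train, as described in this reply, usually comes down to filtering parameters by name. The sketch below illustrates the idea on a toy parameter table. The parameter names and counts are hypothetical placeholders, not the real SDXL UNet layout or the script's actual selection logic.

```python
from typing import Dict, Iterable

# Hypothetical parameter names and counts, for illustration only.
TOY_UNET: Dict[str, int] = {
    "input_blocks.0.weight": 400,
    "input_blocks.1.weight": 600,
    "middle_block.0.weight": 500,
    "output_blocks.0.weight": 700,
    "output_blocks.1.weight": 300,
}


def select_trainable(params: Dict[str, int], prefixes: Iterable[str]) -> Dict[str, bool]:
    """Mark a parameter trainable when its name starts with a chosen prefix."""
    chosen = tuple(prefixes)
    return {name: name.startswith(chosen) for name in params}


def trainable_fraction(params: Dict[str, int], flags: Dict[str, bool]) -> float:
    """Share of all parameters (by count) that are marked trainable."""
    trained = sum(count for name, count in params.items() if flags[name])
    return trained / sum(params.values())


flags = select_trainable(TOY_UNET, ["input_blocks.", "middle_block."])
print(f"trainable fraction: {trainable_fraction(TOY_UNET, flags):.0%}")
# prints "trainable fraction: 60%"
```

In a PyTorch training loop, the same flags would typically be applied by toggling `requires_grad` on each entry from `named_parameters()`, so frozen layers need no gradients or optimizer state.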

    Hysocs
    Author
    Aug 23, 2025

    Here is an example of training:
    https://ibb.co/pBJk0pTx

    And233
    Aug 23, 2025

    @Hysocs So the model can learn new things without text encoder training? Or does it just fill a gap in existing knowledge?

    Hysocs
    Author
    Aug 23, 2025

    @And233 The text encoder should be viewed as a separate, pre-trained module with a robust, built-in understanding of language. Further training of the text encoder is typically reserved for specific customization tasks, such as fixing existing context bleeding or adding new words or tags not normally present.

    I will add back text encoder training, but for most fine-tuning you are going for style and adding core character features, so the text encoder does fine without being touched.

    Checkpoint
    NoobAI

    Details

    Downloads: 250
    Platform: CivitAI
    Platform Status: Available
    Created: 8/17/2025
    Updated: 6/11/2026
    Deleted: -

    Files

    aozoraXLVpred_v015AlphaVpred.safetensors
