CivArchive
    Belle Delphine - Pony - v1.0

    This is a LoRA of the internet celebrity Belle Delphine for Pony Diffusion v6.

     

    Trigger word: “Belle Delphine”.

    Suggested LoRA weight: Depending on the style you want 0.4 – 1.0.

    The model is relatively flexible, both in terms of prompting and when used with other finetunes based on Pony Diffusion.

     

    Relevant additional tags might be:

    • “focused eyes” – Large, open eyes

    • “braces” – Should be self-explanatory

    • “snapchat” – Images sourced from Snapchat were tagged as such, since they tend to have lower quality and text artifacts

    If people are interested in some of the other tags (although their accuracy and reproducibility vary), I might share them as well. They mostly relate to clothing.
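    To make the prompting advice above concrete, here is a minimal Python sketch of assembling a prompt with the trigger word, one of the listed tags, and a LoRA weight in the suggested range. The `belle_delphine_pony` file name and the `score_*` quality tags are assumptions (the latter follow common Pony Diffusion v6 usage), not taken from this page:

```python
def build_prompt(extra_tags=(), lora_weight=0.8):
    """Assemble a prompt in the <lora:name:weight> syntax used by
    A1111/SD.Next. 'belle_delphine_pony' is a placeholder file name."""
    parts = ["score_9, score_8_up",  # Pony quality tags (assumption)
             "Belle Delphine",       # trigger word from the description
             *extra_tags,            # e.g. "braces", "focused eyes"
             f"<lora:belle_delphine_pony:{lora_weight}>"]
    return ", ".join(parts)

print(build_prompt(["braces"], 0.6))
# score_9, score_8_up, Belle Delphine, braces, <lora:belle_delphine_pony:0.6>
```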

     

    Images were generated in SD.Next and should include metadata. I ran ADetailer on some of them; however, it isn’t always necessary (depending on the prompt and the model used).

     

    Training

     

    As always, I will add a little bit about the training.

     

    I curated a new dataset for SD3; however, as training tools aren’t up to date yet, I decided to give it a run on Pony Diffusion, since I already have a base SDXL and an SD 1.5 model.

     

    The model you see here is not the first iteration. My initial trial was trained on the zonkey model and produced impressive (accurate) results, but prompting was very brittle: for example, if you left out “Belle Delphine”, you would get complete white noise. I have no idea how that happened. It also didn’t work at all with external LoRAs or on other models. (That trial was trained with kohya, masked loss, and DoRA.)

     

    So, when SD3 released and didn’t look that special, I decided to retrain on the dataset, this time on the Pony Diffusion base model, hoping for more flexibility. I also switched to OneTrainer (which I had already used to generate the initial masks).

     

    The LoRA was trained on all images I could find (quite a lot - in the low five digits). In addition, I created masks for the images in an automated fashion. I then tagged all images with wd-swinv2-tagger-v3 by SmilingWolf. For the SD3 dataset, I also labelled all images with several captions using a custom multimodal LLM workflow. (Side note: this took longer than training the LoRAs.)
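    For reference, the usual way the output of a wd-tagger-style model is turned into booru tags is a simple confidence threshold over the per-tag probabilities. A minimal sketch (the threshold value and example tags here are illustrative, not the settings actually used):

```python
def filter_tags(tag_probs, threshold=0.35):
    """tag_probs: dict mapping tag -> predicted probability.
    Keep tags at or above the threshold, sorted by confidence."""
    kept = [(tag, p) for tag, p in tag_probs.items() if p >= threshold]
    kept.sort(key=lambda tp: tp[1], reverse=True)
    return [tag for tag, _ in kept]

probs = {"1girl": 0.99, "pink_hair": 0.92, "braces": 0.41, "outdoors": 0.12}
print(", ".join(filter_tags(probs)))  # 1girl, pink_hair, braces
```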

     

    Then, once again using a custom workflow, I clustered the images based on their contents and added tags to those clusters.
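    The clustering workflow itself is custom and unpublished, but the general idea of grouping images by content can be sketched as grouping their embeddings by cosine similarity. Greedy first-fit grouping is my stand-in here; the actual method is not described in the source:

```python
import numpy as np

def cluster_images(embeddings, threshold=0.9):
    """Assign each embedding to the first cluster whose representative
    (its first member) has cosine similarity >= threshold; otherwise
    open a new cluster. Returns a list of index lists."""
    reps, clusters = [], []
    for i, e in enumerate(embeddings):
        e = e / np.linalg.norm(e)
        for rep, members in zip(reps, clusters):
            if float(rep @ e) >= threshold:
                members.append(i)
                break
        else:
            reps.append(e)
            clusters.append([i])
    return clusters

vecs = np.array([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]])
print(cluster_images(vecs))  # [[0, 1], [2]]
```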

     

    I finally took all of this data and trained it in OneTrainer with a random mixture of booru tags and natural-language prompts.
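    A caption mixture like the one described can be sketched as a per-sample choice between the (shuffled) tag string and one of the natural-language captions. The 50/50 ratio and the shuffling are assumptions; the actual mixture is not specified:

```python
import random

def pick_caption(tags, nl_captions, p_tags=0.5, rng=None):
    """With probability p_tags, return the shuffled booru tag string;
    otherwise return one of the natural-language captions."""
    rng = rng or random.Random()
    if rng.random() < p_tags:
        tags = list(tags)
        rng.shuffle(tags)  # tag order randomisation (assumption)
        return ", ".join(tags)
    return rng.choice(nl_captions)
```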

    Relevant training parameters:

    • Prodigy optimizer

    • 18 epochs @ 4480 steps

    • No image repeats (I had enough images, after all)

    • Batch size 4

    • Using image masks, with unmasked probability of 0.03, unmasked weight 0.02 (which causes the occasional watermark bleed through)

    • 1024 resolution with aspect bucketing

    • LoRA rank 96, alpha 2 (later resized to target 64, with sv_fro 0.98)
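    The sv_fro resize criterion in the last bullet keeps the smallest rank whose singular values retain a given fraction of the squared Frobenius norm, capped at the target rank. A numpy sketch of the idea only; real tools (e.g. kohya's resize script) apply this per layer to the LoRA up/down matrices:

```python
import numpy as np

def resize_rank(delta_w, target=64, sv_fro=0.98):
    """Smallest rank retaining `sv_fro` of the squared Frobenius norm
    of delta_w, capped at `target`."""
    s = np.linalg.svd(delta_w, compute_uv=False)  # singular values, descending
    energy = np.cumsum(s**2) / np.sum(s**2)       # cumulative Frobenius energy
    rank = int(np.searchsorted(energy, sv_fro) + 1)
    return min(rank, target)

# One dominant singular value -> rank 1 already retains 98% of the energy.
print(resize_rank(np.diag([10.0, 1.0, 0.1])))  # 1
```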

     

    Training took about 48 hours on an RTX 4090.

     

    If you have any additional questions, feel free to ask. However, I used a lot of custom, purpose-built code here, which I will probably not share. I can, however, give you some pointers should you be interested in doing something similar.

     

    Disclaimer

     

    I want to highlight again that this model is non-commercial, and you should only post images on CivitAI which follow the Content Rules.

    Users are solely responsible for the content they generate using this LoRA. It is the user’s responsibility to ensure that their usage of this model adheres to all applicable local, state, national and international laws. I do not endorse any user-generated content and expressly disclaim any and all liability in connection with user generations.


    Comments (8)

    AIDivision · Jun 16, 2024

    48 hours....damn. What GPU do you have?

    DiffusedIdentity (Author) · Jun 16, 2024

    This was done on an RTX 4090.

    Although the training time could have been shorter, assuming people are fine with using tools like ADetailer. My observation, however, was that the longer the training, the better the model handles small faces, even without an inpainting step.

    4037108 · Jun 17, 2024

    @DiffusedIdentity did it take so long because of how many images you used?

    DiffusedIdentity (Author) · Jun 17, 2024

    @rime11 Not really. It took longer to complete one epoch, of course, but depending on what you want to achieve style-wise, one epoch would already have been sufficient. For example, see the following link: https://files.catbox.moe/4u0coh.png - this is a similar prompt to the preview image.

    This is the uploaded LoRA vs. the LoRA at 1 epoch (non-resized, so still at 700 MB; training time roughly 2.5 hours). Depending on the style you were going for, this would already have been completely sufficient. I also specifically picked the base Pony model to show the bigger differences. If you used a realism model, there wouldn't be a huge difference (although it would still be noticeable).

    But one thing you can clearly see in this image is the influence of "k1ttyg4m3r" - that is one of the clothing-related tags I mentioned. After 1 epoch it hadn't been learnt yet (it causes the pink Razer kitty headset).

    However, if people are interested in a more "stylized" version (i.e. an earlier epoch), which has less influence overall, I would have no problem uploading one.

    futurafree · Oct 2, 2024

    Absolutely nailed this one. Incredibly stable, flexible, and virtually identical likeness. I never thought all three qualities could be met to this extent. You should do a YouTube tutorial on how to accomplish something like this. I have a question: would a 4080 16 GB be capable of training this kind of LoRA, if time is no factor? I was active during the SD 1.5 release but took a break and became quite lost with regards to SDXL training.

    DiffusedIdentity (Author) · Oct 10, 2024

    Sorry for the late response; I haven't been online lately. Training this on 16 GB of VRAM would be no problem (it shouldn't even take much longer than on the 4090). The harder part is getting a captioned dataset of comparable size, although you probably don't need as many images as I used, since my set included quite similar (though still distinct) pictures. While a YouTube video would help as a beginning-to-end tutorial, I already wrote down the key facts in the model description. I'm open to any additional questions you may have.

    MoxiChan · Nov 2, 2024

    Can you do MollyFlwers?

    fagotron · Mar 2, 2025

    Can you do Tia Dalma from Pirates of the Caribbean?

    LORA
    Pony

    Details

    Downloads
    4,762
    Platform
    CivitAI
    Platform Status
    Deleted
    Created
    6/16/2024
    Updated
    5/7/2026
    Deleted
    5/23/2025
    Trigger Words:
    Belle Delphine

    Available On (2 platforms)

    Same model published on other platforms. May have additional downloads or version variants.