CivArchive
    Dwayne Johnson aka The Rock FLUX Dev Fine-Tuning / DreamBooth Model for Educational and Research Purposes - Dwayne Johnson aka The Rock FLUX Dev LoRA Model for Educational and Research Purposes - Full Tutorial - FP16 Version

    I am sharing full details of how I trained this model, including the dataset. Please read the entire post carefully.

    This model was trained purely for educational and research purposes, for SFW and ethical image generation only.

    The workflow and config used in this tutorial can be used to train clothing, items, animals, pets, objects, styles: simply anything.

    The uploaded images contain SwarmUI metadata and can be regenerated exactly. The FP16 model was used for generation, but FP8 should yield almost the same quality. Don't forget that the prompts use a YOLO face-masking model, so you need it installed to reproduce them.
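    This is a minimal, stdlib-only sketch for inspecting that metadata. It assumes that SwarmUI keeps the generation parameters in PNG text chunks. The exact chunk key is an assumption, so the sketch does not look for a particular name. It simply returns every tEXt, zTXt and iTXt entry it finds. This is not SwarmUI's own reader. It only helps if the file you downloaded is still the original PNG.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text_chunks(data: bytes) -> dict[str, str]:
    """Return all key/value pairs stored in tEXt, zTXt and iTXt chunks."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    texts: dict[str, str] = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Chunk layout: 4-byte big-endian length, 4-byte type, body, 4-byte CRC.
        # The CRC is skipped, not verified.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        pos += 12 + length
        if ctype == b"tEXt":
            key, _, value = body.partition(b"\x00")
            texts[key.decode("latin-1")] = value.decode("latin-1")
        elif ctype == b"zTXt":
            key, _, rest = body.partition(b"\x00")
            # rest[0] is the compression method byte; 0 means zlib/deflate.
            texts[key.decode("latin-1")] = zlib.decompress(rest[1:]).decode("latin-1")
        elif ctype == b"iTXt":
            key, _, rest = body.partition(b"\x00")
            # Compression flag, then compression method, then two NUL-terminated
            # fields (language tag, translated keyword), then UTF-8 text.
            compressed, rest = rest[0], rest[2:]
            _lang, _, rest = rest.partition(b"\x00")
            _translated, _, text = rest.partition(b"\x00")
            if compressed:
                text = zlib.decompress(text)
            texts[key.decode("latin-1")] = text.decode("utf-8")
        elif ctype == b"IEND":
            break
    return texts
```

For example, `read_png_text_chunks(open("image.png", "rb").read())` returns a dict you can search for the prompt and sampler settings.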

    How To Use

    Download the model into the diffusion_models folder of SwarmUI. You also need the CLIP-L and T5-XXL text encoder models; I recommend the T5-XXL FP16 or scaled FP8 version.
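    A quick pre-launch check that the files are in place could look like the sketch below. Only the diffusion_models folder name comes from this post. The clip folder for the text encoders is an assumed location, and every file name is a hypothetical placeholder. Adjust them to your own install.

```python
from pathlib import Path


def missing_model_files(models_root: Path, required: dict[str, list[str]]) -> list[str]:
    """Return 'subfolder/filename' entries that do not exist under models_root."""
    missing = []
    for subfolder, filenames in required.items():
        for name in filenames:
            if not (models_root / subfolder / name).is_file():
                missing.append(f"{subfolder}/{name}")
    return missing


# Hypothetical file names; "clip" as the text encoder folder is an assumption.
REQUIRED_FILES = {
    "diffusion_models": ["Dwayne_Johnson_FLUX_Fine_Tuning.safetensors"],
    "clip": ["clip_l.safetensors", "t5xxl_fp16.safetensors"],
}
```

An empty list from `missing_model_files(Path("SwarmUI/Models"), REQUIRED_FILES)` means everything was found.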

    The newest fully public tutorial on how to use it is here:

    I trained both a FLUX LoRA and a Fine-Tuning / DreamBooth model.

    Activation token / trigger word : ohwx man

    Each training run went up to 200 epochs. A checkpoint was saved once every 10 epochs, and all checkpoints are shared in the Hugging Face repo below: https://huggingface.co/MonsterMMORPG/Model_Training_Experiments_As_A_Baseline
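    Saving every 10 epochs up to epoch 200 gives 20 checkpoints per run. The sketch below lists them. It assumes every saved file, including the last one, follows the naming pattern shown in the Model Naming Convention section, which uses a six-digit, zero-padded epoch suffix.

```python
SAVE_EVERY = 10   # one checkpoint every 10 epochs
MAX_EPOCHS = 200  # each run went up to 200 epochs


def checkpoint_names(prefix: str) -> list[str]:
    """Return the expected checkpoint file names for one training run."""
    return [
        f"{prefix}-{epoch:06d}.safetensors"
        for epoch in range(SAVE_EVERY, MAX_EPOCHS + 1, SAVE_EVERY)
    ]


names = checkpoint_names("Dwayne_Johnson_FLUX_LoRA")
print(len(names), names[0], names[-1])
```

This prints `20 Dwayne_Johnson_FLUX_LoRA-000010.safetensors Dwayne_Johnson_FLUX_LoRA-000200.safetensors`.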

    This page presents experimental results comparing Fine-Tuning / DreamBooth and LoRA training approaches.

    Additional Resources

    Environment Setup

    • Kohya GUI Version: 021c6f5ae3055320a56967284e759620c349aa56

    • Torch: 2.5.1

    • xFormers: 0.0.28.post3

    Dataset Information

    • Resolution: 1024x1024

    • Dataset Size: 28 images

    • Captions: "ohwx man" (nothing else)

    • Activation Token/Trigger Word: "ohwx man"
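    Every image carries the identical caption, so the caption files can be generated rather than written by hand. This sketch assumes the common Kohya convention: a same-named .txt file next to each image. The caption extension is configurable in Kohya, so match whatever your config uses. The function name and folder are hypothetical.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def write_captions(dataset_dir: Path, caption: str = "ohwx man", ext: str = ".txt") -> int:
    """Write the same caption next to every image; return how many were written."""
    count = 0
    # sorted() materializes the listing first, so newly written .txt files
    # are not picked up by this loop.
    for image in sorted(dataset_dir.iterdir()):
        if image.suffix.lower() in IMAGE_SUFFIXES:
            image.with_suffix(ext).write_text(caption, encoding="utf-8")
            count += 1
    return count
```

For the 28-image dataset described here, `write_captions(Path("dataset"))` should return 28.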

    Fine-Tuning / DreamBooth Experiment

    Configuration

    • Config File: 48GB_GPU_28200MB_6.4_second_it_Tier_1.json

    • Training: Up to 200 epochs with consistent config

    • Optimal Result: Epoch 170 (subjective assessment)

    Results

    LoRA Experiment

    Configuration

    • Config File: Rank_1_29500MB_8_85_Second_IT.json

    • Training: Up to 200 epochs

    • Optimal Result: Epoch 160 (subjective assessment)
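    The optimal epochs translate into step counts and rough wall-clock times as shown below. The step math uses the 28 images at batch size 1 from the Model Naming Convention section and assumes no repeats or gradient accumulation. The seconds-per-iteration values (6.4 s/it and 8.85 s/it) are an interpretation of the config file names, so treat the hour figures as estimates only.

```python
IMAGES = 28      # dataset size
BATCH_SIZE = 1   # batch size listed for every checkpoint


def steps_for(epochs: int) -> int:
    """Optimizer steps when each epoch is one pass over the dataset."""
    return epochs * IMAGES // BATCH_SIZE


def estimate_hours(epochs: int, sec_per_it: float) -> float:
    """Rough wall-clock time, ignoring checkpoint saving and caching overhead."""
    return steps_for(epochs) * sec_per_it / 3600


# Seconds per iteration interpreted from the config file names (an assumption).
RUNS = {
    "Fine-Tuning / DreamBooth (epoch 170)": (170, 6.4),
    "LoRA (epoch 160)": (160, 8.85),
}

for label, (epochs, sec_per_it) in RUNS.items():
    print(f"{label}: {steps_for(epochs)} steps, ~{estimate_hours(epochs, sec_per_it):.1f} h")
```

Under these assumptions the Fine-Tuning run reaches its best checkpoint after 4760 steps (about 8.5 h). The LoRA run reaches its best checkpoint after 4480 steps (about 11.0 h).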

    Results

    Comparison Results

    Key Observations

    • LoRA demonstrates excellent realism but shows more obvious overfitting when generating stylized images.

    • As expected, Fine-Tuning / DreamBooth gives better results than LoRA.

    Model Naming Convention

    Fine-Tuning Models

    • Dwayne_Johnson_FLUX_Fine_Tuning-000010.safetensors

      • 10 epochs

      • 280 steps (28 images × 10 epochs)

      • Batch size: 1

      • Resolution: 1024x1024

    • Dwayne_Johnson_FLUX_Fine_Tuning-000020.safetensors

      • 20 epochs

      • 560 steps (28 images × 20 epochs)

      • Batch size: 1

      • Resolution: 1024x1024

    LoRA Models

    • Dwayne_Johnson_FLUX_LoRA-000010.safetensors

      • 10 epochs

      • 280 steps (28 images × 10 epochs)

      • Batch size: 1

      • Resolution: 1024x1024

    • Dwayne_Johnson_FLUX_LoRA-000020.safetensors

      • 20 epochs

      • 560 steps (28 images × 20 epochs)

      • Batch size: 1

      • Resolution: 1024x1024
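    The naming convention can be decoded mechanically. The six-digit suffix is the epoch, and the step count is 28 images × epochs at batch size 1. Here is a small sketch with a hypothetical helper name:

```python
import re

# Matches names like "Dwayne_Johnson_FLUX_LoRA-000020.safetensors".
CHECKPOINT_RE = re.compile(r"^(?P<name>.+)-(?P<epoch>\d{6})\.safetensors$")


def describe_checkpoint(filename: str, images: int = 28, batch_size: int = 1) -> dict:
    """Extract the model name and epoch from a checkpoint file name and compute steps."""
    match = CHECKPOINT_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected checkpoint name: {filename}")
    epoch = int(match["epoch"])
    return {
        "model": match["name"],
        "epochs": epoch,
        "steps": epoch * images // batch_size,
    }
```

For example, `describe_checkpoint("Dwayne_Johnson_FLUX_LoRA-000020.safetensors")` gives 20 epochs and 560 steps, matching the list above.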

    Description

    For Full Details, Training Dataset, Tutorial, Guide, Configs, Training Json Files, Workflows, Installers, Resources and All Checkpoints > https://huggingface.co/MonsterMMORPG/Model_Training_Experiments_As_A_Baseline

    FAQ

    Comments (26)

    9ball
    Nov 2, 2024· 4 reactions
    CivitAI

    22GB for The Rock? Yea..

    SECourses
    Author
    Nov 2, 2024

    An FP8 version also exists, but sadly I couldn't find how to make it the default. I'm asking the CivitAI team.

    @SECourses Most creators would extract a LoRA from the DreamBooth model and upload that instead.

    SECourses
    Author
    Nov 3, 2024· 1 reaction

    @Triple_Headed_Monkey I will post that too. I trained LoRA models as well and will hopefully publish all of them: the LoRA and the LoRA extraction.

    SECourses
    Author
    Nov 2, 2024· 1 reaction
    CivitAI

    The FP8 model is also there; sadly it is not set as the default, and I am asking the CivitAI team how to make it the default.

    Triple_Headed_Monkey
    Nov 3, 2024· 1 reaction

    Yeah, no. Apparently the site still doesn't allow you to choose the order of files uploaded on a single model page without uploading them as an entirely new version.

    SECourses
    Author
    Nov 3, 2024

    @Triple_Headed_Monkey Yes, sadly that's how it is. I added them as separate models for now.

    eurotaku
    Nov 3, 2024

    @Triple_Headed_Monkey @SECourses It's working as intended: all community members can choose their preferred precision in their account settings. Someone who prefers FP16 will see that as the default, while someone with FP8 set as their default will see the FP8 file listed first instead. So you can put both checkpoints into one version, and each visitor will see the one they prefer. :)

    SECourses
    Author
    Nov 4, 2024

    @eurotaku But it is set to FP16 by default, and a private window shows that too. I think the user should be able to override the default behavior.

    @eurotaku Yes, it is totally working as intended when you choose to upload a CLIP model and it shows up as the default because it is the higher-precision model :D Not to mention that when you try to change the precision, there is no value lower than FP8, and if you set it to the same value it will just tell you "there is already a model of this type uploaded".

    HavoFX
    Nov 3, 2024· 2 reactions
    CivitAI

    Keep up the good work!

    SECourses
    Author
    Nov 3, 2024

    Thanks a lot for the comment.

    sevenof9247
    Nov 3, 2024· 1 reaction
    CivitAI

    A question about LoRA training: have you found out whether it is better to remove the background or to describe it, e.g. for portraits?

    SECourses
    Author
    Nov 3, 2024

    Well, I tested full captions and they reduced training accuracy. However, to be fair, I didn't test captioning only the background.

    ranjeet3939
    Nov 3, 2024· 1 reaction
    CivitAI

    I am getting this error: [ComfyUI-0/STDERR] ValueError: Model face_yolov9c.pt not found, or yolov8 folder path not defined

    ranjeet3939
    Nov 4, 2024· 1 reaction

    @SECourses Thanks, where do I put this file in SwarmUI?

    SECourses
    Author
    Nov 4, 2024· 1 reaction

    @ranjeet3939 Make a folder named yolov8 inside the models folder and put the file there.
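    The same step as a small script is sketched below. It assumes the models folder sits at SwarmUI/Models; adjust this to your install. The helper name is hypothetical. The script copies rather than moves the file, so the original download is kept.

```python
import shutil
from pathlib import Path


def install_yolo_model(model_file: Path, swarm_models_dir: Path) -> Path:
    """Copy a YOLO face model into <models>/yolov8 and return its new path."""
    target_dir = swarm_models_dir / "yolov8"
    target_dir.mkdir(parents=True, exist_ok=True)  # create yolov8 if missing
    return Path(shutil.copy2(model_file, target_dir / model_file.name))
```

Usage: `install_yolo_model(Path("face_yolov9c.pt"), Path("SwarmUI/Models"))`.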

    ranjeet3939
    Nov 5, 2024

    @SECourses Thanks, champion! One more thing: in the article, could you please list the settings that need to be selected for image generation in SwarmUI, for example the sampler, CLIP models, FLUX guidance, etc.? I am trying to replicate your images, and I'm a huge fan of yours :)

    SECourses
    Author
    Nov 5, 2024

    @ranjeet3939 Please watch this tutorial, which I recently recorded to show all of it: https://youtu.be/-zOKhoO9a5s

    SECourses
    Author
    Nov 5, 2024
    CivitAI

    For Full Details, Training Dataset, Tutorial, Guide, Configs, Training Json Files, Workflows, Installers, Resources > https://huggingface.co/MonsterMMORPG/Model_Training_Experiments_As_A_Baseline

    Triple_Headed_Monkey
    Nov 5, 2024· 2 reactions
    CivitAI

    I'm going to be kind here. Flux training is case sensitive. While it is cool that you managed to show it is easier to train Flux in a way contrary to how it expects to be trained, compared to other models, there is one thing to change. If you were to repeat this experiment, you should at least use the tags/captions like so:

    Ohwx Man

    This should reduce the amount of time it takes and the rank/dim needed to achieve decent results.

    SECourses
    Author
    Nov 5, 2024

    It is true that it is case sensitive, but we still don't have the full tokenizer. Have you found one? I found the T5 tokenizer and it had very few words.

    @SECourses I've not seen a decent one around off the top of my head either.

    T5 is basically a mini LLM. I'm fairly certain that CLIP is still handling the majority of the heavy lifting all round including the tokenization process. I've been trying to work it out for a little while but I think the process is something like:

    Text input > T5 breaks it down contextually based on the sentence structure and attempts to feed it to the CLIP tokenizer without giving it room to mistake intent > CLIP converts tokens into a vector with spatial/visual information > Transformer uses these vectors to generate an image.

    There are a couple of other possible configurations, but the simple takeaway from all of the different variations was that T5 is not capable of interpreting or producing the output necessary to caption or generate imagery. It is inherently fully text and token based. Therefore its contributions to the process must also be restricted to the domain of text and/or improving the tokenization process with semantic context.

    By itself, T5 is basically an autofill model that would generate text based on simple inputs/parameters. Examples include finishing a sentence you've started writing or responding to simple questions about additional information provided to it. One such use is interrogating an image with a vision adapter model or something like BLIP.

    In other words I'm not sure it really matters much in this case. CLIP having been created originally for the purpose of captioning and sorting images into categories was accidentally found to have the ability, when used in reverse, to generate the image information it was trained to caption.

    So instead of converting images to vectors and plotting them to category tags, it became possible to input category tags and it would output vectors.

    The transformer is then trained on top of the vector inputs with the same, or similar enough, datasets that were used to train the CLIP models, which bridges the gap between the Transformers ability to generate/manipulate pixels and the CLIP's output of vector information.

    In the case of FLUX specifically, the architecture seems to include a secondary CLIP model inside the transformer layers themselves. It handles the translation between the training done on the transformer and the untrained CLIP and T5 models. This allows results to be closer to what you would get if you had trained them together.

    This is my take on it anyway.

    wikeeyang
    Nov 14, 2024· 1 reaction
    CivitAI

    A very good teacher! I learned a lot from you, especially about the De-distill model. Thank you very much.

    SECourses
    Author
    Nov 17, 2024

    @wikeeyang thanks a lot

    Checkpoint
    Flux.1 D

    Details

    Downloads: 39
    Platform: CivitAI
    Platform Status: Deleted
    Created: 11/2/2024
    Updated: 5/7/2026
    Deleted: 5/23/2025
    Trigger Words: ohwx man