512x768 vertical 2:3
<lora:crsxFT15_p1-step00002000:0.9> {crsx, }{realistic, }{bound, }{bdsm, }{bondage, }{1girl, |}{navel, |}{solo, ||}{barefoot, ||||||||||}{rope, |||||||||||}{suspension, |||||||||||}{torture, |||||||||||}{pussy, ||||||||||||}{photo background, |||||||||||||}{nature, ||||||||||||||}{forest, ||||||||||||||||}{tree, |||||||||||||||||}{cross, |}{hanging, ||||||||||||||||||}{completely nude, ||||||||||||||||||||}{restrained, ||||||||||||||||||||}{female pubic hair, |||||||||||||||||||||||}{bound wrists, |||||||||||||||||||||||||}{pubic hair, ||||||||||||||||||||||||||}{jungle, ||||||||||||||||||||||||||}{nudist, ||||||||||||||||||||||||||||}{bound ankles, ||||||||||||||||||||||||||||||||}
Description
juggernautXL_juggXIByRundiffusion.safetensors [33e58e8668]
Vertical 832x1216 0.8 str
<lora:crsxXLjx_p1-step00000200:0.8> {crsx, } cross, {1girl, }{realistic, }{bound, }{bdsm, }{solo, }{bondage, }{navel, ||}{rope, ||||||||||||||}{torture, ||||||||||||||||}{suspension, ||||||||||||||||}{pussy, ||||||||||||||||}{hanging, |||||||||||||||||||||||||}{completely nude, ||||||||||||||||||||||||||}{restrained, |||||||||||||||||||||||||||}{female pubic hair, |||||||||||||||||||||||||||||||}{bound wrists, |||||||||||||||||||||||||||||||||}{pubic hair, |||||||||||||||||||||||||||||||||||}{jungle, |||||||||||||||||||||||||||||||||||}{nudist, |||||||||||||||||||||||||||||||||||||||}{bound ankles, ||||||||||||||||||||||||||||||||||||||||||}
The LoRA is huge; you can probably reduce it.
FAQ
Comments (11)
Are all the pipe characters " | " a necessary part of each trigger? If so, may I ask why?
Also, just curious, but why such large file sizes? I've created individual SD1.5 LoRAs containing 8+ concepts, each concept trained on 1,500+ images, all in a single LoRA.
I've never needed more than 128 rank and 64 alpha to contain it all (and 128 was probably overkill) and have had excellent accuracy & consistency. I've never come anywhere close to 1.47GB with that rank & alpha. 144MB tops. Weighting data for relatively simple concepts does not require a high rank and file size, regardless of the number of images it's trained on. In fact, the larger the rank (and file size), the more likely you are to introduce bias and unintended concepts into the LoRA. This is especially true if the TE is being trained along with the UNet.
Anything beyond 256 rank suffers from severe diminishing returns and is basically a waste of valuable SSD space anyway. Once you reach that threshold, a finetune becomes a much better option for a multitude of reasons.
I'm not criticizing at all. Just trying to understand, and possibly impart knowledge.
I think the users can reduce the size of the LoRAs...? Or do they need some source files?
>excellent accuracy & consistency
this isn't really scientific.
>Anything beyond 256 rank suffers from severe diminishing returns
source?
>Are all the pipe characters " | " a necessary part of each trigger? If so, may I ask why?
https://civitai.com/articles/7297/sarah-peterson-model-generation-guide-faq
@sarahpeterson Scientific? That statement doesn't exactly make sense.
Consistency - Each concept in a single LoRA produces the expected results with 95% or higher accuracy.
Accuracy - Near 100% likeness of the intended subject or concept.
Nothing "scientific" is required to measure or quantify either of these. Consistency & accuracy aren't relative or subjective, they're common sense observations that can be made by anyone with a single glance at the training data and then the generated results. ie. If I train a LoRA to produce dogs and the LoRA generates pictures of dogs 950 times out of 1000, no scientific approach is needed to tell that: A:) accuracy - it's dogs in the generated images (unless you've never seen a dog before lol) and B:) consistency - 95% consistency has been achieved. 950 out of 1000 = 95%, which is an acceptable margin of error given the nature of Stable Diffusion.
Regarding >256 rank being a waste of disk space, suffering from diminishing returns, and risking introducing unwanted/unintended concepts, these are well-known facts. The source is the original LoRA whitepapers written by the original creators of the LoRA format: Edward Hu, Phillip Wallis, Yelong Shen, et al. It has been noted by cloneofsimo, who made LoRAs compatible with Stable Diffusion in the first place. And it has been readily observed, identified, and documented in multiple other studies seeking to enhance the abilities of the LoRA format (such as those written by the creators of DoRA, LoHa, LyCORIS, etc.). The whitepapers are completely open source and freely available. Google is your friend.
I read your FAQ, and like the idea, but it doesn't clearly indicate whether or not to include the pipes in the prompts.
@sarahpeterson As for reducing file size, the training data wouldn't be needed. LoRAs don't store the images themselves; they only store a series of floating point numbers that represent the weights of the different aspects of the concept(s) they were trained to understand. So the training images themselves wouldn't be helpful for resizing a LoRA. Incidentally, that second sentence is the exact reason that ranks higher than 128 are unnecessary for single concepts, regardless of the number of training images used.
There are tools out there that can prune LoRAs. The better tools/scripts allow you to control and fine tune the output, but a solid understanding of Up/Down/Mid weights, the proper use of rank and alpha, and save precision types are required to use them without destroying the LoRA in question. Not exactly for the novice user.
A better (and much simpler) approach would be to identify a finetune that produces accurate results with this LoRA (preferably the model it was trained on), merge the LoRA with the finetune, and then extract the difference between the two (original finetune vs. merged finetune) as a LoRA. When doing so, you can specify the new rank for the extracted LoRA. You can then play with extracted LoRAs of different ranks to find the smallest one that still provides accurate results.
I'd bet you'd be surprised at how small you could get this LoRA. A concept like this would likely require, at most, a rank of 64 (which produces a 72-82MB safetensor). A rank of 128 would possibly be needed for some of your more complex concepts, especially those involving more than one person, but even 128 is likely overkill in most cases.
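To make the merge-and-extract step concrete, here's a rough sketch of the SVD math that extraction tools (e.g. kohya's extract_lora_from_models.py, as far as I know) perform layer by layer. The tensor sizes and the extract_low_rank helper are purely illustrative, not any tool's actual code:

```python
# Sketch of "extract the difference as a LoRA": take the delta between a base
# weight and a merged (base + LoRA) weight, keep only the top-r singular values.
# Real tools repeat this for every UNet/text-encoder layer and save the result.
import torch

def extract_low_rank(w_base: torch.Tensor, w_merged: torch.Tensor, rank: int):
    """Return (up, down) such that up @ down approximates (w_merged - w_base)."""
    delta = (w_merged - w_base).float()
    u, s, vh = torch.linalg.svd(delta, full_matrices=False)
    u, s, vh = u[:, :rank], s[:rank], vh[:rank, :]
    up = u * s.sqrt()                    # (out_features, rank)
    down = s.sqrt().unsqueeze(1) * vh    # (rank, in_features)
    return up, down

# Toy demo: one 320x320 attention projection, extracted at rank 64.
base = torch.randn(320, 320)
merged = base + 0.01 * torch.randn(320, 320)   # pretend a LoRA was merged in
up, down = extract_low_rank(base, merged, rank=64)
print("reconstruction error:", (up @ down - (merged - base)).norm().item())
```

The smaller the rank you pass, the more of the delta gets thrown away, which is exactly the trade-off you're testing when you compare extracted LoRAs of different ranks.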
@Bit_Shifter so if you want to save 400mb, merge the lora into the model then extract a smaller one.
@Bit_Shifter not so scientific there. I'm not aware of the larger size reducing quality in studies on stable diffusion lora. That paper is about LLM lora...?
>noted by cloneofsimo,
citation? Where are the experimental results?
Regarding the pipes, re-read the generation guide. Notice they don't appear at all in the sample images, as they are dynamic prompt code.
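For reference, here's roughly how those {option|option|...} blocks behave. This is an approximation of the Dynamic Prompts variant syntax, not the extension's actual code; the empty options between pipes just make a tag appear only a fraction of the time:

```python
# Each {...} block picks one pipe-separated option at random, so "{rope, |||}"
# yields "rope, " roughly 1 time in 4 and an empty string otherwise.
import random
import re

def resolve_variants(template, seed=None):
    rng = random.Random(seed)
    def pick(match):
        return rng.choice(match.group(1).split("|"))
    return re.sub(r"\{([^{}]*)\}", pick, template)

prompt = "crsx, cross, 1girl, {rope, |||}{suspension, |||}{forest, |||||||}"
print(resolve_variants(prompt, seed=42))
# e.g. "crsx, cross, 1girl, rope, " -- output varies with the seed;
# the pipes themselves never reach the model.
```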
@sarahpeterson
> not so scientific there.
As I mentioned, there's no reason it should be. Our brains begin developing basic analogical reasoning during or before preschool. If one needs scientific research to be able to compare two images (dataset vs. generations) to tell that they look alike, then they need another type of help altogether. Perhaps an optometrist or mental health professional?
> I'm not aware of the larger size reducing quality in studies on stable diffusion lora.
I didn't say it would reduce quality, though that's technically possible depending on training parameters and training data. The problem is more likely to show up as unintended concepts or bias (more on that below).
To understand why extremely large Network DIM Ranks (and file sizes) are unnecessary (and possibly destructive) during training, I'll break Network DIM Ranks down as a whole as briefly as I can...
Your DIM Rank ultimately controls the LoRA's file size and how much information can fit within it. The Alpha works alongside DIM Rank but simply applies a set of brakes to the training by acting as a divisor for weight data, so alpha is irrelevant here for the sake of brevity and staying on-topic.
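In the standard LoRA formulation, that "set of brakes" is just the alpha/rank scale applied to the learned update. A minimal illustration (the layer size and values are made up):

```python
# The learned low-rank update is scaled by alpha / rank before being added to
# the frozen base weight, so a lower alpha dampens whatever the LoRA learned.
import torch

rank, alpha = 32, 16
in_features, out_features = 768, 768

W = torch.randn(out_features, in_features)      # frozen base weight
down = torch.randn(rank, in_features) * 0.01    # LoRA "down" matrix (pretend learned)
up = torch.randn(out_features, rank) * 0.01     # LoRA "up" matrix (pretend learned)

scale = alpha / rank                            # the "brakes": 16/32 = 0.5
W_effective = W + scale * (up @ down)           # what the UNet actually uses
```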
Next, for anything below this paragraph to make sense, you must first understand what data is stored in a base model/finetune. Every single concept in a model/finetune has a token assigned to it, a value that acts like an identifier, along with default weights associated with each token. These were defined during its training/creation. When generating images using just prompts (no LoRAs), each word of your prompt is converted to the numerical tokens/identifiers as dictated by CLIP & the model. Some words (compound words, words with multiple meanings, or single words that convey more than one idea/concept) might consist of more than one token, but in any case, the tokens are then ultimately passed through CLIP and on to the UNet along with the model's stored weight data for each token. These weights are multiplied if you manually specify the weight for a word/concept, e.g. prompting "a photo of a (large:1.2) green fish". Either way, CLIP & the UNet do most of the rest. Note that the Tokenizer add-on for A1111 will let you actually see the tokens (via CLIP) for any prompt you enter against a selected model/finetune. Very handy for refining your prompts and captions, comparing models, and/or identifying a model's strengths/limitations (its inherent understanding, or lack thereof, of specific concepts).
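If you want to see that tokenization outside of A1111, the CLIP tokenizer can be loaded directly. A small sketch, assuming the SD1.5 text encoder (openai/clip-vit-large-patch14); SDXL adds a second text encoder, and the exact IDs depend on the vocab:

```python
# Inspect how a prompt is split into CLIP tokens (same idea as the A1111
# Tokenizer add-on). Requires the `transformers` package.
from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
prompt = "a photo of a large green fish"
ids = tokenizer(prompt).input_ids
print(list(zip(tokenizer.convert_ids_to_tokens(ids), ids)))
# Compound or unusual words may split into several tokens each.
```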
Next, as I previously mentioned, LoRAs are mostly just a set of floating point numbers that act as weights for the individual concepts they are being trained to learn. However, I didn't mention that tokens are also stored. If I train a LoRA to produce "green fish" and the model it's trained on already has the token "11248" assigned to fish and "1246" assigned to the color green, then the weight data for green and fish in the LoRA will also be assigned these tokens. This is how & why LoRAs are able to still produce intended concepts even when used with different models & finetunes, as almost all of them were trained on their architecture's (SD1.5/SDXL/Flux) base models (however, too many merges can positively/negatively affect this). When invoked, a LoRA simply implants, complements, intensifies, lessens, negates, or completely supersedes concepts already within (or absent from) the model/finetune being used during image generation. It does so using these tokens and weight data. That is exactly why LoRA was initially conceived...to quickly introduce new concepts without having to constantly train/re-train full models, while keeping a low file size. Unnecessarily large LoRAs completely defeat these purposes.
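You can verify the "just floating point numbers" part yourself by opening a LoRA file. A quick sketch using the safetensors library (the filename is hypothetical; key names follow the common kohya-style convention):

```python
# List what a LoRA file actually contains: small paired matrices (plus an alpha
# per module) keyed by the layer they patch. No training images are stored.
from safetensors.torch import load_file

state = load_file("crucifixion_lora.safetensors")   # hypothetical filename
total_params = sum(t.numel() for t in state.values())
print(f"{len(state)} tensors, {total_params / 1e6:.1f}M weights")

for name, tensor in list(state.items())[:6]:
    print(name, tuple(tensor.shape), tensor.dtype)
# e.g. lora_unet_..._lora_down.weight  (rank, in_features)
#      lora_unet_..._lora_up.weight    (out_features, rank)
```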
Speaking of which, now on to the topic of file size...tokens are simply integers, while weight data consists of floating point numbers. Both take up very little disk space...even millions of them. At 32-bit precision, a single floating point number is only 4 bytes. Ten million floating point numbers equals a paltry 38.15MB (DIM Rank 32). But that's ten million...ten million points of weight data to achieve a simple concept in only 38MB. Have you not wondered why so many can successfully train their concepts into a LoRA and keep the file size so low? This is why...but allow me to elaborate further...
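The byte math is easy to check:

```python
# 10 million fp32 weights at 4 bytes each.
n_floats = 10_000_000
print(f"{n_floats * 4 / 1024**2:.2f} MiB")   # 38.15 MiB (halve it for fp16 save precision)
```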
Let's use your LoRA as an example. If the model the LoRA is trained on already recognizes some of the training data's basic individual concepts (woman, cross, rope, arms outstretched, etc.), which almost all do, not a lot of data needs to be stored in the LoRA for those basic concepts. The training model already knows these common individual concepts.
In this case, the concepts are basic, so the data in the LoRA only needs to contain tokens for each and apply minimal weights to what it already knows (woman, cross, arms outstretched) to make these concepts prevalent when triggered. But you also want it to understand the idea you are trying to achieve, by teaching it the spatial relationships you want between these individual concepts. That's where the "magic" happens: new weights are created during training for spatial data, e.g. the cross is always behind the woman, the woman is always in front of the cross, the woman's arms are always outstretched, rope is applied to the wrists, etc. The additional data required to achieve these relationships is still very small in most cases. For this LoRA, I estimate 38MB (32 DIM) is all that's needed (maybe 72MB (64 DIM) on the high end).
That's all it needs. That's it...and at 38MB (a DIM Rank of 32), it has ten million points of data to store the weight data to make it happen. That's much more than enough. Case in point: I trained a LoRA for a client who owns a dance studio. It consisted of 32 different gymnastic poses, trained on 7,211 handpicked, curated images, each pose was its own concept/trigger, and the file size was only 72MB (32 Rank). Another case in point: my JenyaD LoRA here on CivitAI was trained on 12 different concepts (I trained each "attribute" of her body individually for accuracy), using 4,201 images for 18,222 steps across 6 epochs, and the file size is only 144MB. It should have been 64 DIM Rank at 72MB, instead of 128 DIM at 144MB, but I forgot to change the DIM Rank before training. Regardless, the results would have been the same had I remembered. 144MB was overkill. The case is the same all across CivitAI: concepts that are exponentially more complex than "crucifixion" are being trained, but the file sizes are 50-92% smaller than yours in the vast majority of cases.
Just because you're using thousands of training images, it doesn't mean the DIM Rank and file size need to be extremely high (>128 DIM Rank/144MB+). Once the training converges and the UNet "learns" any single aspect of a concept, it will assign the weight to it...a single (in most cases) tiny 4-byte floating point number. Even if the training data contains 1,000 images, data for a single concept isn't stored 1,000 times...just once, though individual weight values will change MANY times throughout a training session. Nearly every other setting in the various LoRA training tools, aside from DIM Rank, is there to help convergence happen quickly & accurately while applying the proper weight data (assuming all your training parameters are correct, proper captions, relevant training data, etc.). The DIM Rank is there simply to specify the size of the workspace SD can use for storing and applying the data. After a point, more workspace isn't needed and can have adverse consequences, which brings me to my last point...
Lastly, regarding oversized Ranks/file sizes. Taking into account the above regarding 10 million FP numbers equalling 38.15MB, consider the following. Your 513MB Crucifixion LoRA contains enough room for just over 134.47 million floating point numbers...while the actual necessary data in your LoRA is likely only between 7-13% of that amount. What do you think SD does with the rest of all that wasted space during training? Ideally, it would zero the remaining unneeded space out (the diminishing returns I spoke of). But that's not always the case. It will try to use any additional space within the LoRA if it thinks it's warranted. A single misworded caption (unintended concept and/or bias), one too many images containing brown wooden crosses that also happen to have women with brown hair (bias), a dangling rope in an image shaped too much like a hand (unintended concept), moles and freckles being misinterpreted, resulting in excessive skin blemishes (unintended concept), and the list goes on and on.
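Same arithmetic in reverse, for anyone who wants to check the figure:

```python
# How many fp32 weights a 513 MB file can hold, and how little of that a
# rank-32-sized amount of data (~38 MiB) would occupy.
size_bytes = 513 * 1024**2
print(f"{size_bytes / 4 / 1e6:.2f} million floats")   # ~134.48 million
print(f"{38.15 / 513:.1%} of the file")               # ~7.4%
```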
In short, there is always the chance that SD will introduce its own biases/concepts into a LoRA based on what it thinks it's supposed to learn if it's given the chance...and an unnecessarily high DIM Rank gives it that chance and the extra space to do it!
> That paper is about LLM lora...?
I don't know what specific paper you're referring to. There are hundreds on the topic of LoRAs (both LDM and LLM): the theory, math, architecture, etc. I was using "whitepapers" as a collective term, not referring to one specific paper. I'd start with arXiv. HuggingFace is another possibility, but beware: much of the information there is non-authoritative. Some of it is accurate, some is not, and some is simply outdated.
> citation? Where are the experimental results?
Again, Google is your friend. I've absorbed all I ever care to know about LLMs, LDMs, SD, LoRAs, and generative AI in general in three years of practice and professional study (not YouTube and armchair scholars). I've summed up the topic of Network DIM Ranks for you as simply as I possibly can, broken it down, and handed it to you on a silver platter. If you're still skeptical after that, you're welcome to waste your own time researching (only to find those answers to be the same that I've already provided you). While I over-simplified & omitted a bit for brevity above (not mentioning relevant info regarding the text encoder, latent space, attention layers, block weights, etc.), you will find that what I wrote above is still accurate.
@sarahpeterson
> so if you want to save 400mb, merge the lora into the model then extract a smaller one.
I just did. I extracted it and it saved 440MB. At 73MB (DIM Rank 64), it still works fine, and it actually gave me a bit more prompting control over body shape specifics.
Good luck to you.
Edit: Negative prompts seem to be more responsive and, at a CFG of around 5, it plays nice with other LoRAs now as well. Before, the results looked overcooked when used with those same LoRAs at CFG 5.
@Bit_Shifter I just stumbled on another one of your great posts here and wanted to say THANK YOU!!! for this. I've seen a lot of ppl try to explain this in layman's terms, but your post is the first to make any sense at all. It's a shame the creator is too hard-headed to realize they're wasting their own and everyone else's disk space. You should be writing books or creating YouTube vids for sure. All the work in your post seems wasted on the biased and the know-it-alls. But wait! That's not scientific!!!!!
@infinitytech Thank you for the kind words. I try to help when I can. I've considered creating a self-hosted blog, as I've worked in IT and have been a web developer since the internet was in its infancy (yes, I'm THAT old), but it would be difficult for me to find the time to maintain it. I may give it a bit more consideration though.
Doesn't work for me, and I found that if 'crucified' is missing, you don't even see the cross.





