Articles are now live for everyone! No need to download the guide, it's live > Here <
The guide will still be updated here so you can get it in image format.
Beginner's guide!
I'm a beginner myself and don't see myself as a teacher. This is just to show how I make my embeddings, so you can learn to do them the same way.
Once you've learned the basics, you can continue to learn more from other places.
I've chosen to make the guide in pictures and text, since it's easier to just copy what I do.
tl;dr: You should be able to skip all the text and just look at the pictures.
Just a reminder!
You should never download anything without checking it for viruses, even if it's said to be safe!
I can recommend using: https://www.virustotal.com/gui/home/upload
The images are in a .rar file. Use the program below if you can't open it.
Use it as you wish. No need to ask permission; share it or do whatever you like.
Long live pirates! To hell with the greedy ones. Have fun! :D
Comments
After preparing the dataset (images + .txt captions) and all the Train tab settings, I clicked the "Train Embedding" button, but it stopped within seconds and showed "RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.
Applying xformers cross attention optimization."
How can I continue?
Googled it and it looks like a bug. Did you use "Use deepbooru for caption"? If so, that's my bad; I've never tried it and only added it because I heard about it from someone else. Stick to "Use BLIP for caption".
@Alyila Found the reason: CUDA out of memory. 4 GB of VRAM isn't enough for training, I guess.
@dajusha Oh.. no, 4 GB is way too low! Sorry.
It says the zip file is corrupt (i.e. defective).
Good and clear tutorial! It's similar to the method I've used for a few TIs. I set batch size to 3 and gradient accumulation to 2 for training. Also, I notice you don't really adjust the descriptions in the .txt files after BLIP captioning? Sometimes there's weird shit in there, and I think it also helps to separate some things with commas and add more info about the background (e.g. blurry or black background) depending on the picture.
Thanks for stopping by! :)
When I was starting out I tried changing the text and blurring the background of the images, and I didn't get quite the same results.
And this is a beginner's guide, so I don't feel it's that important when you're new. If you change too much or do something wrong there, it often gets worse.
If you just take your time and find good pictures from the beginning, it shouldn't be a major problem.
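To make the caption editing above concrete, here is a made-up example (your BLIP output will differ):

Before (raw BLIP): a woman in a white shirt is standing in front of a wall with a clock on it
After (cleaned up): a woman in a white shirt, standing, plain wall, blurry background

The idea is just commas between separate concepts plus a short note about the background; whether it actually helps depends on your dataset, so test both.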
Great guide!
Seems like a small mistake in either the image or the text. For step 9 in the train section, the image shows "Drop out tags when creating prompts" as 0.1, but the text says to set it to 0.9, so I'm not sure which is correct.
Should the Prompt Template on the training page be subject_filewords.txt instead of style_filewords.txt if you're doing a person?
Update after reading more:
select style_filewords.txt if you are training an artistic style
or
select subject_filewords.txt if you are training an object, person or animal.
It works well either way, but the right way is to choose subject if you're going to train a person. I'll add it to the guide. Thanks!
@Alyila Yeah, I wonder how much of a difference it makes... you could even try making a custom one. I think it's just a file filled with some prompts, so you could even do a mix of both.
@NebulaT13 You can find the files here: stable-diffusion-webui\textual_inversion_templates
So you can just open them and see the difference. I just made a new model using subject and I'm not sure I can see any difference at the moment.
A mix doesn't sound good if you check the files; it's probably better to pick a focus based on what you want to use it for. I'm sure there are custom ones you can use too.
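Each template is just one prompt per line, where [name] gets replaced with your embedding's name and [filewords] with the caption from the matching .txt file. From memory the files look roughly like this (check your own copy, the exact lines may differ):

subject_filewords.txt
a photo of a [name], [filewords]
a cropped photo of the [name], [filewords]

style_filewords.txt
a painting, art by [name], [filewords]
a cropped painting, art by [name], [filewords]

A custom template is just another .txt file with prompt lines like these saved in that same folder, which you can then select in the prompt template setting on the Train tab.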
I have 6 GB of VRAM, can I try? Or is it risky?
Feel free to try, but sadly I don't think it will be enough. You could also try A1111 on Colab, but you have to pay for it then. =/
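If you want to experiment anyway, A1111 has launch flags that lower VRAM use; for example, in webui-user.bat on Windows:

set COMMANDLINE_ARGS=--medvram --xformers

These flags mainly help with image generation, and I haven't verified whether they are enough to get embedding training running on 6 GB, so treat them as something to try rather than a guarantee.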
Why are you using style_filewords.txt in your settings instead of subject_filewords.txt, which is more effective for training a subject?
I'm a beginner myself; this was already discussed with someone else here, and it will be changed in the next update.
Here are the correct ones:
select style_filewords.txt if you are training an artistic style
or
select subject_filewords.txt if you are training an object, person or animal.
Is there a downside of using textual inversion/embedding instead of LoRA? This guide makes it look like textual inversions are much easier to make, and the file size is small too.
You have to decide that for yourself. It depends entirely on how they are trained; I've seen LoRAs that are both worse and better. I've made all my models as in the guide and it's super easy for anyone to do, so that's what I'd recommend for beginners. But on paper, LoRAs should be better. Give it a try! :) Please show your results later if you make something.
Good luck!
@Alyila Can you elaborate on why LoRAs should be better on paper? Can they hold more information or something (because of the file size difference)?
@FaeFlan LoRA stands for Low-Rank Adaptation. Instead of teaching the model a new word, a LoRA trains a pair of small low-rank matrices that get added on top of the model's existing weights (mostly in the attention layers), so it can actually change how the model draws things. That's also why a LoRA file is tens of megabytes and can hold more information about a subject or style.
A textual inversion embedding, on the other hand, only learns a new token vector for the text encoder; the model's weights are never touched. That makes it tiny (kilobytes) and easy to train and share, but it can only steer the model towards things the base model can already draw.
Both approaches have their strengths depending on the use case: a LoRA can capture details and styles the base model doesn't really know, while an embedding is smaller, simpler, and works well when the base model already roughly knows the concept.
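A bare-bones numpy sketch of the low-rank idea, just to show the shapes involved (illustrative only, not how any particular trainer implements it):

import numpy as np

d = 768                      # example hidden size (SD 1.5 text embeddings are 768-dimensional)
W = np.random.randn(d, d)    # frozen base weight of one layer

r = 8                        # LoRA rank, much smaller than d
A = np.random.randn(r, d) * 0.01
B = np.zeros((d, r))         # starts at zero so the model is unchanged at first
alpha = 8

# Effective weight at inference: frozen base plus the learned low-rank update.
W_effective = W + (alpha / r) * (B @ A)

# A textual inversion embedding is just one (or a few) vectors of size d added
# to the text encoder's vocabulary; no layer weight like W is ever modified.
embedding = np.random.randn(1, d)

print(W_effective.shape, embedding.shape)

So per layer a LoRA learns two small matrices (and it does this for many layers), while an embedding is only a handful of vectors in total, which lines up with the file size difference.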
@Alyila I did a quick test earlier, but on a cartoon character and with NovelAI as the base. It didn't turn out very well, but I'll try again, maybe with a real(istic) person. I think the tagger just didn't recognize the subject very well. I get that Stable Diffusion 1.5 is a good base for real people, but for cartoon and anime characters, would you recommend Stable Diffusion as well, or is it also OK to use NovelAI as the base? (For LoRAs I think most people train on NovelAI if they're doing anime; I'd assume it's the same for textual inversion.)
@FaeFlan Thanks for the update! At the moment I can't help you much more, since I've never tried cartoon/anime. I'll do that for my next training so I can help you and others here who want to learn that too. If there's a big difference between them, I'll make a separate guide for cartoon/anime style.
@FaeFlan Check the result I posted here. I used the same method as in the guide, just changed to style_filewords.txt. It won't be quite right, but you can still get OK results.
The biggest difference is which checkpoint you use, but you can also try training with a different base and see what works best for the particular style you're looking for.
@Alyila I just saw it, it looks pretty good. Thanks for trying!
@Alyila I tried out your guide again with a real person. It turned out pretty well, but I'm curious about three things now:
1) Why did you pick 1500 steps as optimal instead of more? I usually train my LoRAs for 6000 steps, but the best results come at around 3000 steps.
2) I used 72 images instead of the ~30 you suggest in the guide; should I change the number of steps or keep it at 1500?
3) The model I picked actually has pretty chubby cheeks, but the images I generate have quite angular faces. Should I train for more steps or change something in the prompt to get the chubby cheeks back?
@FaeFlan
1: This is a beginner's guide; feel free to test what works best for you and your project. 1500 works fine most of the time, and if you train too much you can overtrain it and get bad results too. It saves time as well ;)
2: Having more isn't bad. If you have good pictures you don't need that many, and it's harder to find more photos. You can test your way here too; the number isn't set in stone, just use 15 or more.
3: Your model is almost always trained fine; it's usually just a checkpoint that doesn't suit it. It all depends on which checkpoint you use. ChilloutMix will almost always give a sharper face. Try analogMadness v4.0, or any other. Here is a model I made that is a bit chubbier in the face: https://civitai.com/models/74739/awondrr-sg
I try prompts like chubby, bbw etc., so you can use the prompt too, but you won't get exactly the same size as the model. I think it also has to do with the checkpoint and what models they used for training.
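A rough way to reason about steps vs. dataset size (simple arithmetic, not a rule from the guide): with batch size 1, passes over the dataset ≈ steps / number of images. So 1500 steps over 30 images is about 50 passes, while 1500 steps over 72 images is only about 21, which is one argument for raising the steps a bit with a bigger set, or just watching the preview images and stopping when they look right.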
@Alyila Aah, I see. I'll try with analogMadness 4.0 later as well. Another thing that popped into my mind: the automatic cropper crops to the focal point (usually the face), but what if there's a certain feature you want to include, e.g. a specific tattoo on the left leg? Should I then manually crop and resize everything so that the dataset includes both the face and the leg tattoo?
@FaeFlan
You can be lazy and only use the automatic crop to the focal point, but I always edit my photos to get what I want in frame. It trains on 512x512, so just crop to a square that contains what you want in it.
AI is not good at tattoos in my experience, so focusing on the face is better, or a mix of both.
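If you'd rather script the manual cropping, a minimal sketch with Pillow could look like this (the paths and crop box are made-up examples; pick the box per image):

from PIL import Image

src = "dataset/raw/photo_01.jpg"      # hypothetical input photo
dst = "dataset/train/photo_01.png"    # hypothetical output for the training set

img = Image.open(src)

# Square region you actually want in frame: (left, upper, right, lower).
# This 900x900 box is just an example; adjust it for each photo.
box = (200, 150, 1100, 1050)
cropped = img.crop(box)

# The guide trains at 512x512, so resize the square crop down to that.
cropped = cropped.resize((512, 512), Image.LANCZOS)
cropped.save(dst)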
How long does it take to train your TI?
Btw love your SG TIs.
Thanks! :)
Well, it depends on what card you use: ~35 min on an RTX 3060 12 GB.
It takes more time to gather all the pictures and edit them, and since I upload my models I also need to render a lot of pictures for the showcase, which takes time too. So the total is around 2 hours for me.
Why does my trained TI not work in models other than v1-5-pruned-emaonly?
With my trained TI in v1-5-pruned-emaonly I managed to get fairly decent results, but with other models, like Deliberate or DreamShaper, it looks like a completely different character.
You only train your TI using v1-5-pruned-emaonly; when you render with your model, try different checkpoints to see what works best for it.
@Alyila Thanks for your reply, but if I use models other than v1-5-pruned-emaonly during training, it gives random weird results. I tried a couple of models and they were all like this.
@Mitsunari
I don't recommend changing the base model for training.
You can try 2.1: https://huggingface.co/stabilityai/stable-diffusion-2-1/tree/main
I haven't tried it myself. From what I've read it's more for architecture, interior design and other landscape scenes; if you want people, 1.5 is better.