Please read SD3 Unbanned: Community Decision on Its Future at Civitai
Stable Diffusion 3 (SD3) 2B "Medium" model weights!
Please note: there are many files associated with SD3. They will all appear on this model card as the uploads are completed.
Check out our pre-release SD3 Overview for some information about the model, and the SD3 Quickstart Guide to help you get started generating!

There are three .safetensors versions:
sd3_medium.safetensors
sd3_medium_incl_clips.safetensors
sd3_medium_incl_clips_t5xxlfp8.safetensors
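As a quick reference, here is a small sketch of which text encoders each download bundles. The groupings are inferred from the filenames and the model description below; treat them as an assumption, not an official manifest.

```python
# Sketch (assumption: encoder groupings inferred from the filenames):
# which of the three SD3 Medium downloads bundles which text encoders.
SD3_FILES = {
    "sd3_medium.safetensors": {
        # model only - load CLIP-G, CLIP-L, and T5-XXL separately
        "bundled_encoders": [],
    },
    "sd3_medium_incl_clips.safetensors": {
        "bundled_encoders": ["OpenCLIP-ViT/G", "CLIP-ViT/L"],
    },
    "sd3_medium_incl_clips_t5xxlfp8.safetensors": {
        "bundled_encoders": ["OpenCLIP-ViT/G", "CLIP-ViT/L", "T5-XXL (fp8)"],
    },
}

def needs_external_t5(filename: str) -> bool:
    """True if this download does not bundle the T5-XXL encoder."""
    encoders = SD3_FILES[filename]["bundled_encoders"]
    return not any("T5" in e for e in encoders)
```

For example, `needs_external_t5("sd3_medium.safetensors")` returns `True`, since the bare model file ships no text encoders at all.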
Stable Diffusion 3 Medium is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that features greatly improved performance in image quality, typography, complex prompt understanding, and resource-efficiency.
For more technical details, please refer to the Research paper.
Please note: this model is released under the Stability Non-Commercial Research Community License. For a Creator License or an Enterprise License visit Stability.ai for commercial licensing details.
Model Description
Developed by: Stability AI
Model type: MMDiT text-to-image generative model
Model Description: This is a model that can be used to generate images based on text prompts. It is a Multimodal Diffusion Transformer (https://arxiv.org/abs/2403.03206) that uses three fixed, pretrained text encoders (OpenCLIP-ViT/G, CLIP-ViT/L and T5-xxl)
License
Non-commercial Use: Stable Diffusion 3 Medium is released under the Stability AI Non-Commercial Research Community License. The model is free to use for non-commercial purposes such as academic research.
Commercial Use: This model is not available for commercial use without a separate commercial license from Stability. We encourage professional artists, designers, and creators to use our Creator License. Please visit https://stability.ai/license to learn more.
The original Hugging Face repository for SD3 can be found here.
Description
The 2B Parameter "Medium" weights. Includes the T5XXL Text Encoder. Read the Quickstart Guide for more information!
FAQ
Comments (172)
It's funny watching everyone shit on this model as if base 1.5 and SDXL weren't ass either.
Compared to base SDXL (which was especially ass because it depended on the refiner), it is miles better.
Also, because it doesn't depend on a refiner, the community can fix it much faster.
I wonder if it works on low-end machines, or at least mid-range ones, because I don't have a 3080, much less an NVIDIA GPU; heck, I can't even run ZLUDA with my AMD.
Horse shit
Imagine filtering your base model's image set to omit your target audience.
Does it work with ControlNet in ComfyUI already?
I'm personally a lot more excited for LoRAs than finetunes, since finetunes generally seem to reduce creative freedom - and if it's anything like SDXL, its ability to do text will vastly diminish with even the very first finetunes.
My biggest complaint with the model so far is that it only works with very few samplers.
I've only got it to work with the following:
- euler (but not ancestral)
- heun
- heunpp2
- lms (kind of)
- lcm (probably has the most visually distinct results)
- dpm++ (but not SDE, ancestral, or Karras)
- dpm_fast (kind of)
- dpm_adaptive (kind of, and it's slow as fuck)
- ddim
- uni_pc_bh2
From my preliminary testing, it seems like none of these really do as good a job as euler.
Also, any scheduler other than sgm_uniform is a no-go... not that I've tested every sampler-scheduler combo.
What this means practically is that the ol' reliable "DPM++ 3M SDE Karras/Exponential" just doesn't work with this model, which is a bummer.
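The findings above can be encoded as data for easy scanning. This is one commenter's testing only, and the identifiers below are my assumed mapping to ComfyUI-style sampler names, not an official compatibility table.

```python
# One commenter's SD3-Medium-in-ComfyUI compatibility findings,
# encoded as a dict; "partial" corresponds to "kind of" above.
# Sampler names are assumed ComfyUI-style identifiers.
SAMPLER_STATUS = {
    "euler": "works",           # ancestral variant does not
    "heun": "works",
    "heunpp2": "works",
    "lms": "partial",
    "lcm": "works",             # most visually distinct results
    "dpmpp_2m": "works",        # SDE, ancestral, and Karras variants do not
    "dpm_fast": "partial",
    "dpm_adaptive": "partial",  # reported as very slow
    "ddim": "works",
    "uni_pc_bh2": "works",
}

# Samplers the commenter reported as working cleanly:
working = sorted(s for s, v in SAMPLER_STATUS.items() if v == "works")
```

The same commenter reports that only the sgm_uniform scheduler behaved reliably, so combinations outside this table are untested.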
Extremely disappointed with this model.
Ever since the announcement a few months back, I've been extremely pumped to try out SD3. The released pictures looked amazing, and the new techniques they discussed sounded fantastic - my hopes were through the roof.
Now that I've got a chance to try it, the censorship in this model really holds it back. It's barely usable compared to other modern fine tunes.
Hoping that fine-tuning will eventually make this model into something usable.
Aaaah, come on, folks. SDXL was also shit before people put a lot of effort into it. Give it some time.
We waited so long for SD3, as we were assured it would be better than Midjourney, and we were cheated! It's awful: detached body parts, bad images, wrong physiology, poor-quality generation, no detail - and what small details there are, are shapeless. Boring! No new look, angle, or framing! It draws terribly - SD 1.0 level. Where is the new native 1536x1536 resolution? And why is it even needed, if the models from Civitai enthusiasts (maybe they will save the situation) are 100 times better than this smudge? Verdict: BAD! I don't see any development in the model - it's 3 steps backwards!
I couldn't get the workflow to work, it looked like it did but produced no file.
Was able to use the workflow on https://comfyanonymous.github.io/ComfyUI_examples/sd3/
It's interesting how the file size is lower than most SDXL 1.0 fp16 models. I was expecting a bigger file size given the improvements in SD3.
Throws results that are all over the map... prompt adherence seems really good but it appears lost when trying to go for style or image composition.
Highly disappointed. Can't even do hand and basic anatomy correctly. I'll go back to SDXL.
SD3 is a foundation model - it's a generic base to build on. YOUR job is to fine-tune it. Stop complaining and get to work.
Not available for the on-site generator? Just local only? Massive L.
So...when can we try it on here
Disappointing. Can't say there is any improvement in limb and finger counts; characters often come out deformed, and even more often they look poorly photoshopped into the background.
I'm looking forward to this. Just from what I've seen so far it's like starting out at where much of the trained SDXL models are right now. So, we can only go up from here. Stop complaining and start creating!
Does SD3 allow for making NSFW and porn? Are there any restrictions?
After so long, the SD3 model still cannot generate poses like piggyback.
I like the colors
Just curious, why do things work with Comfy more often than A1111? Is there something about Comfy that makes things work right out of the box, or are its devs just really quick?
SD Next has SD3 compatibility BTW.
The medium file with encoders is 10+gb in size. Does this mean it will not work with an 8gb GPU?
+ Creates high-quality images.
+ Uses about 1 GB less VRAM than SDXL.
+ Can do simple text.
+ Understands the prompt.
-/+ Can do nudity, but it's very limited - then again, none of the base models can; that's where the community comes into play.
-/+ Paid commercial use. I don't use these models commercially, so I'm not the right person to talk about it.
-/+ Generation speed is the same as SDXL if you use SDXL resolutions; I heard it can do 512x512.
- Very bad at varied human poses, like all base models from any source/company.
* Wish: 4/8-step LoRAs.
Being less resource hungry is a win both for end users and professional model trainers.
Either way, thank you for the model, and thank you to the beautiful T2I community.
Using the "dpm_2" sampler with the "normal" scheduler can get high-quality results, but the time cost doubles.
It sucks, of course, but for now this is all we have, besides SDXL, which is extremely bad at understanding prompts. It probably still has good potential. Hopefully it will become better than SDXL eventually.
Maybe it's not even a bad thing that this model always crashes when loaded in A1111. Maybe someone is trying to tell me that I should wait for a stable merge that can do better...
'model.diffusion_model.input_blocks.0.0.weight'???
I couldn't find any explanation of the differences between the model files and their VRAM requirements. For the 10 GB version, how much VRAM would be needed to make a W:1400 H:1200 image?
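Nobody posted exact numbers, but a back-of-envelope estimate of the weight footprint is easy to sketch. The parameter counts below are assumptions (2B from the "Medium" name; roughly 4.7B for the T5-XXL encoder is a commonly cited figure), and real VRAM use is higher once activations, the VAE, and the CLIP encoders are loaded, so treat these as lower bounds.

```python
# Back-of-envelope weight footprint in GiB, given a parameter count
# (in billions) and bytes per parameter. Assumed counts: ~2B for the
# MMDiT, ~4.7B for the T5-XXL encoder. Real VRAM use is higher.
def weight_gb(params_billion: float, bytes_per_param: float) -> float:
    return params_billion * 1e9 * bytes_per_param / 1024**3

mmdit_fp16 = weight_gb(2.0, 2)   # ~3.7 GiB for the diffusion model
t5xxl_fp8  = weight_gb(4.7, 1)   # ~4.4 GiB for T5-XXL in fp8
t5xxl_fp16 = weight_gb(4.7, 2)   # ~8.8 GiB for T5-XXL in fp16
```

This is roughly why the fp8-T5 bundle lands near 10 GB on disk while the fp16 text encoder alone approaches 9 GB.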
Hey guys, can it run on the Fooocus web UI? Because in Colab it shows:
--2024-06-13 06:59:46-- https://civitai.com/api/download/models/552771?type=Model
Resolving civitai.com (civitai.com)... 172.67.12.143, 104.22.19.237, 104.22.18.237, ...
Connecting to civitai.com (civitai.com)|172.67.12.143|:443... connected.
HTTP request sent, awaiting response... 401 Unauthorized
Username/Password Authentication Failed.
Excellent. Right now it's a novelty for sure but when we get models trained against it, I can't wait to see what people can do.
Why is Civitai filtering out R-rated stuff on this model page? Is it at the request of the model makers? Even more censorship!
I'm fucking dying here of laughter... love all the meme images! 🤣
Let's be clear: the only reason Stable Diffusion is popular is NSFW and anime/comic-book content. SD3 has much harder restrictions on NSFW than other SD releases. It also has a non-commercial license. It's easy to break without any legal consequences, but I still don't like these restrictions at all.
[SD3 Autism Mix waiting room]
"woman lying on the grass"
(͡ ° ͜ʖ ͡ °)
Now let's be clear.
- It is slower than any other model.
- It is heavily censored.
- It produces lots of errors, like older models.
- Its UI is a terrible experience.
Why would anyone want to support this abomination? It is just not good. And with a minimum membership fee of $25 a month (which is even restricted), Stability AI just won't make it.
This is all very sad, as I am a huge supporter. But now I have to fear for the future.
Banning NSFW just friendly-fired human poses in SFW scenarios.
Cool, when can we remix stuff?
Is the release date for the large and huge model known?
Will it work on Fooocus?
From what I saw, SDXL is much better for now. You have many good finetunes with no NSFW restrictions; SD3 is freaking slow, especially if you have an RTX card with only 8 GB of VRAM (if you have an RTX 4090, it's not a problem).
Just use SDXL for, say, 6 months and wait for good finetunes and better optimization. When SDXL came out, you needed a minimum of 32 GB of RAM and a 12 GB VRAM GPU to run it. Now, with WebUI Forge or Fooocus, you can run it very fast on an 8 GB GPU with only 16 GB of RAM.
Optimization will come for sure.
I got SD3 working on my laptop without much difficulty. Seems a bit better than SDXL on first release and a little faster on my 3070 TI with 8 GB.
For non-ComfyUI users, there is also a self-hosted (official) web UI from Stability AI that supports SD3 and is more Automatic1111-like: StableSwarmUI (beta):
https://github.com/Stability-AI/StableSwarmUI
For lack of time, I can only try it in 2 weeks. Does anyone have earlier feedback/experience?
I think SD3 is taking SD2's place in insignificance. Keep on with SDXL!
How do SD3 LoRAs work on SD3? They're not working for me.
I don't think this is a bad model at all. Obviously it has problems with the structure of the human body, but as someone already said on Reddit, it's not censorship of the dataset - it's censorship of concepts like "laying", which drives the model crazy. I see many comparing this version to 2.0, but to me this model is a lot like the very first version of 1.5, from before Civitai even existed. I remember that model exactly: it had serious problems with images of people too, but look where 1.5 has gotten today! This model is undoubtedly the future: it has lots of detail, especially in landscape images, where it destroys XL; it supports anime natively (unlike 1.5); and it works at all kinds of resolutions, even low ones. But there is a lot of work to be done. The question is: how long, and how far, can the community improve this model? I think it can be done, but it will take a long time - we must have patience!
Remember when SDXL came out? Everyone said it sucked: plastic skin, the model failed its rendering promises, etc. Look where we are now - SDXL is rocking almost as hard as 1.5 (1.5 has been out a hell of a lot longer, so it has more and better models). I'm still open to SD3. Nobody is forcing us not to use SDXL or 1.5, are they? I'll wait for better fine-tuned models, better NSFW support, better gen times - it will all come eventually. SD3 holds a lot of potential; don't short-change it - let the brains of the creative model makers go at it. It's still way too early to say with any certainty that this model is going nowhere.
What a joke of a model. I can't believe they released SD3 in this state. Killed any remaining credibility they had with the community, probably will end up killing the company itself.
SD3, in use, can produce some really nice results. Until you ask it to make a person. And then it craps all over the floor and produces squid monsters.
SD3? OK, it's a slightly better version of SD 1.5.
Me:
SD3.0 render a woman in a crop top.
SD3.0:
😮😮😮😮😮😮😮😮 you some kinda pervert or what???????????
OK, although my first impression was bad and I made some negative comments, now that I'm getting the hang of this model, I have to say it's a truly amazing model!
Of course we still have SDXL and all of its amazing finetunes; nobody is taking that away from us.
And I think we owe Stability AI a "thank you" for providing us with all these amazing SD models for free! (Otherwise we'd be stuck with Midjourney.)
Unless SD3 can run in at most 10 seconds with a Hyper LoRA on my 6 GB VRAM NVIDIA card, this model will be useless to me. By the way, I run SD 1.5 in 3 seconds and SDXL in 10 seconds.
Victory, Gracie))
Just add "artstation" to the prompt:
Too heavily censored! Even a workflow combining SD3 with an SD 1.5 NSFW model (realhugebreastmix) doesn't show a single nipple. Check the gallery for more.
I get "TypeError: 'NoneType' object is not iterable" When trying to use this model
Get error: AttributeError: 'NoneType' object has no attribute 'lowvram'
When the devs blame the users for why the images look like shit... Yeah, the devs are doing that.
Unbelievable - they trained the model not to show nipples. What is wrong with these people?
Dear diary: today there was no booba, but the hands got extra fingers
What's the difference between T5XXL e4m3fn and T5XXL - FP16?
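To answer the question above in rough terms: e4m3fn is an 8-bit float layout (1 sign, 4 exponent, 3 mantissa bits), while FP16 uses 16 bits (1/5/10), so the fp8 T5-XXL file is about half the size of the fp16 one at some cost in precision and dynamic range. A minimal sketch of that comparison:

```python
# Bit layouts of the two text-encoder precisions. "e4m3fn" = 4
# exponent bits, 3 mantissa bits, finite-only (no infinities).
FORMATS = {
    "float8_e4m3fn": {"sign": 1, "exponent": 4, "mantissa": 3},
    "float16":       {"sign": 1, "exponent": 5, "mantissa": 10},
}

def bits(fmt: str) -> int:
    f = FORMATS[fmt]
    return f["sign"] + f["exponent"] + f["mantissa"]

# Storage ratio: the fp8 encoder weights take half the bytes of fp16.
ratio = bits("float16") / bits("float8_e4m3fn")  # 2.0
```

In practice the quality difference between the two T5 variants is small for most prompts; the fp8 file mainly exists to fit the encoder into less VRAM.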
Why do the perms say you can't sell images generated with the model?
As far as I could tell the new license doesn't forbid that.
So when does the real SD 3 come out....you know, the one with tiddies?
We have no boobs (yet) but this thing is a meme factory
I own an RTX 4090 and I want to help our men of culture train some nipples to give them less stress. Is OneTrainer good here?
"Right out of the box", SDXL also had issues with bodies, generation time, and so on. But the thing is, great people in the community dedicated their time and resources to finetuning, scripts, etc., which pushed the model toward perfection. It didn't take long before SDXL was superior to SD 1.5 in that regard.
This isn't the end of the book, but a new chapter. Things seem to move fast in general, but we are still just at the dawn of AI. So, thanks to everyone contributing in these exciting days.
Observations so far:
The biggest pro to me is the dynamic color range - I find myself prompting to tone the colors down. The biggest negative is the inability to pose people without wacky results: laying, sitting, etc. produce bad output. Hand reproduction is also underwhelming on average - I thought it was supposed to be much improved?
Text in images is much better. After I found a prompting guide, the ability to isolate backgrounds and objects is vastly improved.
For those wanting to trash it entirely, I would suggest looking at the base SDXL model again for comparison. It has come so far in the 9 months I have been doing AI generation. Give this base 9 months of community training and I expect a similar improvement curve. Fingers crossed.
Is the T5XXL FP16 text encoder the entire thing, or do I need to grab that PLUS SD3 Medium incl. clips? I don't mind downloading ~20 GB; I'm just new, and the quickstart didn't clarify whether the text encoder download includes the model or is separate. Thank you all.
It plays it so safe - like running Meta's Imagine or Edge's neutered DALL-E 3. Anything creative gets reduced to simple photorealistic people standing around. Wait for a trained model, as this one has problems even with copyrighted material being altered. The colors are great, but anything complicated just produces adults sitting around like it's a clothing ad in a magazine? Nah.
Tried it for some hours.
Art and painting are awesome,
text is more readable now,
teeth are better,
decent quality at low resolutions.
Hands and feet are just as bad,
poor sharpness in the first pass (before upscale), too much contrast, too much darkness, not all samplers work, bad bodies,
it does not adhere to the negative prompt well.
"angel back view" still draws the wings behind the character,
"broken vase" is not broken,
eye colors are random - mostly grey and green, despite the prompt.
SDXL 0.9 by SAI looked more pleasant at release without any finetuning; it looks like this SD3 needs a lot.
I've gotten overburned, overcontrasted images even with low CFG.
AI artifacts as if it were some kind of LCM or Lightning model; the overall look is as if it weren't a brand-new SD3 but a quickly trained LoRA applied with too high an influence value.
To me it looks like a hybrid between SD 1.5 and 2.0.
P.S. I can't run the bigger model because "module 'torch' has no attribute 'float8_e4m3fn'" - must be something in my current installation.
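That P.S. error usually means the installed PyTorch predates the fp8 dtypes (they appeared around torch 2.1), so the fp8-T5 checkpoint cannot be loaded. A defensive check like the following sketch can diagnose it before loading; the function name is my own, not part of any tool:

```python
# Sketch: check whether the installed torch build exposes the fp8
# dtype that the *_t5xxlfp8 checkpoint needs. Older torch builds
# raise AttributeError on torch.float8_e4m3fn, matching the error
# quoted above.
def torch_supports_fp8() -> bool:
    try:
        import torch
    except ImportError:
        return False  # no torch installed at all
    return hasattr(torch, "float8_e4m3fn")
```

If this returns False, upgrading PyTorch (or falling back to the fp16 T5 encoder file) is the usual fix.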
Hand generation looks even worse than in SD 1.5. It's the first time I've seen hands with 3 fingers...
SD3 is a shot in the foot: they decided to ruin everything that made SD so popular by adding censorship and a restrictive license.
Can someone tell me how to use it in Fooocus as a checkpoint? I've tried, but it doesn't work.
Wow, this is the shittiest license I've ever seen.
I have been using it since yesterday. In SD3 the text is getting readable now, but hands and fingers are getting ruined. I gave a proper prompt for a person, but it still tries to look like an anime poster. Also not happy with the license policy.
Do we need a different upscaler for this? Because damn, it f**ks up big time - see posts.
DISAPPOINTED :( Why am I getting this?
Error occurred when executing TripleCLIPLoader:
Error while deserializing header: InvalidHeaderDeserialization
  File "D:\ComfyUI_windows_portable\ComfyUI\execution.py", line 151, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
  File "D:\ComfyUI_windows_portable\ComfyUI\execution.py", line 81, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
  File "D:\ComfyUI_windows_portable\ComfyUI\execution.py", line 74, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
  File "D:\ComfyUI_windows_portable\ComfyUI\comfy_extras\nodes_sd3.py", line 21, in load_clip
    clip = comfy.sd.load_clip(ckpt_paths=[clip_path1, clip_path2, clip_path3], embedding_directory=folder_paths.get_folder_paths("embeddings"))
  File "D:\ComfyUI_windows_portable\ComfyUI\comfy\sd.py", line 378, in load_clip
    clip_data.append(comfy.utils.load_torch_file(p, safe_load=True))
  File "D:\ComfyUI_windows_portable\ComfyUI\comfy\utils.py", line 14, in load_torch_file
    sd = safetensors.torch.load_file(ckpt, device=device.type)
  File "D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\safetensors\torch.py", line 308, in load_file
    with safe_open(filename, framework="pt", device=device) as f:
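An "InvalidHeaderDeserialization" error like the one above usually means the .safetensors download is corrupted or incomplete. The format begins with an 8-byte little-endian header length followed by that many bytes of JSON, so a quick sanity check can be sketched like this (my own helper, not part of ComfyUI):

```python
import json
import struct

def check_safetensors_header(path: str) -> bool:
    """Rough sanity check for a .safetensors file: the format starts
    with an 8-byte little-endian length prefix followed by that many
    bytes of JSON header. A failure here usually means the download
    is corrupted or truncated - re-download the file."""
    with open(path, "rb") as f:
        prefix = f.read(8)
        if len(prefix) < 8:
            return False          # file too short to hold the prefix
        (n,) = struct.unpack("<Q", prefix)
        header = f.read(n)
        if len(header) < n:
            return False          # truncated header
        try:
            json.loads(header)    # header must be valid JSON
        except ValueError:
            return False
    return True
```

Running this over the three files passed to TripleCLIPLoader identifies which encoder file needs re-downloading.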
Did anyone notice something funny and interesting?
If an image gets posted that is high quality - no deformities, perfect composition, etc. - people hide the prompt like a snake hides its legs ;)
I mean, at first I didn't include prompts either, but c'mon, this is a new model - learning curve and all - so if someone figures out how to do something that really stands out... why not share it with the rest of the class?
Stable Diffusion 3 is garbage. As is Stability AI
ALWAYS SAY NO TO CENSORSHIP IN ANY FORM IT TAKES!
Is this a joke? Prank? It is not even April!
Disappointed. Very disappointed. VERY very disappointed.
SD3 is a complete regression when it comes to drawing human bodies.
A billion pictures, only for you to fuck it all up - way to go.
The people on Civitai always make me lol.
It's genuinely insane how they go through this same cycle every time a new BASE model is released. Y'all said the same thing about SDXL originally, before people fine-tuned it. Same with Turbo.
Now bring in the hate from the gooners who have clearly never seen a pair and just wanna generate titties.
What went wrong? How is it so bad, and so safe and censored? Very disappointed.
Let's hope that SD3 8B is not this bad...
Shoulda been named SDTG, where TG stands for Text Generator. It's garbage - this model is prone to AI strokes the moment you go a bit complex with prompts. SDXL and SD 1.5 generate the same prompts with ease and high accuracy. SD3 has been a regression.
It doesn't know much about (real) photography. It can't do double exposures, wet-plate reproductions, color cross-processing, and many more. It's clear confirmation that it has been trained on AI pics - that's why it's good at non-photographic stuff. I wonder how much fine-tuning it would need to at least catch up with what SC can do in this regard. I'm afraid it will always produce photography that looks like plastic candy...
its good but also bad, I hope the community can smash the bad out of it :)
SD3 is just worse Stable Cascade
About the community fine-tuning a better version: IMO it will take about a month, and we will get another bad model - or an even worse one fine-tuned from this original mess - which one appreciated person from the community will make and share, and people will still be complaining about it until the announcement of a new Stable Diffusion version, which will also be worse, until they end the image-generation game entirely. If they wanted to make and share a really good, even amazing, model, they could - but they won't; they only share incomplete, limited-ability models. And look at the results the community built on SDXL and SD 1.5, despite those still not being perfect or even close to reality... they decided not to give the community anything more except scraps. Maybe someday these models will be as perfect or as real as real images - but when, no one knows. Or maybe never?
WHEN CAN I USE THIS ON WEBUI?
Not able to use this model with OpenVINO - it crashes the server on Automatic1111. Disabling OpenVINO lets me generate images. Not impressed with the quality thus far.
P.S. Now Automatic1111 is kicking out the model and loading Juggernaut instead.
I find it pretty entertaining that people will take the time to download partial requirements for SD3, have it fail, and take the time to complain in the comments without ever taking 30 seconds to read the fucking quickstart guide...
i hope for a sd 3 pony version ^^
This model proves once again that Stability AI are just hacks with access to compute.
So far ALL of their base SD models have been hot garbage and have ALWAYS required the community, or someone else, to fix them. But at least SAI gave us a free base to build and improve upon.
The SD3 license, however, makes un-fucking SD3 for general public/local use impossible, which dooms this model from the get-go.
Such a disappointment :( Hopefully it's trainable, but I don't think it is.... back to SDXL and it only vaguely following the prompts.
Guys just give it time! I'm sure eventually it will go to the same place SD 2 went. But such progress can't be done immediately, SD 2 wasn't completely there right after release. It takes time, but we'll be there.
Why would I use sd3 with such a strict license??? sd3 is a BIG NO
Despite the horrific anatomical mishaps and the kinda crappy license, the model itself is very impressive overall.
Details
Files
stableDiffusion3SD3_sd3MediumInclT5XXL.safetensors
Mirrors
stableDiffusion3SD3_sd3MediumInclT5XXL.safetensors
sd3_medium_incl_clips_t5xxlfp8.safetensors
sd3wclip.safetensors
sd3clipst5fp8.safetensors
Available On (2 platforms)
Same model published on other platforms. May have additional downloads or version variants.