Please Read Description
NatViS (Natural Vision) is a photorealistic full-parameter fine-tune of SDXL that uses natural language prompting to generate high-quality SFW/NSFW images. It was trained on 1M+ image-caption pairs from a dataset that has been expanded and refined for over a year.
v3.0 is being rebuilt from the ground up to expand the knowledge domain and improve text-image alignment across various prompting styles.
Current v3.0 Status: Data Procurement
As of right now I can only work on the update in my spare time so there's no planned release date.
Please message me on Ko-Fi (below) to give feedback and suggestions. Email and a public Discord will be up soon!
Buy me a coffee ❤
https://ko-fi.com/ndimensional
I've never been a fan of e-begging; however, SDXL fine-tunes at this scale are becoming expensive to train. So I will begrudgingly ask: if you like what I do and would like to support my models, consider donating on Ko-Fi 💗
I will begin posting updates, answering questions, taking feedback, and releasing early access (NOT EXCLUSIVE) models to supporters.
All donations will be used to fund the creation of new Stable Diffusion fine-tunes and open-source AI tools.
Changelog
============
11-24-24 NatViS v2.7 Hyper 4Step and link for 4step Lightning (🤗)
Uploaded 4step Hyper variant of NatViS v2.7. See About this version for more info.
Lightning: The 4step Lightning variant of v2.7 can be found HERE for the time being. The 8step Lightning variant will be uploaded within a day of writing.
Note: Sample images are limited because of time constraints.
============
11-21-24 NatViS v2.7 Hyper 8step
Released 8step Hyper variant of NatViS v2.7 with consistent CFG. See About this version for more info.
============
11-18-24 NatViS v2.7
Due to time constraints, pre-release changelog can be viewed HERE for the time being.
Note: I got bored of generating the same sample images over and over again and decided to spice things up with some new prompts. Prompts from previous versions will work with v2.7. When I have time, I'll upload a separate gallery for images generated with the old prompts.
============
10-26-24 NatViS v2.5 Lightning 4step (Not Recommended!):
Uploaded 4step Lightning version of NatViS 2.5
ONLY USE IF NEEDED
============
10-25-24 NatViS v2.5 Lightning 8step
Released 8step Lightning version of NatViS v2.5. Read About this version
Note: Unlike my previous 8step Lightning releases, this version is a simple merge with the SDXL Lightning LoRA. I did this due to requests for low CFG.
Sample images may not be the best representation of the model as a result of me not fully understanding the quirks of Lightning.
I will be releasing the FULL CFG 8step lightning version as well, since it appears to preserve more of the fine-grained features from the fine-tune.
============
10-23-24 NatViS v2.5
What's New?
Uploaded NatViS v2.5
Updates to text-encoder(s) to reintroduce tag/booru-style prompting capabilities that were broken in v2.0
Subset of data included from new (improved) dataset, specifically image-caption pairs with short n' punchy captions.
Info on new dataset (for future models/update): Includes more variation of caption styles and all automation is manually verified by a human (i.e., me).
Introduced more analog photography and classic cinematic film image data to further the push for more authentic realism.
What's Next?
General:
Review SD3.5 license to see if it's worth touching. It's not terrible. Will start research into the model's architecture for fine-tuning/LoRA.
General: Release Anti-Pony Alpha model (Anime, Digital Illustrations).
In advance, it's not nearly as robust as Pony. This is a test to see if there's enough interest in the idea to pursue crowd funding for training.
Trained with character knowledge and quality in mind: novel booru+ tagging system and natural language prompting, multiple styles/mediums, artist knowledge, no silly quality ranking tags, SDXL compatible (i.e., not overfit and broken).
More info will come out soon.
NatViS: Release of Lightning variants for NatViS v2.5.
Done more effectively this time.
NatViS: Finally getting around to creating, and releasing a PDF guide.
NatViS: Continue fine-tuning of v3.0.
============
10-2-24 NatViS v2.0 Lightning 4step
Uploaded 4step lightning model for v2.0
============
10-1-24 NatViS v2.0 Lightning 8step
Uploaded 8step lightning models for v2.0
============
9-25-24 NatViS v2.0
What's New?
Prompting: This update focuses primarily on the text-encoders. Natural language prompting capabilities have been improved to follow less-strict formats and rely less on specific tokens.
Ethnicity and Demonym: Increased accuracy of phenotypes for various ethnicities and demonyms. Not just limited to body structure, but also includes clothing, hair, landscapes, etc. See here for small examples.
Camera EXIF: Inclusion of promptable camera EXIF data for popular modern and analog cameras. Includes camera name, focal length, f-stop, ISO, shutter speed, and lens type. Also includes attachments such as ND filters and polarizers.
Analog: Improvements to analog and vintage photograph generations.
Lighting and shadow: Prompt how light (or lack thereof) interacts with objects/subjects in the scene, amongst other general lighting-related modifiers. More info soon.
Skin Textures: Small improvements to the detail of skin textures with fewer or no explicit tokens related to skin detail.
Implementation of Pseudo Instruction: This will require a more lengthy write-up.
Better male anatomy.
Lesbians.
What's Next?
Lightning models will be released within the coming days.
Full PDF guide and documentation within the next week.
Info on v3.0 within the next month.
============
8/4/24 NatViS v1.0 Lightning 4step
Uploaded 4step lightning version of v1.0 (See About this version for more info).
============
8/3/24 NatViS v1.0 Lightning 8step
Uploaded 8step lightning version of v1.0 (See About this version for more info)
============
8/2/24 NatViS v1.0
Initial Release
Usage Tips
Note: These are simply recommendations, feel free to experiment.
Prompting
NatViS leverages SDXL’s bigG text-encoder to allow for Natural Language prompting.
What is Natural Language Prompting?
Since the release of Stable Diffusion v1.4, people have become accustomed to comma-delimited lists of visually descriptive tags/phrases. This was a necessity for early Stable Diffusion models due to the architecture and choice of text-encoder. With SDXL's dual text-encoder/tokenizer architecture we are able to write more naturally descriptive prompts.
Simply describe the image you want to generate, just as you would describe the image to a person.
For example:
Comma delimited list: a woman, standing, outdoors, sun beams, dappled light, apple tree, wearing denim jeans, flannel shirt, brown hair, long hair, looking at viewer, highest quality, atmospheric, 35mm, masterpiece
Natural Language: A masterpiece, 35mm-style photo of a woman with long brown hair, standing outdoors in dappled sunlight beneath an apple tree. She wears denim jeans and a flannel shirt, gazing directly at the viewer with an atmospheric quality.
Note: This is just an example to highlight how to write a natural language prompt. For better examples, see the sample images.
Will NatViS Understand Everything I tell it?
Absolutely not.
Due to various limitations in both the architecture and the amount of data I'm able to fine-tune on as one person, there will be instances where the model simply will not generate what you want. Often, you can experiment with different wording, change the placement of tokens (i.e., move a sentence or individual token closer to the start or end of the prompt), remove potentially conflicting tokens, etc. There really is no definitive solution I can offer, as it varies from prompt to prompt. Unfortunately, there will be times when no solution/workaround is successful.
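As a rough illustration of the "placement of tokens" idea, the sketch below generates reordered versions of a prompt by moving each sentence to the front in turn. The function names `split_sentences` and `front_loaded_variants` are hypothetical helpers made up for this example, and the sentence splitting is deliberately naive; it is just one way to produce candidates to try, not a guaranteed fix.

```python
import re


def split_sentences(prompt: str) -> list[str]:
    """Naively split a prompt into sentences at '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", prompt.strip())
    return [p for p in parts if p]


def front_loaded_variants(prompt: str) -> list[str]:
    """Return one variant per sentence, with that sentence moved to the start."""
    sentences = split_sentences(prompt)
    variants = []
    for i in range(len(sentences)):
        # Put sentence i first, keep the remaining sentences in their original order.
        reordered = [sentences[i]] + sentences[:i] + sentences[i + 1:]
        variants.append(" ".join(reordered))
    return variants


example = "A photo of a woman outdoors. She wears a flannel shirt. Dappled sunlight."
variants = front_loaded_variants(example)
for v in variants:
    print(v)
```

Each variant can then be generated with the same seed to see whether moving a sentence earlier changes how strongly the model follows it.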
Can I still use Tags?
Short answer: Yes
SDXL's dual text-encoder/tokenizer architecture can process tokens/sequences with both encoders in parallel, meaning you don't have to use natural language prompting.
Note: Since the training data was purely captioned with natural language descriptions, not all the common descriptive tags people are familiar with will be understood by the model, especially Booru-style tags.
I found a hybrid system works well, as seen in many of the sample images.
For example:
Say you tried your natural language prompt, but want to make the results a bit more cinematic. Instead of modifying the entire prompt, you can simply append cinematic lighting, harmonious, film still, etc. to the end of your prompt.
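A minimal sketch of that hybrid approach, assuming a hypothetical helper named `append_style_tags` (not part of any tool mentioned here): it leaves the natural language prompt untouched and adds the style tags as a comma-separated list at the end.

```python
def append_style_tags(prompt: str, tags: list[str]) -> str:
    """Append comma-separated style tags to the end of a natural language prompt."""
    base = prompt.strip()
    # Drop empty entries and stray whitespace from the tag list.
    cleaned = [t.strip() for t in tags if t.strip()]
    if not cleaned:
        return base
    return f"{base} {', '.join(cleaned)}"


prompt = (
    "A 35mm-style photo of a woman with long brown hair, "
    "standing outdoors in dappled sunlight beneath an apple tree."
)
hybrid = append_style_tags(prompt, ["cinematic lighting", "harmonious", "film still"])
print(hybrid)
```

Keeping the tags in a separate list makes it easy to swap them out between generations without rewriting the descriptive part of the prompt.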
Quality Tags/Classifiers? (score_up_x)
Blasphemy.
You can use quality rank/classifiers if you want, but they were not part of the training data.
Negative Prompt
Similar to other SDXL models: use tags separated by commas and keep it short. Add/remove tokens from the negative prompt as needed.
Generation Parameters
CFG:
Recommended: 5-7
7+ to enforce a specific style/medium
Sampler/Sampling Steps:
This can be quite subjective, so I will just share what I typically use instead of giving direct recommendations.
Sampler - DPM++ 2M SDE
Scheduler - Karras
Steps - 55
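For anyone scripting generations, the settings above can be collected into a small structure. This is only a sketch: the class `GenerationSettings` is hypothetical, the default CFG of 6.0 is an assumed midpoint of the recommended 5-7 range, and the field names are modeled on the AUTOMATIC1111 web UI `txt2img` API payload as an assumption, so check them against your own install before sending anything.

```python
from dataclasses import dataclass, asdict


@dataclass
class GenerationSettings:
    prompt: str
    negative_prompt: str = ""
    cfg_scale: float = 6.0            # assumed midpoint of the recommended 5-7
    steps: int = 55                   # steps shared above
    sampler_name: str = "DPM++ 2M SDE"
    scheduler: str = "Karras"         # assumed field name; older UIs fold this into the sampler name

    def notes(self) -> list[str]:
        """Return reminders when CFG falls outside the recommended 5-7 range."""
        if self.cfg_scale < 5:
            return ["CFG is below the recommended 5-7 range"]
        if self.cfg_scale > 7:
            return ["CFG above 7 is suggested only to enforce a specific style/medium"]
        return []


settings = GenerationSettings(prompt="A photo of a woman beneath an apple tree.")
payload = asdict(settings)  # plain dict, ready to serialize as JSON
print(payload)
print(settings.notes())
```

These remain starting points rather than requirements, in line with the note that the recommendations are meant to be experimented with.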
ADetailer: (Extension)
Link
Again, subjective so I’ll just share my settings.
Model - mediapipe_face_full (use mediapipe for photorealism)
Confidence - 0.45
Everything else is default.
CFG Rescale: (Extension)
Link
I forgot that I had this installed, and I'm not quite sure whether it was enforcing zero terminal SNR on the noise schedule. Since the parameter was null, it shouldn't have been.
Phi - 0
Important
If you struggle to replicate the sample images, even with the exact seed and parameters, it's likely because of the noise scheduler. I had enabled the fix for this in the web UI, but have since reinstalled it and forgot to re-enable the fix. This only applies to v1 of NatViS.
Training Info
TO-DO
This will take a while to write up. So in the meantime:
TLDR; 1M+ images, processed/cleaned via personal Dataset Toolkit I’m developing, captioned via Multimodal Large Language Model (MLLM) with unified feature space (part of Dataset Toolkit, not GPT). Training Data, Configs, Custom Scripts will be made available and open-sourced when the final version is released. Dataset Toolkit has no announced release date.
Check out my other models
SDXL Checkpoints: https://civarchive.com/collections/966964
SDXL LoRAs: https://civarchive.com/collections/966969
40K Series: https://civarchive.com/collections/956187
SD1.5 Checkpoints: https://civarchive.com/collections/966974
SD1.5 LoRAs: https://civarchive.com/collections/966972
Run On TensorArt (v1)
🤗Huggingface Repo
🤗Huggingface Repo - Lightning
🤗Huggingface Repo - Hyper
Description
4step Lightning Version of v1.0 For Fast Inference.
Recommended Parameters:
CFG: 1-2
Steps: 4-6
FAQ
Comments (40)
@ndimensional
Is it correct to say that in terms of realism NatViS is inferior to Clarity XL(focuses purely on photorealism)?
It's hard to say. Both models focus on photorealism, just in different ways.
Clarity XL is often more cinematic, with more dramatic lighting and vivid colors, by default.
NatViS is also capable of this, but with more work (via the prompt), and defaults to a more amateur photography aesthetic.
NatViS shares small parts of the Clarity dataset, though with different captions.
Since you mentioned it, Clarity XL will be getting an update this month 😉
@ndimensional
Thank you for letting me know.
I thought of NatViS as the standard(natural) and interpreted [Clarity XL] ≒ [NatViS makeup].
I've never seen any finetune like this. It's next level.
in yes or no please , did your 8step model will works better than regular v1 plus adding the lightning 4 step 1cfg lora ?
I haven't tried using any of the Lightning LoRAs on the non-distilled v1 model.
Technically though, yes. The 8step model should outperform using a 4step LoRA on the v1 model in terms of quality.
using 4step 1cfg lightning lora with your v1 model will decrease the quality of text encoder or overall quality ? will be major decrease or just a little ?
cant wait to see the next version!
Could you please publish a script for mixing LORA with the model? There are more interesting options for mixing, such as PCM.
Could you perhaps recommend an upscaler and denoising combo that you've tried that works for you?
I've been playing around with this some more and gents, I think this is better than ANY established SDXL NSFW photo model out there.
Prompt understanding is definitely level above. That indescribable aesthetic once you prompt for it.
I might as well just delete half of the models and loras I currently have, coz this just made them redundant.
By far the most realistic skin texture. I have no idea how you succeeded where others failed. Some came close but not as realistic as this. Congratulations!
any lose of using your model with 4steps loras other than the normal a bit lose in quality ? any prompt understanding lose ?
Yes, there's some overall loss with the 4step version. Nothing major, but there is loss. The degree to which seems to depend on the level of complexity of the prompt. When I have the time, I'm going to look into updating the 4step version using a different method.
@ndimensional then your v1.0 regular model plus adding the 4step lora manually in comfy will be the same as your 4step model at this moment ?
@amazingbeauty I haven't tried it personally. But on the Discord thread linked in the model's description, you can find a Comfy workflow for doing just that. Doing it manually looks like it could be better*. I can't say for certain though.
Would love to see this model merged with the Boomer Art Model (BAM!).
Interesting idea. I've been toying around with the idea of creating a ProjectAIO merge for SDXL. Something to add a buffer between model updates. ProjectAIO being a model I made for SD1.5, where I merged all my SD1.5 fine-tunes/merges into one model. It was a silly idea that created some interesting results.
btw, Boomer Art Model v3 is in the data cleansing phase atm and should start tuning within the next week 😊
@ndimensional Great news!
Merging takes only a minute, why not do it yourself? Or can't you do local
This is an incredible model. The photorealism is a huge upgrade from Pony Realism. Sometimes it seems overfit on poses and it can be hard to prompt a very specific scene, but the quality is incredibly high. If nothing else this can be a good model to switch to for adetailing faces or just to img2img with low denoising to bring more realism to your scene.
This is a wonderful model. Very good at cinematic shots.
I think the model is poorly named and hard to find. It should be a lot more popular than it is.
Even as a user of it I need to go back to my bookmarks to find it again. maybe it's just me not paying attention...
In terms of realism this is the best model I can find by a large margin. Please don't stop this project. This is incredible. Maybe a name change would be helpful and just keep updating it to keep it top of line.
A very nice and versatile model with an unique hidden strength when it comes to realism.
awesome model! what image size do you recommend? for now I have only tried in 832x1216, I get good results :)
Here you go:
Recommended Generation Dimensions:
1344x768 (16:9) — Cinematic Film Stills
1536x640 (21:9) — Ultrawide Cinematic Film Stills
1152x896 (4:3) — Fullscreen
1216x832 (3:2) — Mobile landscape
1024x1024 (1:1) — Square
1024x704 (16:11)
768x1344 (9:16) — Tall (Instagram stories / snapchat)
896x1152 (3:4)
832x1216 (2:3) — Mobile Portrait
704x1024 (11:16)
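If you want to pick from this list programmatically, here is a small sketch that returns the listed resolution closest to a target aspect ratio. The function name `closest_resolution` is hypothetical, and comparing ratios on a log scale is just one reasonable choice; the resolutions themselves are copied from the list above.

```python
import math

# (width, height) pairs copied from the recommended dimensions list.
RECOMMENDED = [
    (1344, 768), (1536, 640), (1152, 896), (1216, 832), (1024, 1024),
    (1024, 704), (768, 1344), (896, 1152), (832, 1216), (704, 1024),
]


def closest_resolution(ratio_w: float, ratio_h: float) -> tuple[int, int]:
    """Return the recommended (width, height) whose aspect ratio is nearest the target."""
    target = math.log(ratio_w / ratio_h)
    # Log-ratio distance treats "twice as wide" and "twice as tall" symmetrically.
    return min(RECOMMENDED, key=lambda wh: abs(math.log(wh[0] / wh[1]) - target))


print(closest_resolution(16, 9))
print(closest_resolution(2, 3))
```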
@vanillah thank you :D
Holy cow, this model delivers like none other! Absolutely terrific at generating different ethnicities, situations. Great at lighting and textures.
Right now by far the best model for creating fuzzy and fur stuff - realistic style. I did test it for some days now....and as you see on my gallery pics below, the results are stunning. The prompting follows a natural structure which first is "not what you have been used too".....but is superior to what you know. The prompt recognition is far above average. I cannot post NSFW pictures here for personal reasons...but "great" is the description what you get -> nothing less! Thanks for sharing it. It unlocks a huge amount of new options. Guys go for it, like it, post gallery pictures, support the creator.
BEst model EvEr
Thanks for this checkpoint. Good quality with few steps.
This model is being slept on/underrated. I dislike Pony a lot so Im always looking for the best realistic NSFW model and as of 8-23-24 this model is literally the best NSFW model available on CivitAI; #1. Its even gets hands/fingers almost at the accuracy level of Flux.1.
Amazing work! Looking forward to v2!
Excellent model, I prefer using DPM++ 2M SDE Exponential at cfg 4-6 and around 20-30 steps
this is the sense of light i used,mechanism,action and other best models,very good works,thanks,
My new favorite model.
Any plans for a update?
This model is amazing. It has Porn capabilities that rival BigASP but with better quality and much better Prompt adherence. I can't wait for V2







