
    Please Read Description

    NatViS (Natural Vision) is a photorealistic, full-parameter fine-tune of SDXL that uses Natural Language prompting to generate high-quality SFW/NSFW images. It was trained on 1M+ image-caption pairs from a dataset that has been expanded and refined for over a year.

    v3.0 is being rebuilt from the ground up to expand the knowledge domain and improve text-image alignment across various prompting styles.

    Current v3.0 Status: Data Procurement

    As of right now, I can only work on the update in my spare time, so there's no planned release date.

    Please message me on Ko-Fi (below) to give feedback and suggestions. Email and a public Discord will be up soon!


    Buy me a coffee ❤

    https://ko-fi.com/ndimensional

    I’ve never been a fan of e-begging; however, SDXL fine-tunes at this scale are becoming expensive to train. So I will begrudgingly ask: if you like what I do and would like to support my models, consider donating on Ko-Fi 💗
    I will begin posting updates, answering questions, taking feedback, and releasing early access (NOT EXCLUSIVE) models to supporters.

    All donations will be used to fund the creation of new Stable Diffusion fine-tunes and open-source AI tools.


    Changelog

    ============

    11-24-24 NatViS v2.7 Hyper 4Step and link for 4step Lightning (🤗)

    • Uploaded 4step Hyper variant of NatViS v2.7. See About this version for more info.

    • Lightning: The 4step Lightning variant of v2.7 can be found HERE for the time being. The 8step Lightning will be uploaded within a day of writing.

    • Note: Sample images are limited because of time constraints.

    ============

    11-21-24 NatViS v2.7 Hyper 8step

    • Released 8step Hyper variant of NatViS v2.7 with consistent CFG. See About this version for more info.

    11-18-24 NatViS v2.7

    • Due to time constraints, pre-release changelog can be viewed HERE for the time being.

    • Note: I was bored generating the same sample images over-and-over again and decided to spice things up with some new prompts. Prompts from previous versions will work with v2.7. When I have time, I'll upload a separate gallery for images generated with the old prompts.

    ============

    10-26-24 NatViS v2.5 Lightning 4step (Not Recommended!):

    • Uploaded 4step Lightning version of NatViS 2.5

    • ONLY USE IF NEEDED

    ============

    10-25-24 NatViS v2.5 Lightning 8step

    • Released 8step Lightning version of NatViS v2.5. Read About this version

      • Note: Unlike my previous 8step Lightning releases, this version is a simple merge with the SDXL Lightning LoRA. I did this due to requests for low CFG.

        • Sample images may not be the best representation of the model as a result of me not fully understanding the quirks of Lightning.

      • I will be releasing the FULL CFG 8step lightning version as well, since it appears to preserve more of the fine-grained features from the fine-tune.

    ============

    10-23-24 NatViS v2.5

    What's New?

    • Uploaded NatViS v2.5

      • Updates to text-encoder(s) to reintroduce tag/booru-style prompting capabilities that were broken in v2.0

      • Subset of data included from new (improved) dataset, specifically image-caption pairs with short n' punchy captions.

        • Info on the new dataset (for future models/updates): includes more variation in caption styles, and all automation is manually verified by a human (i.e., me).

      • Introduced more analog photography and classic cinematic film image data to further the push for more authentic realism.

    What's Next?

    • General: Review SD3.5 license to see if it's worth touching.

      • It's not terrible. Will start research into the model's architecture for fine-tuning/LoRA.

    • General: Release Anti-Pony Alpha model (Anime, Digital Illustrations).

      • Fair warning: it's not nearly as robust as Pony. This is a test to see if there's enough interest in the idea to pursue crowdfunding for training.

      • Trained with character knowledge and quality in mind: a novel booru+ tagging system & natural language prompting, multiple styles/mediums, artist knowledge, no silly quality-ranking tags, and SDXL compatibility (i.e., not overfit and broken).

      • More info will come out soon.

    • NatViS: Release of Lightning variants for NatViS v2.5.

      • Done more effectively this time.

    • NatViS: Finally getting around to creating, and releasing a PDF guide.

    • NatViS: Continue fine-tuning of v3.0.

    ============

    10-2-24 NatViS v2.0 Lightning 4step

    • Uploaded 4step lightning model for v2.0

    ============

    10-1-24 NatViS v2.0 Lightning 8step

    • Uploaded 8step lightning models for v2.0

    ============

    9-25-24 NatViS v2.0

    What's New?

    • Prompting: This update focuses primarily on the text-encoders. Natural language prompting capabilities have been improved to follow less strict formats and rely less on specific tokens.

    • Ethnicity and Demonym: Increased accuracy of phenotypes for various ethnicities and demonyms. Not just limited to body structure, but also clothing, hair, landscapes, etc. See here for small examples.

    • Camera EXIF: Inclusion of camera EXIF data for popular modern and analog cameras that can be prompted. Includes camera name, focal length, f-stop, ISO, shutter speed, and lens type. Also includes attachments such as ND filters and polarizers.

    • Analog: Improvements to analog and vintage photograph generations.

    • Lighting and shadow: Prompt how light (or the lack thereof) interacts with objects/subjects in the scene, amongst other general lighting-related modifiers. More info soon.

    • Skin Textures: Small improvements to the detail of skin textures, with fewer (or no) explicit tokens related to skin detail.

    • Implementation of Pseudo Instruction: This will require a more lengthy write-up.

    • Better male anatomy.

    • Lesbians.

    What's Next?

    • Lightning models will be released within the coming days.

    • Full PDF guide and documentation within the next week.

    • Info on v3.0 within the next month.

    ============

    8-4-24 NatViS v1.0 Lightning 4step

    • Uploaded 4step lightning version of v1.0 (See About this version for more info).

    ============

    8-3-24 NatViS v1.0 Lightning 8step

    • Uploaded 8step lightning version of v1.0 (See About this version for more info)

    ============

    8-2-24 NatViS v1.0

    • Initial Release


    Usage Tips

    Note: These are simply recommendations, feel free to experiment.

    Prompting

    NatViS leverages SDXL’s bigG text-encoder to allow for Natural Language prompting.

    What is Natural Language Prompting?
    Since the release of Stable Diffusion v1.4, people have become accustomed to comma-delimited lists of visually descriptive tags/phrases. This was a necessity for early Stable Diffusion models due to the architecture and choice of text-encoder. With SDXL’s dual text-encoder/tokenizer architecture, we are able to write more naturally descriptive prompts.

    Simply describe the image you want to generate, just as you would describe the image to a person.

    For example:
    Comma-delimited list: a woman, standing, outdoors, sun beams, dappled light, apple tree, wearing denim jeans, flannel shirt, brown hair, long hair, looking at viewer, highest quality, atmospheric, 35mm, masterpiece

    Natural Language: A masterpiece, 35mm-style photo of a woman with long brown hair, standing outdoors in dappled sunlight beneath an apple tree. She wears denim jeans and a flannel shirt, gazing directly at the viewer with an atmospheric quality.

    Note: This is just an example to highlight how to write a natural language prompt. For better examples, see the sample images.

    Will NatViS Understand Everything I tell it?
    Absolutely not.
    Due to various limitations in both the architecture and the amount of data I’m able to fine-tune as one person, there will be instances where the model simply will not generate what you want. Often, you can experiment with different wording, placement of tokens (i.e., moving a sentence or individual token closer to the start or end of a prompt), or removing potentially conflicting tokens, etc. There really is no definitive solution I can give, as it varies from prompt to prompt. Unfortunately, there will be times when no solution/workaround is successful.

    Can I still use Tags?
    Short answer: Yes
    SDXL’s dual text-encoder/tokenizer architecture can process tokens/sequences with both encoders in parallel, meaning you don’t have to use natural language prompting.


    Note: Since the training data was captioned purely with Natural Language descriptions, not all of the common descriptive tags people are familiar with will be understood by the model, especially Booru and Booru-style tags.

    I found a hybrid system works well, as seen in many of the sample images.


    For example:
    Say you tried your natural language prompt but want to make the results a bit more cinematic. Instead of modifying the entire prompt, you can simply append cinematic lighting, harmonious, film still, etc. to the end of your prompt.
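    The hybrid approach can be sketched in a few lines of Python: keep the natural language description as the base and append comma-delimited style tags. The helper below is my own illustration, not part of any UI or library.

```python
# Hybrid prompting sketch: natural language base + appended style tags.
# `with_style_tags` is a hypothetical helper, not part of any tool.
def with_style_tags(prompt: str, tags: list[str]) -> str:
    base = prompt.rstrip(" ,.")  # drop trailing punctuation before joining
    return ", ".join([base] + list(tags))

base = "A 35mm-style photo of a woman standing beneath an apple tree."
print(with_style_tags(base, ["cinematic lighting", "film still"]))
# → A 35mm-style photo of a woman standing beneath an apple tree, cinematic lighting, film still
```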

    Quality Tags/Classifiers? (score_up_x)
    Blasphemy.
    You can use quality ranks/classifiers if you want, but they were not part of the training data.

    Negative Prompt
    Similar to other SDXL models: use tags separated by commas and keep it short. Add/remove tokens from the negative prompt as needed.

    Generation Parameters

    CFG:

    • Recommended: 5-7

    • 7+ to enforce a specific style/medium

    Sampler/Sampling Steps:
    This can be quite subjective, so I will just share what I typically use instead of giving direct recommendations.

    • Sampler - DPM++ 2M SDE

    • Scheduler - Karras

    • Steps - 55
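    For anyone running NatViS outside a UI, the settings above map onto diffusers roughly as follows. This is a sketch under assumptions: the model path is a placeholder, and the scheduler flags follow the diffusers DPMSolverMultistepScheduler API rather than anything stated on this page.

```python
# Recommended generation settings expressed as diffusers call parameters.
# Parameter names are diffusers conventions (an assumption; this guide
# itself describes A1111-style settings).
settings = {
    "num_inference_steps": 55,  # Steps: 55
    "guidance_scale": 6.0,      # CFG: recommended 5-7; 7+ to enforce a style/medium
}

# Wiring it up (left as comments so the sketch stays self-contained):
# from diffusers import StableDiffusionXLPipeline, DPMSolverMultistepScheduler
# pipe = StableDiffusionXLPipeline.from_pretrained("path/to/natvis")  # placeholder path
# # DPM++ 2M SDE with the Karras schedule:
# pipe.scheduler = DPMSolverMultistepScheduler.from_config(
#     pipe.scheduler.config, algorithm_type="sde-dpmsolver++", use_karras_sigmas=True
# )
# image = pipe("your prompt", negative_prompt="your negatives", **settings).images[0]
```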

    ADetailer: (Extension)
    Link
    Again, subjective so I’ll just share my settings.

    • Model - mediapipe_face_full (use mediapipe for photorealism)

    • Confidence - 0.45

    • Everything else is default.

    CFG Rescale: (Extension)
    Link
    I forgot that I had this installed, so I’m not quite sure whether it was enforcing zero terminal SNR on the noise schedule or not. Since the parameter was null, it shouldn’t have.

    • Phi - 0
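    For context, CFG rescale (from the "Common Diffusion Noise Schedules and Sample Steps are Flawed" paper) blends the plain CFG prediction with a copy rescaled to match the text-conditioned prediction's standard deviation, weighted by Phi. A minimal NumPy sketch (my own illustration, not the extension's code) shows why Phi = 0 is a no-op:

```python
import numpy as np

# Minimal illustration of CFG rescale: `phi` blends a std-matched prediction
# with the plain CFG prediction. With phi = 0 the CFG output passes through
# unchanged -- hence a null Phi shouldn't have affected the sample images.
def rescale_cfg(noise_cfg: np.ndarray, noise_text: np.ndarray, phi: float) -> np.ndarray:
    rescaled = noise_cfg * (noise_text.std() / noise_cfg.std())
    return phi * rescaled + (1.0 - phi) * noise_cfg

x = np.random.randn(8, 8)  # stand-in for the CFG noise prediction
t = np.random.randn(8, 8)  # stand-in for the text-conditioned prediction
assert np.allclose(rescale_cfg(x, t, phi=0.0), x)  # Phi = 0 -> no-op
```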


    Important

    If you struggle to replicate the sample images, even with the exact seed and parameters, it’s likely because of the noise scheduler. I enabled the fix for this in WebUI, but have since reinstalled WebUI and forgot to re-enable it. This only applies to v1 of NatViS.


    Training Info

    TO-DO
    This will take a while to write up. So in the meantime:
    TL;DR: 1M+ images, processed/cleaned via a personal Dataset Toolkit I’m developing, captioned via a Multimodal Large Language Model (MLLM) with a unified feature space (part of the Dataset Toolkit, not GPT). Training data, configs, and custom scripts will be made available and open-sourced when the final version is released. The Dataset Toolkit has no announced release date.


    Check out my other models

    SDXL Checkpoints: https://civarchive.com/collections/966964

    SDXL LoRAs: https://civarchive.com/collections/966969

    40K Series: https://civarchive.com/collections/956187

    SD1.5 Checkpoints: https://civarchive.com/collections/966974

    SD1.5 LoRAs: https://civarchive.com/collections/966972


    Run On TensorArt (v1)


    🤗Huggingface Repo

    🤗Huggingface Repo - Lightning

    🤗Huggingface Repo - Hyper

    Description

    V2.0

    SEE CHANGELOG

    FAQ

    Comments (23)

    watchvideo321 · Sep 25, 2024

    I'm getting only noises, anyone knows why?

    geoffsmith101821 · Sep 26, 2024 · 2 reactions

    Are you including a VAE? Looks like it isn't baked into this model

    mefekan639250 · Oct 10, 2024

    @geoffsmith101821 I am including it, bro. I'm still getting noise during inpainting. idk what I'm doing wrong.. steps, scheduler.. all is correct..

    vanillah · Sep 26, 2024 · 4 reactions

    EDIT 1: CULPRIT WAS FORGE! If you're getting very bad results and do not know why. TRY A1111.

    EDIT 2: Model is still hard to use, tho. Especially eyes and hands are problem for me. Can't seem to grasp the quality I was getting with V1.

    hmm, I'm sure it's just my lack of skill in utilizing the V2 model, but so far I'm struggling to create anything usable. Unlike v1 where I could just utilize tags, V2 is very hard to work with. I'm just probably missing something here...

    ndimensional (Author) · Sep 26, 2024 · 1 reaction

    That's due to v2.0 focusing on text-encoder training with natural language captions. It can still generate everything v1.0 did, and more, but it will require a more natural language style of prompt.

    NatViS was always intended to be an experiment with natural language prompting, to give users more fine-grained control over the output. But I'm aware of the ease-of-use of tags, so they'll be reintroduced in v3.0. 👍

    vanillah · Sep 26, 2024 · 2 reactions

    @ndimensional Thanks for the reply, but the issue was technical, as I was getting artifacts with the samplers you used! Even when using your prompt and settings, I was getting very bad results. The culprit?? It was FORGE. Both new Forge and old Forge (or its augments) are SOMEHOW incompatible with this model. As soon as I began using regular 'ol A1111 I was getting better results, but v1 was still above. I still want to investigate more, but no time unfortunately =(

    onionhulu781 · Oct 4, 2024

    I can second this; the model does not play nicely with Forge and produces awful images with tons of artifacts. Just tried it in Fooocus and the results are fantastic, even without the Fooocus enhancements.

    Conqueeftador · Sep 26, 2024 · 2 reactions

    This model is absolutely outstanding. Thank you!

    Zomblex · Sep 26, 2024 · 3 reactions

    Amazing model yet again!! THANKS!!! Posted some photos too, they look so natural without any lora.

    alternative_Universe · Sep 26, 2024

    This model is just next level, but v2 seems to have a hard time recognizing characters lora

    ndimensional (Author) · Sep 26, 2024

    Thanks for reminding me of that.
    I too noticed some instability with LoRAs in v2.0.
    My theory is that it's almost definitely a result of the text-encoder training, causing a rank token bias and semantic shift in the text-encoder network layers. Whereas v1.0 had a mix of tag-style captions and natural language captions, v2.0 was almost entirely natural language, due to an error with the original v2.0 update data causing me to pull a good chunk of it to debug/fix for v3.0.

    alternative_Universe · Sep 26, 2024 · 1 reaction

    @ndimensional Glad you get me. So will you fix this in version 2.5 or straight in v3?

    ndimensional (Author) · Sep 26, 2024 · 1 reaction

    @P_Universe No problem👍 . I'll see what I can do. If I can implement it in a quick/cheap training session I'll create a 2.5.

    @ndimensional awesome!!, will you update the model on tensor art as well?

    35579 · Sep 26, 2024 · 11 reactions

    idk bro, the previous one was way better, way more realistic and better quality. Now it just makes a complete mess of hands and faces with simple prompts

    vanillah · Sep 26, 2024 · 3 reactions

    Yep, I'm seeing this trend too, V1 is so easy to work with while you need to actually coax v2 to do basic things.

    alternative_Universe · Sep 27, 2024 · 2 reactions

    I have noticed this also now that I am doing more tests, v1 seems to understand the prompt better

    nittygritty11 · Sep 27, 2024 · 3 reactions

    The biggest problem I've had compared to V1 is that the default girls all have big noses and beady eyes now, and no amount of logical prompting or negatives can do much to fix it.

    StreamofStars · Sep 27, 2024 · 3 reactions

    Yes, something is up with it. And I am not even using it for porn. But it seems unstable. text encoding? overtrained with some error?

    3567304 · Sep 27, 2024 · 2 reactions

    G.O.A.T model

    vegsla · Sep 29, 2024 · 1 reaction

    Best checkpoint. Anyone know why I get blurry skin though? Everyone seems to be hyping the quality but I end up having to use a refiner like realvis to get nice sharp skin.

    Kitten123 · Sep 29, 2024

    What type of samplers are used for the 4 step model?

    TrueToLife_Fauxto · Sep 30, 2024

    Personally, I found that with 4-8 step Lightning LoRAs, Euler A (Simple) or DPM++ SDE (SGM Uniform) works well on A1111.