IllumiYume XL (Illustrious) - v3.5 (v-pred)

NSFW

Introduction

For version 1.0:

This model is based on 'Illustrious XL 1.0' with some minor modifications. It was trained on the Danbooru2023 dataset along with the dataset I previously used for training my LoRA models.

For version 2.0:

This model is intended to let everyone experience the v-pred version of Illustrious XL without having to spend a large amount of STARDUST to unlock the Illustrious XL v3.0 v-pred and v3.5 v-pred versions.
I independently researched and developed this version based on various existing XL model architectures. However, due to the many modifications I made, I'm not sure it can still be considered 'Illustrious XL'.

The model was trained on the danbooru2024 and danbooru_newest-all datasets, as well as a custom dataset (which I collected, labeled in natural language with GPT-4.5, and then verified manually).
I put a lot of time and effort into developing this version, so if you don't mind, please consider bidding on it so that others can use it through the CivitAI generator. Thank you all very much!

For version 3.0:

This version was created to adapt to as many styles as possible while keeping details stable in the generated images. It includes general styles and artist styles (from Danbooru and e621).
Although it is oriented towards being a pre-trained model, you can use it as is. For best results, however, I suggest combining it with a LoRA or fine-tuning it to get the style you want.
The model was trained on the danbooru2024, danbooru_newest-all, and e621 datasets, as well as a custom dataset, with 40% of this data annotated using both tags and natural language.
This model is an epsilon-prediction model, which makes it easy to use.

For version 3.1:

This version fixes the issues encountered in version 3.0. It also improves image quality for styles and artist styles (from Danbooru and e621).
This model was trained on the same dataset as version 3.0, but I re-annotated it, added many new anime characters, and improved the quality of existing ones.
The model improves stability when generating images at a resolution of 1536x1536.
This version will have two variants: one for v-pred and one for e-pred (the e-pred version will be released first).

For version 3.2:

This model is a refined version of 3.1, incorporating hotfixes and enhancements. It features improved detailing in the eyes and more accurate anatomical proportions for the character.
Additionally, the model demonstrates enhanced creativity and a better ability to understand prompts accurately.
This model is also capable of generating images at large resolutions, e.g., 1024x2048 (I tested it and found the image quality to be quite decent). (Note: during training, I only trained it with images at a resolution of 1536x1536).

For version 3.5:

This model was trained on the Danbooru dataset, updated as of May 9th, 2025, with image sizes of 1536x1536.
It fixes an important bug that appeared in version 2.0 of the v-pred variant.
The model also improves style stability, anatomy, and prompt understanding compared to the previous version.

Important Note

This is the first base model I've created, so any feedback is welcome. Feel free to share your thoughts so I can improve it in future versions.
Version 2.0 is a V-prediction model (unlike epsilon-prediction), and it requires a number of specific parameters.
Version 3.0 should be used with a low CFG value, around 2 to 4, if you encounter images generated with high contrast (I don't know why CFG affects this; I will investigate and find a solution :v)
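
For readers unfamiliar with the distinction, the relationship between v-prediction and epsilon-prediction can be sketched as below. This is a minimal illustration that assumes the standard variance-preserving formulation from the general diffusion literature (alpha^2 + sigma^2 = 1); it is not taken from this checkpoint's training code, and all function names are illustrative.

```python
import math

def noisy_sample(x0, eps, alpha, sigma):
    """Forward diffusion: mix the clean value x0 with noise eps."""
    return alpha * x0 + sigma * eps

def v_target(x0, eps, alpha, sigma):
    """What a v-prediction model is trained to output."""
    return alpha * eps - sigma * x0

def eps_from_v(x_t, v, alpha, sigma):
    """Recover the noise estimate from a v output (needs alpha^2 + sigma^2 = 1)."""
    return sigma * x_t + alpha * v

def x0_from_v(x_t, v, alpha, sigma):
    """Recover the clean-sample estimate from a v output."""
    return alpha * x_t - sigma * v

# Toy scalar example; a real model works on latent tensors instead.
angle = 0.3  # arbitrary point on the noise schedule (assumption)
alpha, sigma = math.cos(angle), math.sin(angle)
x0, eps = 0.8, -1.5

x_t = noisy_sample(x0, eps, alpha, sigma)
v = v_target(x0, eps, alpha, sigma)

print(eps_from_v(x_t, v, alpha, sigma))  # matches eps up to rounding
print(x0_from_v(x_t, v, alpha, sigma))   # matches x0 up to rounding
```

A sampler that interprets a v output as if it were epsilon skips this conversion, which is one reason a v-pred checkpoint needs the matching prediction-type setting in the UI.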

Currently, the model is not available for use via Civitai Generation. You can visit the following website to use it:

tensor.art

Suggested settings:

All example images were generated using the following settings:

Positive prompt: masterpiece,best quality,amazing quality
Negative prompt: bad quality,worst quality,worst detail,sketch,censor, simple background,transparent background
CFG: 5-7 (for version 3.0, I suggest a lower value of 2-4)
Clip skip: 2
Steps: 20-30
Sampler: Euler a/DPM++ 2S a

Note: I don't use any post-processing or LoRA to enhance the example images. They are generated using only these settings and a custom prompt with my base model.
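
For scripted use, the suggested settings can be captured in a small helper. This is only a sketch: the function name and dictionary keys are hypothetical and do not correspond to any particular UI or API; the values are copied from the list above.

```python
def suggested_settings(version: str) -> dict:
    """Return the settings suggested on this page for a given version string."""
    # Version 3.0 is the only one for which a lower CFG range is suggested.
    cfg_range = (2.0, 4.0) if version == "3.0" else (5.0, 7.0)
    return {
        "positive_prefix": "masterpiece,best quality,amazing quality",
        "negative_prompt": (
            "bad quality,worst quality,worst detail,sketch,censor, "
            "simple background,transparent background"
        ),
        "cfg_range": cfg_range,
        "clip_skip": 2,
        "steps_range": (20, 30),
        "samplers": ("Euler a", "DPM++ 2S a"),
    }

settings = suggested_settings("3.5")
print(settings["cfg_range"], settings["samplers"])
```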

Acknowledgments

Thanks to narugo1992 and Nyanko for sharing such valuable data.

If you'd like to support my work, you can do so through Ko-fi!


Comments (29)

dfijgklerhjkldghtjykghljgJun 10, 2025

CivitAI

this model, I always come back to it.👍

And233Jun 10, 2025· 1 reaction

CivitAI

Pretty strange. I used to run v3.2 with the AYS scheduler, but when I use AYS on v3.5, the results become very bad.

super_yakisabaJun 10, 2025

CivitAI

Great model. v3.1v-pred is the best for me as I can see a clear difference in how it responds to prompts compared to the others. Do you see any differences and why do you think they are different?

FoxyzJun 11, 2025

CivitAI

Idk if it's the parameters or what, but I couldn't get 3.5 to behave properly. It feels almost too random? The consistency of 3.2 is out of the window using the same parameters. Also, when I tried to generate Burnice White, it worked, unlike with the previous version; however, Astra Yao, who is from Jan 2025, didn't work. I'll do more testing later and add any findings.

wtre59Jun 11, 2025· 1 reaction

CivitAI

There seems to be something wrong with v3.5...

wtre59Jun 11, 2025

Hmm...some specific tags will cause problems in the case of (xxxx:2).

duongve13112002

Author

Jun 11, 2025

@wtre59 Hi would you mind giving me some examples for that

FafNieRJun 11, 2025· 5 reactions

CivitAI

I've tried v3.5, v3.2, and v3.1 (vpred). To me, v3.2 works best among these three versions.

alexvolaJun 12, 2025· 2 reactions

CivitAI

I think I have to agree with other commenters. 3.5 def has interesting style, but 3.2 is much more stable when it comes to more complex prompt.

duongve13112002

Author

Jun 12, 2025· 2 reactions

CivitAI

Hi everyone,
I’d like to confirm that version 3.5 was trained on a newer Danbooru dataset (May 9th, 2025) compared to version 3.2. Additionally, I made some configuration changes, so this version is quite different from v3.2 and previous v-pred versions.
At the moment, I'm not entirely sure what issues may have been caused by these changes, so I’d really appreciate it if you could share any feedback or problems you’ve encountered while using version 3.5 (ideally with examples). Your input will help me investigate and improve future versions more effectively.
Thanks

dfijgklerhjkldghtjykghljgJun 12, 2025

CivitAI

I want to give my feedback based on simple common sense.

(1536x1536)/(1024x1024) = 2.25. The increased pixel count alone means 2.25x the data load (125% more); combined with all the parameters that need to be learned and adjusted together, the real cost is clearly far more than a simple 2.25x increase.
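
The arithmetic can be checked with a short sketch. It assumes an SDXL-style VAE with a downsampling factor of 8 and self-attention whose cost grows quadratically with the number of tokens; real training cost depends on many other factors, so treat the last figure as a rough illustration only. Note that a 2.25x ratio corresponds to 125% more pixels.

```python
def pixel_ratio(new_side: int, old_side: int) -> float:
    """Ratio of pixel counts between two square resolutions."""
    return (new_side ** 2) / (old_side ** 2)

VAE_DOWNSAMPLE = 8  # assumption: SDXL-style VAE, 8x spatial downsampling

ratio = pixel_ratio(1536, 1024)
extra_percent = (ratio - 1) * 100
latent_old = 1024 // VAE_DOWNSAMPLE
latent_new = 1536 // VAE_DOWNSAMPLE
attention_ratio = ratio ** 2  # quadratic in token count (simplification)

print(f"pixel ratio: {ratio}x ({extra_percent}% more pixels)")
print(f"latent side: {latent_old} -> {latent_new}")
print(f"rough self-attention cost ratio: {attention_ratio}x")
```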

The current sweet spot is still 1024px. Training at 1536px is significantly more expensive, so I would suggest that all trainers hold off on 1536px training for now, even for a few years. The chips need to catch up; it is clear that the chips we can get access to are not sufficient for the kind of computation that txt2img training requires.

As for the dataset itself, I don't think it is the problem. Actually, I like the new dataset used in v3.5. Feed it back in and train again at 1024px, and things should be okay.

MinthybasisJun 13, 2025· 16 reactions

CivitAI

Hello, friend. I absolutely don't mind if you're using my models for merges and tunes; on the contrary, I appreciate it. But it is not nice to claim everything as yours while hiding who did the main work. I kindly ask you to credit the source you used.

QueenTidoJun 14, 2025· 11 reactions

CivitAI

Currently, I don’t know whether this model was trained from scratch or merged with Rouwei’s model. However, I’ve noticed that there’s a lot of debate and condemnation surrounding this model. From my own testing, although both models occasionally produce similar outputs, they are fundamentally different. It seems like Rouwei’s fans are actively trying to discredit this model. As of now, I haven’t seen its creator charging users anything — they’ve simply shared it with the community. I believe that learning from one another is essential for the growth of AI.

Moreover, we shouldn’t claim ownership over a model unless we are the actual creators. Here’s why:

All of these models are based on the architecture of Stable Diffusion XL 0.9 (by Stability AI), so inherently they share the same foundation.

They often use datasets like Danbooru, which are publicly accessible.

If you’ve built a model with a unique architecture or curated your own dataset, then yes, you have a claim to it. But if all you’ve done is train on existing frameworks and data, then proudly claiming ownership is meaningless.

Take DeepSeek as an example — many argue it copied ChatGPT’s knowledge, but structurally, they are different models. That distinction matters. In this case, both models use the same architecture and are open-source — yet people still criticize them.

Let’s be more respectful and civil. Because of selfish attitudes like this, most major American companies now opt for closed-source solutions. It’s mostly Chinese companies that are still open-sourcing their work for the community. So let’s appreciate the fact that we still have access to these model weights — otherwise, in the future, we’ll likely end up having to pay hefty sums just to use them.

wtre59Jun 14, 2025

CivitAI

Wow ...... Looks like there are some issues here that need to be addressed ...... With my limited knowledge it may be a bit difficult to understand the core of it, but based on my experience, IllumiYume's v3.5 is markedly different from the previous generations of the model. duongve13112002 also mentioned for a previous version: "I independently researched and developed this version based on various existing XL model architectures. However, due to the many modifications I made, I'm not sure it can still be considered ‘Illustrious XL’."

Is there some technique used that allows parts of the model to remain virtually unaltered during training, retaining some features of the original model?

(From a paper on a possibly related technique:
"In this section, we apply our method to a distilled model obtained through progressive distillation (PD). Notably, this implies that predictions at intermediate steps that are not explicitly part of the PD process are not trained, resulting in suboptimal performance at these steps.")

If this is possible, it may be one of the reasons for the existing flaws of v3.5. A discussion in RouWei's comments asks why Noob is not used as a base model for training, and there is this reply there:

"base model serve as a block of clay, the clay need to be pure in substance so it can be mold into anything(meaning it has no biases), and the clay need to be complete and full with volume, otherwise when training subsequence loras/finetunes it has no existed weights/concept to be adjusted from/with cause it is not present in that particular base model",

which seems worth thinking about in light of the performance of ChromaYume, another model from duongve13112002 (which also seems to use similar techniques).

Since AI is still booming and there are so many techniques available here, I'm not sure if this is an intentional plagiarism or a problem with the techniques used,

after all, a student can actually understand the teacher's reference and make it their own or they can simply copy and paste it,
in any case, since duongve13112002 states that Minthybasis's RouWei helped with the model, thank you for your work!

♥ to @Minthybasis

(Translated with DeepL)

duongve13112002

Author

Jun 14, 2025· 4 reactions

CivitAI

Hi everyone, @Minthybasis. I want to clear things up with everyone. First of all, I've applied part of the technique mentioned by @QueenTido, from the paper you referred to (not all of it; I customized it with my own method). Secondly, I do not claim ownership of the model or anything else, as I currently don't have the capability to independently research and develop a model from scratch. I merely trained and fine-tuned it based on the outputs that aligned best with my goals, primarily for community development.

Thirdly, although my model may resemble Rouwei's in some aspects, if you read the paper carefully, you'll see that the method used is intended to improve the quality of the original model. To be honest, if Rouwei had not released a newer version (or if the newer model didn’t offer significantly more knowledge from the Danbooru dataset), I would have used Illustrious version 2.0 as the teacher model instead—because from the beginning, that was my intention due to its variety of beautiful styles.

After that, I trained the student model using my own dataset, freezing certain layers of the student model. Fourthly, I didn’t reference Rouwei because I believe their model is essentially based on Illustrious v0.1.

Fifth, I always try to find the most cost-effective methods that still yield optimal results, since I’m working independently without any financial support. My main goal is simply to create a model for personal use and to make it available for others as well.
Lastly, in future versions, I’ll try every way I can to further improve the model and add new knowledge. However, if I had more funding for training, I’d also want to train the model on my own dataset without relying too much on external technologies, to save costs :<
Thanks to @Minthybasis for helping me :D
Please try to understand and sympathize with me.

MinthybasisJun 14, 2025· 16 reactions

CivitAI

I honestly tried to smooth it over, turn it into a joke, or somehow resolve in a friendly manner so that no one loses face.

Alright, it is very simple and anyone can check it. Just encode rouwei using Base64 to get cm91d2Vp, then put it at the beginning of any prompt and generate pictures. Surprise: a cute chibi catgirl from my avatar.
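
The encoding step described here can be reproduced with Python's standard library. This sketch only verifies the Base64 string itself; it says nothing about what any model generates when that string is used in a prompt.

```python
import base64

# Encode the ASCII string "rouwei" as Base64.
tag = base64.b64encode("rouwei".encode("ascii")).decode("ascii")
print(tag)  # cm91d2Vp

# Decoding the tag gives back the original string.
decoded = base64.b64decode("cm91d2Vp").decode("ascii")
print(decoded)  # rouwei
```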

Congratulations, you "trained" a checkpoint that identifies itself as something you deny directly referencing!

Not only can artists' watermarks be purged, but new special ones can also be introduced that will not manifest in any way until properly called upon.

This cannot be 'transferred' without calling it directly in the conditioning or taking the weights. Speaking of weights, there are a lot of identical values in multiple layers, many times more than a shared origin could explain.

The only thing I can't understand is: why? Of course, the desire to exaggerate one's merits in front of an audience is understandable, but the consequences of a revealed lie are not worth it.

Like I said multiple times before, you can use my models as a base, for merges, or for something else; this is fine and is mentioned in the description. Just don't lie to people about what exactly you did, and credit the source.

wtre59Jun 14, 2025

CivitAI

Honestly, I'm having some difficulty understanding duongve13112002's attitude now ......

duongve13112002 still seems to be a bit ambiguous on this issue, which is indeed puzzling.

To be sure, v3.5 does have knowledge from RouWei: cm91d2Vp, the catgirl watermark from RouWei, was generated in v3.5, so v3.5 must have ‘learnt’ RouWei's knowledge one way or another, by whatever means.

(v3.1/3.2 didn't have this knowledge.)

Distillation is the condensation and transfer of knowledge. According to duongve13112002's ambiguous explanation, v3.5 uses RouWei as the teacher model, so at least part of v3.5 is distilled from RouWei. duongve13112002 should at least make this clear.

I'm not sure of duongve13112002's personal definition of the ‘referencing’ behaviour he denies, but at least my ‘referencing’ doesn't amount to bantering about copying. The two models we are discussing are both based on Illustrious, one probably based on Illustrious 1.0 and the other on Illustrious 0.1.

Illustrious 1.0 itself is based on Illustrious 0.1, which in turn is based on KBlueLeaf/kohaku-xl-beta5, which in turn is based on sdxl.

The model files, which both ended up being around 6 GiB, definitely had a lot of changes along the way, but we all know they're not the same.

Take DeepSeek R1, which has 671b original weights and also has 70b/32b/14b/8b/7b/1.5b distillation weights. They're not the same, but something in them, the so-called knowledge, was successfully transferred from a 671b behemoth to a 1.5b miniaturised model.

Minthybasis may have wanted duongve13112002 to account for the act of ‘distilling’ from RouWei.

Or does Minthybasis think that duongve13112002 just trained a second time on the weights of the model trained by Minthybasis and classified it all as his personal training work?

(Again, I'm not likely to be clear on Minthybasis' personal definition of ‘reference’.)

To do the maths, the entire community now basically has knowledge from the work of previous people, and the datasets we use for training have knowledge from other people.

How much thanks can we actually give to the authors of that knowledge?

At least I don't think there's much to be ambiguous about; there's always someone standing on the shoulders of giants, and giants might stand on the shoulders of supergiants.

(Translated with DeepL)

QueenTidoJun 14, 2025· 2 reactions

CivitAI

I have no intention of defending anyone's stance in this dispute between @Minthybasis and @duongve13112002 . According to @Minthybasis , this model can generate chibi-style images similar to their own model. However, @duongve13112002 has shared the workflow used to create this model, and from their explanation, it seems evident that knowledge from the Rouwei model is inherently present in this one. That said, in order to objectively determine whether this model is truly different from Rouwei, we need to make a direct visual comparison.

I’ve been testing both models since this morning and found that, for many prompts, the two produce noticeably different outputs. Occasionally, the generated images may look similar at first glance, but the details within them differ significantly. I’ve uploaded comparison images, all generated using the same settings, with the only variation being the two different base models. You can view them here: https://civitai.com/posts/18275913. It's clear that the two models do not consistently produce identical images, and at times, the differences are quite pronounced.

wtre59Jun 14, 2025

This post seems to be inaccessible?

schneesturmx91988Jun 14, 2025· 7 reactions

CivitAI

i love testing models ... but sry what in the hell is this 3.5 version ? pls explain that some images turns in to a horror genre @_@

iuchihaitachi4825Apr 30, 2026

yes for me too

rantantekiJun 14, 2025· 7 reactions

CivitAI

just say it's a checkpoint merge bro you're embarrassing yourself

wtre59Jun 15, 2025· 3 reactions

CivitAI

There are two types of checkpoints on this site, ‘trained’ and ‘merged’, and I don't think anyone would be too hard on a merged model.

Labelling a model as merged conveys the attitude that my model's knowledge is largely derived from others (whether by complex or simple means), and I'm willing to admit it.

The trained model conveys a different message: more or less, a significant portion of the knowledge in that model is derived from my efforts as an author. (Whether that knowledge is added or optimised)

v3.1/3.2 is pretty good work, but it couldn't have been done without Illustrious 1.0 behind it. Even though Illustrious itself was starting to turn into a farce, Illustrious 1.0 did improve at high resolution, and they did at least release 2.0 according to the donation schedule. To be honest, I was no longer expecting much from Illustrious v3.5. Oh, and what a coincidence: the same farce-provoking and disappointing v3.5.

The base model info for IllumiYume v3.5 shows Illustrious, but if you dig deeper, what model is the base of its fine-tuning? I don't think it's Illustrious 1.0.

Would it be Illustrious 0.1 ? I don't think so.

Based on the results we got from our testing, we can actually declare that the base model for v3.5 is RouWei0.8 because it improves on RouWei0.8 - we should declare that it's based on RouWei0.8, which is based on Illustrious 0.1.

If you want to say that. Please make it clearer.

To your response, I'd like to make it a little more prominent in the comments - @oioioicola

That said, if it were actually based on Illustrious 2.0, then instead there wouldn't be any problems. After all, v3.5 does improve on its fine-tuned base, and who wouldn't be happy to see a better model?

(Translated with DeepL)

oioioicolaJun 15, 2025

I'm an outsider, so I've decided to stop interfering. I apologize for deleting the original comment.

NeuroSenkoJun 15, 2025· 19 reactions

CivitAI

It cannot possibly be a fine-tune of Rouwei 0.8 vpred for one simple reason:

Rouwei v0.8.0 (epsilon):

"createdAt": "2025-05-25T17:56:29.844Z"

Rouwei v0.8.0 (v-pred):

"createdAt": "2025-06-08T23:32:42.554Z"

IllumiYume XL v3.5 (v-pred):

"createdAt": "2025-06-10T07:32:34.084Z"

It's simply not feasible to produce a fine-tune just one day after the base model's release. You'd need significantly more time just to study the nuances of the original model.

And let's not forget: Rouwei 0.8 epsilon itself was released just two weeks before IllumiYume XL v3.5 v-pred. That's barely enough time to analyze the base, develop a training setup, adapt these custom distillation mechanisms to this specific checkpoint, debug them, run a full training cycle, and validate the output, which makes the claim of a personal fine-tune even more dubious.
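
The gaps between the quoted timestamps can be computed directly. This is a small sketch using only the three `createdAt` values quoted above; the dictionary keys are illustrative labels, not API fields.

```python
from datetime import datetime, timezone

def parse_created_at(value: str) -> datetime:
    """Parse an ISO-8601 UTC timestamp of the form shown in the comment."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)

created = {
    "rouwei_eps": parse_created_at("2025-05-25T17:56:29.844Z"),
    "rouwei_vpred": parse_created_at("2025-06-08T23:32:42.554Z"),
    "illumiyume_v35": parse_created_at("2025-06-10T07:32:34.084Z"),
}

gap_vpred = created["illumiyume_v35"] - created["rouwei_vpred"]
gap_eps = created["illumiyume_v35"] - created["rouwei_eps"]

# Express each gap in days for easy comparison.
print(f"after Rouwei v-pred:  {gap_vpred.total_seconds() / 86400:.2f} days")
print(f"after Rouwei epsilon: {gap_eps.total_seconds() / 86400:.2f} days")
```

By these timestamps the gaps are roughly 1.3 days and 15.6 days, close to the "one day" and "two weeks" figures given in the comment.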

What we're looking at here is clearly a merge of Rouwei 0.8 with some quality-enhancing LoRAs, maybe with weights from other checkpoints, which caused the loss of the full v-pred range of the original. Maybe there was also some small fine-tuning that could be done in the short time between the model releases. And there's absolutely nothing wrong with that - after all, the majority of models on Civitai are exactly that. Just like many users preferred AutismMix over using PonyDiffusion v6, even though most of the original work was done by Astralite in his Pony model.

The real issue lies elsewhere: duongve13112002 deliberately misled the community by claiming this was his own personal fine-tune. Even after being caught, he tried to deflect, dodging direct questions and hiding behind technical jargon to appear more credible.

If this really were a distilled model, then I'd love to hear a credible explanation for how watermark patterns - previously unknown and never publicly disclosed - ended up in his version.

And to top it all off, he has the audacity to keep asking for money for future "training" work.

The sheer nerve is astounding!

When people like this start asking for donations, it becomes clear: this isn't just dishonesty - it's fraud. Had he been upfront with his audience, there wouldn’t have been any controversy.

nuko_masshiguraJun 19, 2025· 11 reactions

CivitAI

There are merge recipes for the IllumiYume XL series.

v3.2 v-pred

https://civitai.com/images/83184782

v3.1 v-pred

https://civitai.com/images/83186295

v3.1 e-pred

https://civitai.com/images/83187293

v3.0

https://civitai.com/images/83187844

v2.0

https://civitai.com/images/83188267

v1.0 metadata

https://files.catbox.moe/j7qp2g.json

You can view them by dropping the model file into ComfyUI with comfy-mecha installed.

In v3.5 there was no metadata at all. It may have been lost in the process of adding the v_pred and ztsnr keys.

The v1.0 metadata does not contain anything about comfy-mecha, but does contain information about other merge tools, such as "webui", "sd-webui-supermerger", "sd-webui-model-mixer" and "merge-models-chattiori".

Among the merged models, there may be a model that you trained yourself, but I would publish these models as "CHECKPOINT MERGE" rather than "CHECKPOINT TRAINED".
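
For readers without ComfyUI, the metadata of a `.safetensors` file can also be inspected with the standard library alone. This is a minimal sketch that relies on the published safetensors layout (an 8-byte little-endian header length followed by a JSON header with an optional `__metadata__` entry). The function names are hypothetical, the `v_pred` and `ztsnr` key names are taken from the comment above, and how any merge tool stores its recipe inside the metadata is not assumed here.

```python
import json
import struct

def read_safetensors_header(path: str) -> dict:
    """Read only the JSON header of a .safetensors file (no tensor data)."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian u64
        return json.loads(f.read(header_len).decode("utf-8"))

def summarize_header(header: dict) -> dict:
    """Report which metadata keys exist and whether marker tensors are present."""
    metadata = header.get("__metadata__") or {}
    tensor_names = {name for name in header if name != "__metadata__"}
    return {
        "metadata_keys": sorted(metadata),
        "tensor_count": len(tensor_names),
        "has_v_pred_key": "v_pred" in tensor_names,
        "has_ztsnr_key": "ztsnr" in tensor_names,
    }

# Usage (the path is a placeholder for a locally downloaded checkpoint):
# print(summarize_header(read_safetensors_header("model.safetensors")))
```

An empty `metadata_keys` list would be consistent with the observation that v3.5 carries no metadata, though it would not by itself show why the metadata is missing.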

bluvollJun 22, 2025· 1 reaction

I can confirm that adding the keys for prediction autoswap for Comfy and reForge deletes the ComfyUI metadata.

Shio_NJun 29, 2025· 1 reaction

You can't tell. The recipes contain a BIG share of unknown models. It could have been trained after the actual merge. It looks like v3.2 contains "tuned" versions of rouwei. "Tuned" may mean he trained on top of the rouwei model and used the resulting model as part of his merge, so it would contain some knowledge from rouwei. It feels like he did a lot of work before each merge. His only mistake: he didn't list the extra models he used in the description.

Shio_NJun 29, 2025· 6 reactions

CivitAI

It looks like the author trained on top of different models (including an unknown version of rouwei), and then merged the models he trained with some extra models. The names of the merged models in the metadata indicate training on the rouwei model (2 models in 2 different directions).

If a model can generate the content of the watermark tag and has a different style, it's most likely training. His v3.2 model contains less than 55% power of "tuned" rouwei models. If they were merges, there is a very small chance the watermark could actually survive, so most likely they are real trained models, but trained over rouwei, not Illustrious 1.0.

Checkpoint

Illustrious

by duongve13112002
