I. Introduction
NetaYume Lumina is a text-to-image model fine-tuned from Neta Lumina, a high-quality anime-style image generation model developed by Neta.art Lab. It builds upon Lumina-Image-2.0, an open-source base model released by the Alpha-VLLM team at Shanghai AI Laboratory.
Key Features:
High-Quality Anime Generation: Generates detailed anime-style images with sharp outlines, vibrant colors, and smooth shading.
Improved Character Understanding: Better captures characters, especially those from the Danbooru dataset, resulting in more coherent and accurate character representations.
Enhanced Fine Details: Accurately generates accessories, clothing textures, hairstyles, and background elements with greater clarity.
II. Information
For version 1.0:
This model was fine-tuned from the NetaLumina model, version neta-lumina-beta-0624-raw, using a custom dataset consisting of approximately 10 million images. Training was conducted over a period of 3 weeks on 8× NVIDIA B200 GPUs.
For version 2.0:
This release comes in two variants:
Version 2.0:
I switched the base model to Neta Lumina v1 and trained this model on my custom dataset, which consists of images sourced from both e621 and Danbooru. The dataset is annotated with a mix of languages: 30% of the images are labeled in Japanese, 30% in Chinese (50% using Danbooru-style tags and 50% in natural language), and the remaining 40% in natural English descriptions.
For annotations, I used ChatGPT along with other models capable of prompt refinement to improve tag quality. Additionally, instead of training at a fixed resolution of 1024, I modified the code to support multiscale training, dynamically resizing images between 768 and 1536 during training.
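The multiscale resizing described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual training code: the function name `pick_training_size`, snapping sizes to a multiple of 64, and sampling the length of the longer side are all hypothetical choices. Only the 768 to 1536 range comes from the description.

```python
import random

MIN_SIDE = 768    # lower bound from the description above
MAX_SIDE = 1536   # upper bound from the description above
STEP = 64         # assumption: sizes snap to a multiple of 64


def pick_training_size(width, height, rng=random):
    """Return a (width, height) for one training step, roughly keeping aspect ratio."""
    # Assumption: the sampled value is the target length of the longer side.
    target = rng.randrange(MIN_SIDE, MAX_SIDE + 1, STEP)
    scale = target / max(width, height)

    def snap(value):
        # Round the scaled side to the nearest multiple of STEP, never below one STEP.
        return max(STEP, int(round(value * scale / STEP)) * STEP)

    return snap(width), snap(height)
```

Calling `pick_training_size(1200, 1800)` once per sample (or per batch) yields a different target size each time, so the model sees the same image at several scales over the course of training.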
Notes: Currently, I've only evaluated this model using benchmark tests, so its full capabilities are still uncertain. However, based on my initial testing, the model performs quite well when generating images at a resolution of 1312x2048 (as shown in the sample images I provided).
Moreover, based on my testing, this version can generate images at sizes up to 2048x2048.
Version 2.0 plus:
This model is fine-tuned from version 2.0, which had been trained on a dataset of higher-quality images. In this dataset, each image is annotated with both natural language descriptions and Danbooru-style tags.
The training procedure follows the same overall design as version 2, but is divided into three stages.
In the first two stages, the top 10 layers are frozen, and training is performed separately on the Danbooru-labeled subset and the natural language-labeled subset.
In the final stage, all layers are unfrozen and optimized jointly on the full dataset, which incorporates both Danbooru and natural language annotations.
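As a rough sketch of the three-stage schedule above, the snippet below lists which layers would receive updates in each stage. It is an illustration only: the `Stage` class, the subset names, the stage order (tags before natural language), and the reading of "top 10 layers" as indices 0 to 9 are assumptions, and the layer count is left as a parameter because the text does not state it.

```python
from dataclasses import dataclass

FROZEN_TOP_LAYERS = 10  # from the description above


@dataclass(frozen=True)
class Stage:
    name: str
    subset: str        # which captions the stage trains on (hypothetical labels)
    freeze_top: bool   # whether the top layers stay frozen


# Assumption: the tag subset comes first; the text does not state the order.
STAGES = [
    Stage("stage 1", "danbooru_tags", True),
    Stage("stage 2", "natural_language", True),
    Stage("stage 3", "full_dataset", False),
]


def trainable_layers(stage, num_layers):
    """Indices of layers that receive gradient updates in this stage."""
    # Assumption: "top 10 layers" means layer indices 0..9.
    start = FROZEN_TOP_LAYERS if stage.freeze_top else 0
    return list(range(start, num_layers))
```

In a real training loop, the returned indices would decide which parameters have gradients enabled before each stage starts.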
This version reduces the issue of generated images exhibiting an artificial or 'AI-like' appearance, while also improving spatial understanding. For instance, the model is able to generate images in which a character is positioned on the left or right side of the images according to the prompt (as illustrated in the example). In addition, it provides modest improvements in rendering artist-specific styles.
A GGUF quantization is available here: https://huggingface.co/Immac/NetaYume-Lumina-Image-2.0-GGUF
Version 3.0:
This version introduces new character knowledge and also improves some existing characters that could not previously be generated (I will provide a list of the improved characters later). However, please note that not all characters in the list may be generated, since I aim to preserve the old knowledge while also enhancing aspects like text rendering, anatomy (when using artist styles, the model may sometimes produce inaccurate or imperfect anatomy), model stability, and some additional secret improvements.
For generating text within the images, I recommend using this system prompt: "You are an image generation assistant if the prompt includes quoted or labeled on image text render it verbatim preserving spelling punctuation and case. <Prompt Start>". It may help you achieve better results.
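A minimal sketch of how the system prompt might be combined with a user prompt is shown below. The helper name `build_prompt` and the single-space separator are assumptions, as is placing the user prompt directly after the `<Prompt Start>` marker. The system prompt text itself is copied verbatim from above.

```python
SYSTEM_PROMPT = (
    "You are an image generation assistant if the prompt includes quoted or "
    "labeled on image text render it verbatim preserving spelling punctuation "
    "and case. <Prompt Start>"
)


def build_prompt(user_prompt, system_prompt=SYSTEM_PROMPT):
    """Prefix the user's prompt with the system prompt (assumed: one space between)."""
    return f"{system_prompt} {user_prompt.strip()}"


# Example: the quoted text is what should appear in the image.
full_prompt = build_prompt('a shop window with a neon sign that says "OPEN"')
```

The resulting string goes in the positive prompt field. If your frontend exposes a separate system prompt input, use that instead of concatenating by hand.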
Here is a link to a gallery of example images generated in an artistic style using this version: Artist Style Gallery. Thanks to @LyloGummy for contributing.
For version 3.5 (pre-trained model):
This version is a pre-trained model (I’m not sure what to call it, but it’s basically a continuation of the previous work by the Neta team, using the Neta Lumina v1.0 model). To clarify further, versions 2.0 Plus and 3.0 were fine-tuned from this pre-trained model. My workflow involves using the best checkpoint from this pre-trained model at that time and fine-tuning it.
In this version, I also updated my dataset (only the Danbooru dataset, up to date at 12:00 a.m. on September 3). The new dataset only contains tags, since I don’t have anyone to help me validate natural prompts.
Basically, I didn’t change the dataset much. I just updated it with the latest data and merged part of the Neta team's dataset with the previous one, so the model still generates images that look quite similar. However, if you use the correct trigger prompts, the outputs will differ. The good news is that it still retains all of its previous knowledge accurately (some artist styles have been improved).
In addition, the model's default style is now stable, and anatomy and text generation seem better than in previous versions.
Lastly, this model is different from the test version I released on Hugging Face.
Here is the diffusers format for this version: duongve/NetaYume-Lumina-Image-2.0-Diffusers-v35-pretrained · Hugging Face
For version 4.0:
In this version, I changed the way I annotate the dataset. Instead of using only tags and natural language, I now use both unstructured and structured annotations for each image. In addition to tags and natural-language descriptions, I added JSON and XML formats. For the tag, JSON, and XML formats (in natural and tag format), I also shuffle the annotations. For example, here is the XML format (the JSON format is analogous) when the content is written as tags:
<tags>
<characters>kubo nagisa</characters>
<general>long hair, purple hair, purple eyes</general>
</tags>
During preprocessing for each epoch, when this XML annotation is encountered, I randomly drop individual tags such as “purple hair” or other character-related attributes with some probability. I also shuffle the fields, so for example, the <general> field may appear before the <characters> field.
In this version, I also updated my dataset. It now includes the Danbooru dataset up to October 10, 2025. However, ten days ago, I also made an additional update by adding a small dataset during the period when I had paused the training process.
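The tag dropout and field shuffling applied to XML annotations can be sketched as below. This is a hedged illustration rather than the actual preprocessing code: the function name `augment_xml_caption`, the default `drop_prob` of 0.1, applying dropout to every field, shuffling tags inside each field, and omitting fields that end up empty are all assumptions.

```python
import random
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def augment_xml_caption(xml_text, drop_prob=0.1, rng=random):
    """Randomly drop tags and shuffle field order in an XML tag caption."""
    root = ET.fromstring(xml_text)
    fields = []
    for field in root:
        tags = [t.strip() for t in (field.text or "").split(",") if t.strip()]
        # Drop each tag independently with probability drop_prob.
        kept = [t for t in tags if rng.random() >= drop_prob]
        # Assumption: surviving tags are also shuffled inside their field.
        rng.shuffle(kept)
        if kept:  # assumption: a field left with no tags is omitted
            fields.append((field.tag, kept))
    # Shuffle the fields, so <general> may come before <characters>.
    rng.shuffle(fields)
    lines = [f"<{root.tag}>"]
    lines += [f"<{name}>{escape(', '.join(kept))}</{name}>" for name, kept in fields]
    lines.append(f"</{root.tag}>")
    return "\n".join(lines)


caption = (
    "<tags>\n"
    "<characters>kubo nagisa</characters>\n"
    "<general>long hair, purple hair, purple eyes</general>\n"
    "</tags>"
)
augmented = augment_xml_caption(caption, drop_prob=0.2)
```

Running this once per epoch on each caption gives the model a slightly different view of the same annotation every time, which is the effect the paragraph above describes.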
In this version, I reduced AI artifacts and improved the character anatomy. It’s still not perfect, but when you use natural language in the prompt combined with a suitable negative prompt, the results are noticeably better.
Note: All previous knowledge is still retained; you just need to use the correct trigger tags or prompts. Additionally, the current default style is set to anime for greater stability.
III. Model Components:
Text Encoder: Pretrained Gemma-2-2B
VAE: From Flux.1 dev's VAE
Image Backbone: Fine-tuned version of NetaLumina's backbone
IV. File Information
This all-in-one file includes weights for VAE, text encoder, and image backbone. Fully compatible with ComfyUI and other systems supporting custom pipelines.
If you only want to download the image backbone, feel free to visit my Hugging Face page. It includes the separated files along with the .pth files in case you want to use them for fine-tuning.
V. Suggested Settings
For more details and to achieve better results, please refer to the Neta Lumina Prompt Book.
VI. Notes & Feedback
This is an early experimental fine-tuned release, and I’m actively working on improving it in future versions.
Your feedback, suggestions, and creative prompt ideas are always welcome — every contribution helps make this model even better!
VII. How to Run the Model on Another Platform
You can use it through the tensor.art platform. Here is the model link: https://tensor.art/models/898410886899707191
However, to run the model in an optimized way, I recommend using Comfyflow from tensor.art (because its default runner lacks configuration, which makes the model run suboptimally). Here is an example flow you can use on the platform: https://huggingface.co/duongve/NetaYume-Lumina-Image-2.0/blob/main/Lumina_image_v2_tensorart_workflow.json
VIII. Acknowledgments
Big thanks to narugo1992 for the dataset contributions.
Credit to Alpha-VLLM and Neta.art Lab for the fantastic base model architecture.
If you'd like to support my work, you can do so through Ko-fi!
Comments (53)
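You really are a good person, far better than the people who call you a scammer yet never fine-tune anything themselves and are all talk. I'm sure your model has long since surpassed the original Neta, and once this community takes off, yours will become the most-downloaded model, just like the first-generation NOOBAI and 光环 models! I haven't tested 3.0 yet, but a friend who tried it says the composition is exceptionally good. Of course, what I'd really like is for it to replace Illustrious (光辉) for NSFW. After all, Illustrious is all tags, and some effects are beyond what natural language can match (haha).
I don't really care what others say. I make models mainly to meet my own needs: a model that is as stable and as easy to use as possible.
@duongve13112002 Haha, those people have such vicious tongues. I asked them what projects they had, and they said they were training the simplest, most beginner-friendly LoRAs... nothing on the scale of fine-tuning a model. I feel that mixing tags with natural language is the right direction. Captioning purely in natural language probably has to be extremely precise. I think it's like the dual high-noise/low-noise models in wan2.2: tags pin down the overall direction and natural language fills in the details, which makes for a more convenient model. Using your model, I find this approach works very well. It makes up for tags being unable to describe things in detail, and also for the weakness of natural language early in training, where captions have to be extremely precise before they can describe a specific thing.
Ran into a powerful trainer on Civitai: incredibly efficient, strong as a monster, impossible to beat even when giving it everything.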
Hello, if you don’t mind, I’d like to hear your feedback on this v3 model. In this version I’ve made some changes, so I’m not sure whether it fits your needs or not. Please note that v3 is still not very stable for me (more stable than v2 plus, but still not up to my expectations). If you find it unstable during testing, just like or leave a comment here and I’ll try a different approach to tune it for better stability. Feel free to give both positive and negative feedback in as much detail as possible. Thank you.
There are definitely a few artist styles and characters that I wish were supported but are still not, but for the most part from my early testing, it definitely seems to be an improvement over 2.0, great work! :)
Overall I think it's a bit better in anatomy and colors. The new characters are still a bit underbaked. Are you using the same character-tagging method the original Neta guys did? For example, they recommend putting a # in front of characters in the prompting guide, like #sameko_saba.
@hiben40387 I’m using another way to tag because, during my experiments, I noticed that marking character names with '#' doesn’t improve knowledge much. Moreover, it can reduce the model’s creativity when dealing with complex cases. As for the underbaked knowledge, I could make it learn better, but the model might forget many things like artists, so it’s not a good choice for me.
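It would be great if more concepts could be trained, especially some of the more niche ones. But given the cost of training every concept on a large dataset, I won't push for it.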
I have been using AI for stuff since Anything 3.0 back in the SD 1.5 days. I have gone through hundreds of models and personal mixes combining those models (SDXL, Animagine, Pony, Illustrious, NoobAI). I had never heard of or used Lumina until now, and I usually use a custom NoobAI/Illustrious mix that I have been working on for a while. I occasionally use Chroma for anime, but will not anymore, seeing as this model is better at everything in comparison.
Out of the tools I have used, in terms of raw prompt comprehension, this is the best of any anime model. Whether using Danbooru tags or full descriptions, this one is best at both. It very rarely messes up: it usually does not "mix" traits between characters or place them in the wrong positions, and follows details very closely. The quality of the image is also very high, though not quite as high as NoobAI mixes can be. It avoids the "AI look" better, though. Besides that, it knows fewer artists, but it does still know hundreds and imitates them well.
Currently though, it has a weaker database than NoobAI, especially for NSFW concepts (Any sort of "Insertion" usually does not come out right). What it does know, it is very good at, but it does not have knowledge at a NoobAI level (which is probably the current trade-off for the prompt comprehension). This means that even though the prompt comprehension can be excellent, it sometimes fails and falls apart due to lack of knowledge. Hands also need some work. This model does not handle dark scenes as well as the V-Pred models I'm used to.
So, generally, this is one of the best anime models I have ever seen, maybe even the best. It does have some things that are currently poor and/or done better in other models, but it should be expected that there will be some tradeoffs. I will definitely be using it, and only going back to NoobAI when this model lacks the knowledge. Thank you for the great model. Good job!
If possible, please I would like to see the following artists:
ie_(raarami)
tanishi_(tani4)
etsuzan_jakusui
cbb_(tuucoo)
as of now with 3.0, i find it nice comparing it to 2.0, after training some styles and such i found a somewhat sweet spot, that and well it doesn't look as noisy as 2.0 plus, i'm sure it can improve but, for what it is now, it's an upgrade
It would be perfect if the model's knowledge of tags and nsfw could be further improved. Natural language is only better than tags when describing complex compositional positional relationships and distinguishing characters; otherwise, spending a lot of language to describe image content is much more troublesome than using tags, both in training and image generation.
@sheng327 Since I’m using an LLM as the text encoder, the model is quite strong with complex prompts. However, for tags, I’ve also tried improving it by training a LoRA for Gemma2. Still, because the data is limited, it doesn’t really show significant improvement compared to not using LoRA. I’m still considering what the most suitable solution would be. As for NSFW content, I’m not very keen on it, so I’ve kept it fairly low around 30% NSFW, with the rest covering many other things in my dataset.
@ManILoveAi I’ve checked the artist styles you mentioned, and they’re already included in my dataset. The number is probably a bit small, I kept it at around 150 images. Regarding artist style knowledge, I also admit that the model may not learn some of them well because currently I don’t want to train Lumina’s text encoder (it might be able to understand those concepts, but it could break a lot of other things). As for NSFW, it makes up about 35% of my dataset, so it might not be ideal.
It seems that version v3 has regressed in terms of stability and knowledge compared to v2plus. Hope it can keep improving.
I disagree TBH, I don't think it's a major difference but small details seem to be consistently better in most gens in subtle ways.
Hi there, glad to test v3. I hope you can add more characters from Nikke or Zenless Zone Zero to the dataset, as well as some artist styles like Derpixon or d-art for more illustration styles. Btw, I noticed that it struggles with some concepts like mouthless, for making horror characters like the nurse from Silent Hill, for example.
Hi, I checked the dataset I trained v3 on, and it also has many characters from the Nikke and Zenless Zone Zero series.
@duongve13112002 hmm maybe I prompted the wrong way because I could not get them, will try again,thanks 👍🏻
@duongve13112002 add model in https://www.seaart.ai/ pls
Hi, I checked and wanted to upload the model, but SeaArt does not seem to support the Lumina Image v2 base model.
@duongve13112002 and it's a shame I really wanted to test it :(
@2ef6t7j7nz You can test by using tensor.art here is the link of model: https://tensor.art/models/898410886899707191.
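Very good model, it makes my ComfyUI spin 😎👍🏻
A very good model; I wish I had found it sooner. I think it is more stable than the Illustrious family on facial details, but the six-finger problem is still a bit frequent.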
Version 3.0 is pretty good! A slight improvement over 2.0 Plus IMO.
Trained an enhancement LoRA for v3. Should have better backgrounds and hands, and less noise. Maybe not... still testing...
Oh thank you so much :D
DPM++ 2S Ancestral Linear Quadratic @ CFG 4.5 seems to perform better with this than anything else I've tried so far, if it helps anyone. Also it seems like keeping in the Gemma instructions at the start of your positive and negative prompt actually does make a difference.
Thanks a lot for fine-tuning the Neta Lumina model. I tried fine-tuning too, but my model ended up worse than before. Moreover, I don’t understand why people are calling you a scammer in the NoobAi group, even though you just shared and uploaded the model without asking users for money. In addition, models like Illustrious and Pony also used existing architectures and released their first versions as open source, then went closed-source in the next version to earn money. Some people in that group just talk without really trying. Training a model like DiT is not easy; I’ve failed many times myself. Sorry if this comes off harsh.
Hi, I don’t really care about what they say about me I just want to make a model that’s stable and easy to use.
Hi, my friends and some users asked me about the issue where they cannot train LoRAs on my model, or even on Neta Lumina, when using the sd-scripts from the kohya-ss repo. The LoRAs don’t seem to learn anything, or they perform much worse compared to not using LoRAs at all.
I reviewed the code in this repository and found that the problem is related to the timestep. The implementation for Lumina contributed there is not correct for my model or for Neta Lumina. Moreover, at the moment, only the timestep type "nextdit_shift" works and the others do not (I tested by training for 500 steps), but even that type is not correct, so the quality is very bad.
I have a solution for this problem, but I need more time since I’m very busy fixing bugs and improving the next version of this model. :D
I just told my friends: just copy paste the official example, don't change or add anything, everything will be fine.
To me it seems only 2 lines of code need to change: reverse the t in "nextdit_shift" and reverse the t in the model input. Not sure though, and not tested. I only used "nextdit_shift".
I'm guessing you've seen this issue too https://github.com/kohya-ss/sd-scripts/issues/2201 Sadly there are too many new models waiting to be supported, and lumina 2 is not popular, so no reply from devs...
@reakaakasky I will do this if i have free time so dont worry
@duongve13112002 🙏
Just curious, is v3 a pretrained model? Or has been finetuned on a small set of data? The default anime style looks so stable.
@reakaakasky V3 is a fine-tuned model based on my pretrained model, which was trained on a high-quality dataset from various sources (however i tuned it with short time). As for the pretrained model, I am continuously improving it.
@duongve13112002 Thanks for the info.
Will you release the pretrained model?
Like https://civitai.com/models/1188071/animagine-xl-40
name the pretrained model v3-zero or v3-pretrained or something.
That would be really great for wider finetuning and LoRA training. 🙏
@reakaakasky I think v4 is a pretrained model without tuning. You can use it to train LoRAs.
@reakaakasky I just submitted a pull request to kohya-ss's sd-scripts to fix this issue and add a new timestep option to make training Lumina more stable.
Hi! For more information about next version: I updated my dataset from Danbooru to 03/09/2025 (I removed datasets with resolution < 768 and lower quality). Moreover, I also added datasets from the Neta team in this version.
This is a pretrained model (not fine-tuned), so you can train LoRAs more easily. The anatomy seems to have improved better than I expected. However, I’ll pause updates for a few months after this version because I’m going on vacation and need to save money :D
Enjoy your vacation duongve :3
>pretrained
do you mean from scratch?
@Korewaai No, it's more like I continued training from the Neta Lumina checkpoint (but not really) :v
interested to see progress of this. seems to have lots of potential. will check it out once that new version comes out. Already looks to be quite high quality.
forge does not recognize model
I only got it to work on comfyUI sadly, even the gguf version of NetaYume v2.0 wasn't recognized.
Sadly I have no idea how to use ComfyUI properly to make it split the model into layers to share with my CPU, so I run into OOM errors. Putting --cpu in the command-line args makes it use only the CPU and never the GPU.
--lowvram wasn't enough either. I wonder how other people with around 8GB of VRAM are making comfyUI work with Flux
@TheFairyMan i think comfyui by default automatically offload the text encoder to cpu, and 8gb vram is way enough for a 5gb diffusion model.
Are you using the "all-in-one" file?
uploaded a fp8 scaled version of v3, and gemma2
Needs to be compatible with forge, reforge and other forks alike.
Hi, this is the final version, and it is different from the test checkpoint I published on Hugging Face.



















