CivArchive
    NetaYume Lumina (Neta Lumina/Lumina Image 2.0) - v3.5 (pretrained)
    NSFW
    Preview 105196035
    Preview 105196038
    Preview 105196033
    Preview 105196037
    Preview 105196042
    Preview 105196043
    Preview 105196034
    Preview 105196048
    Preview 105196032
    Preview 105196036
    Preview 105196041
    Preview 105196039
    Preview 105196047
    Preview 105196050
    Preview 105196046
    Preview 105200666
    Preview 105196193
    Preview 105196195
    Preview 105196192
    Preview 105200664

    I. Introduction

    NetaYume Lumina is a text-to-image model fine-tuned from Neta Lumina, a high-quality anime-style image generation model developed by Neta.art Lab. It builds upon Lumina-Image-2.0, an open-source base model released by the Alpha-VLLM team at Shanghai AI Laboratory.

    Key Features:

    • High-Quality Anime Generation: Generates detailed anime-style images with sharp outlines, vibrant colors, and smooth shading.

    • Improved Character Understanding: Better captures characters, especially those from the Danbooru dataset, resulting in more coherent and accurate character representations.

    • Enhanced Fine Details: Accurately generates accessories, clothing textures, hairstyles, and background elements with greater clarity.

    II. Information

    For version 1.0:

    • This model was fine-tuned from the NetaLumina model, version neta-lumina-beta-0624-raw, using a custom dataset consisting of approximately 10 million images. Training was conducted over a period of 3 weeks on 8× NVIDIA B200 GPUs.

    For version 2.0:

    This version has 2 versions:

    Version 2.0:

    • I switched the base model to Neta Lumina v1 and trained this model on my custom dataset, which consists of images sourced from both e621 and Danbooru. The dataset is annotated with a mix of languages: 30% of the images are labeled in Japanese, 30% in Chinese (50% using Danbooru-style tags and 50% in natural language), and the remaining 40% in natural English descriptions.

    • For annotations, I used ChatGPT along with other models capable of prompt refinement to improve tag quality. Additionally, instead of training at a fixed resolution of 1024, I modified the code to support multiscale training, dynamically resizing images between 768 and 1536 during training.

    • Notes: Currently, I've only evaluated this model using benchmark tests, so its full capabilities are still uncertain. However, based on my initial testing, the model performs quite well when generating images at a resolution of 1312x2048 (as shown in the sample images I provided).

    • Moreover, this version the model generates images with the size up to 2048x2048 based on my testing.

    Version 2.0 plus:

    • This model is fine-tuned from version 2.0, which had been trained on a dataset of higher-quality images. In this dataset, each image is annotated with both natural language descriptions and Danbooru-style tags.

    • The training procedure follows the same overall design as version 2, but is divided into three stages.

      • In the first two stages, the top 10 layers are frozen, and training is performed separately on the Danbooru-labeled subset and the natural language-labeled subset.

      • In the final stage, all layers are unfrozen and optimized jointly on the full dataset, which incorporates both Danbooru and natural language annotations.

    • This version reduces the issue of generated images exhibiting an artificial or 'AI-like' appearance, while also improving spatial understanding. For instance, the model is able to generate images in which a character is positioned on the left or right side of the images according to the prompt (as illustrated in the example). In addition, it provides modest improvements in rendering artist-specific styles.

    • You can find gguf quantization at here: https://huggingface.co/Immac/NetaYume-Lumina-Image-2.0-GGUF

    Version 3.0:

    • This version introduces new character knowledge and also improves some existing characters that could not previously be generated (I will provide a list of the improved characters later). However, please note that not all characters in the list may be generated, since I aim to preserve the old knowledge while also enhancing aspects like text rendering, anatomy (when using artist styles, the model may sometimes produce inaccurate or imperfect anatomy), model stability, and some additional secret improvements.

    • For generating text within the images, I recommend using this system prompt: "You are an image generation assistant if the prompt includes quoted or labeled on image text render it verbatim preserving spelling punctuation and case. <Prompt Start>", it may help you achieve better results.

    • Here is a link to a gallery of example images generated in an artistic style using this version: Artist Style Gallery. Thank @LyloGummy for contributing.

    For version 3.5 (pre-trained model):

    • This version is a pre-trained model (I’m not sure what to call it, but it’s basically a continuation of the previous work by the Neta team, using the Neta Lumina v1.0 model). To clarify further, versions 2.0 Plus and 3.0 were fine-tuned from this pre-trained model. My workflow involves using the best checkpoint from this pre-trained model at that time and fine-tuning it.

    • In this version, I also updated my dataset (only the Danbooru dataset, up to date at 12:00 a.m. on September 3). The new dataset only contains tags, since I don’t have anyone to help me validate natural prompts.

    • Basically, I didn’t change the dataset too much I just updated it with the latest data, using a part of dataset from neta team and merged it with the previous one. So, the model still generates images that look quite similar. However, if you use the correct trigger prompts, the outputs will differ. The good news is that it still retains all of its previous knowledge accurately (some antistyle has been improved).

    • In addition, the default style of model currently is stable, the anatomy and text generation seems better than previous.

    • Lastly, this model is different from the test version I released on Hugging Face.

    • Here is the diffusers format for this version: duongve/NetaYume-Lumina-Image-2.0-Diffusers-v35-pretrained · Hugging Face

    For version 4.0:

    • In this version, I changed the way I annotate the dataset. Instead of using only tags and natural language, I now use both unstructured and structured annotations for each image. In addition to tags and natural-language descriptions, I added JSON and XML formats. For the tag, JSON, and XML formats (in natural and tag format), I also shuffle the annotations. For example, in the XML format similar to JSON when formatted as tags:

    <tags>
        <characters>kubo nagisa</characters>
        <general>long hair, purple hair, purple eyes</general>
    </tags>
    • During preprocessing for each epoch, when this XML annotation is encountered, I randomly drop individual tags such as “purple hair” or other character-related attributes with some probability. I also shuffle the fields, so for example, the <general> field may appear before the <characters> field.

    • In this version, I also updated my dataset. It now includes the Danbooru dataset up to October 10, 2025. However, ten days ago, I also made an additional update by adding a small dataset during the period when I had paused the training process.

    • In this version, I reduced AI artifacts and improved the character anatomy. It’s still not perfect, but when you use natural language in the prompt combined with a suitable negative prompt, the results are noticeably better.

    • Note: All previous knowledge is still retained, you just need to use the correct trigger tags or prompts. Additionally, the current default style is set to anime for greater stability.

    III. Model Components:

    • Text Encoder: Pretrained Gemma-2-2B

    • VAE: From Flux.1 dev's VAE

    • Image Backbone: Fine-tuned version of NetaLumina's backbone

    IV. File Information

    • This all-in-one file includes weights for VAE, text encoder, and image backbone. Fully compatible with ComfyUI and other systems supporting custom pipelines.

    • If you only want to download the image backbone, feel free to visit my Hugging Face page, it includes the separated files along with the .pth files in case you want to use them for fine-tuning.

    V. Suggestion Settings

    For more details and to achieve better results, please refer to the Neta Lumina Prompt Book.

    VI. Notes & Feedback

    This is an early experimental fine-tuned release, and I’m actively working on improving it in future versions.
    Your feedback, suggestions, and creative prompt ideas are always welcome — every contribution helps make this model even better!

    VII. How to Run the Model on Another Platform

    You can use it through the tensor.art platform. Here is the model link: https://tensor.art/models/898410886899707191

    However, to run the model in an optimized way, I recommend using Comfyflow from tensor.art (because its default runner lacks configuration, which makes the model run suboptimally). Here is an example flow you can use on the platform: https://huggingface.co/duongve/NetaYume-Lumina-Image-2.0/blob/main/Lumina_image_v2_tensorart_workflow.json

    VIII. Acknowledgments

    If you'd like to support my work, you can do so through Ko-fi!

    Description

    FAQ

    Comments (164)

    ZootAllures9111Oct 10, 2025· 1 reaction
    CivitAI

    I'm still not sure I really understand what 3.5 is at all, in the sense of like, is it less trained than your previous versions were in direct comparison to Neta Lumina 1.0, or not? And is it specifically supposed to perform better than 2.0 Plus / 3, or not?

    reakaakaskyOct 11, 2025· 1 reaction

    Depends how you define "better".

    Pretrained model just means there is no finetuning at the end of the training.

    finetuning means training on a small highest quality dataset to stabilize styles, at the cost of losing creativity (if something not in the final dataset, the model may forget it).

    Pretrained model has better creativity, but harder to use, usually needs long prompt.

    reakaakaskyOct 11, 2025· 1 reaction

    e.g. notice that finetuned versions do look better but there are style shifting towards a same universal style

    https://civitai.com/images/105200664

    ZootAllures9111Oct 11, 2025

    @reakaakasky Interesting. I've been doing some testing at at least for the prompts I use (which don't really tend to use specific artist tags that much) I still feel like 3.5 is a bit more coherent than 3.0, with better details / proportions generally. So I'll continue to use it I guess.

    duongve13112002
    Author
    Oct 10, 2025· 3 reactions
    CivitAI

    Hi! For anyone who hasn’t read the description of this model: to clarify, versions v2 plus and v3 are fine-tuned variants based on the best-performing checkpoint of the model (v3.5 pretrained) at that time. This version can be considered the pretrained or 'raw' model, by I continued train NetaLumina v1.0 and create this model. Moreover, I’ve added a new dataset that includes tags caption only in this version, so you’ll notice significant improvements when you use the correct trigger prompts as shown in my gallery.

    IRedDragonICYOct 11, 2025· 4 reactions
    CivitAI

    naiceeee keep it up👍

    ShinwohOct 11, 2025· 3 reactions
    CivitAI

    3.5 looks amazing, but I feel like the example images don't showcase the models real potential

    I think this is the best model when it comes to painterly style

    yamatazenOct 12, 2025· 1 reaction
    CivitAI

    Should I use the finetuned or pretrained version?

    duongve13112002
    Author
    Oct 12, 2025· 2 reactions

    Hi, i think at the moment pre-trained is better because fine-tune version may loose knownledge. Moreover, the quality between pre-trained and fine-tune is not too much different now.

    yamatazenOct 12, 2025· 2 reactions
    CivitAI

    What is the knowledge cutoff?

    duongve13112002
    Author
    Oct 12, 2025· 1 reaction

    I used the Danbooru dataset, up to date at 12:00 a.m. on September 3, 2025 and some part of dataset provided from neta team

    yamatazenOct 12, 2025· 1 reaction

    @duongve13112002 For v3.0 or v3.5?

    duongve13112002
    Author
    Oct 12, 2025· 3 reactions

    @yamatazen it is for v3.5

    ShinwohOct 13, 2025
    CivitAI

    Patreon watermarks won't stop appearing when using western artists artstyle (cutesexyrobutts, shexyo) also I wish the model knew Komano Manato and Floox artstyle 😩

    duongve13112002
    Author
    Oct 13, 2025

    Hi, about the problem related to the Patreon watermarks, have you tried adding ‘patreon logo, patreon username’ to the negative prompt yet (this problem related to the dataset on danbooru most of them contains this watermark) ?
    As for Floox’s art style, I checked on Danbooru and found only about 15 images available, so I think that’s not enough data for the model to really learn or understand the style.

    ShinwohOct 13, 2025

    @duongve13112002 yes I tried those prompts, even more but the watermark keeps appearing

    ShinwohOct 13, 2025
    CivitAI

    Have you seen the new Lumina-DiMOO model?

    duongve13112002
    Author
    Oct 13, 2025· 2 reactions

    Yes, I know, but this base model is too large. If I were to fine-tune it, it would take a lot of time. Moreover, currently, users' resources can run this model smoothly.

    reakaakaskyOct 14, 2025· 1 reaction

    @duongve13112002 I think model with editing, or ability of understanding references, has so much potential.

    In theory, we don't need a full finetune. We simply need to teach the model to better understand different anime styles and characters and how to "edit" them. This way, we only have to give model the character and style reference images, the model can generate any combination. I think that would be a "small" finetune, maybe only need 10k images. but the problem is, how to prepare the dataset with image pairs...

    duongve13112002
    Author
    Oct 14, 2025· 1 reaction

    @reakaakasky i will try later :D

    Ada321Oct 15, 2025

    @reakaakasky https://thinkingmachines.ai/blog/lora/

    lora is all you need

    dweller_factor172648Oct 13, 2025· 8 reactions
    CivitAI

    It's really hard to sum up this model in just a few words.

    The fact that you can use both natural language and Danbooru tags together is awesome.

    On the other hand, it's super difficult to control the art style and quality unless you use a LoRA that's compatible with Lumina.

    But I still want to give this model high praise. That's because I think when new models like this—ones that aren't Illustrious or Pony—get some recognition, it really helps liven up the local AI art scene.

    duongve13112002
    Author
    Oct 14, 2025· 2 reactions

    Hi! Currently, I’m the only one working on this architecture.
    If the model can be adopted more widely by the community, I believe it could become as popular as Illustrious or Pony.
    I know the progress will be slow since I need both time and money to continue the training and testing process and I don’t have any support at the moment :v.

    dweller_factor172648Oct 14, 2025· 2 reactions

    @duongve13112002 Thanks for getting back to me.

    I really appreciate you developing this all by yourself.

    I don't mean to rush you, and I'm not looking for some super-polished final product, so please just take it easy.

    By the way, I don't have any model development skills or a super high-end GPU, but I just created a Ko-fi account.

    I'm rooting for you.

    nekoplexOct 14, 2025
    CivitAI

    I have an error based on workflow given by you and on lumina's workflow too( there I didn't get any errors but it says press any key to continue and then command line closes ). Model from hugginfaces with workflow given by you on huggingface : NetaYumev35_pretrained_unet.safetensors
    what should I do and can you give me any advices to make it work ( My setup : Windows 11, Gpu : gtx 1650 mobile, 8gb ram cpu : intel i5 10300H )
    ERROR: clip input is invalid: None If the clip is from a checkpoint loader node your checkpoint does not contain a valid clip or text encoder model.


    Original neta lumine v1 works fine for me without giving any errors

    duongve13112002
    Author
    Oct 14, 2025· 1 reaction

    Hi, this file which you downloaded contains only the image encoder (UNet). To run the model, you’ll also need to download the VAE and text encoder in my repo on huggingface. However, for easier use similar to XL you can download the all-in-one file instead.

    reakaakaskyOct 14, 2025· 1 reaction

    gtx 1650 mobile, 8gb ram

    you have to use the q8 model

    swapping layers on gpu and swapping memory on cpu in the same time sounds horrifying...

    nekoplexOct 15, 2025· 1 reaction

    @duongve13112002 thank you! Now it works. Sadly I didn't manage to run all-in-one model but it still creates better images than original Neta Lumina. You created good model which gives impressive results

    LivielOct 15, 2025· 1 reaction
    CivitAI

    I found that the best way to use this model is to pass the resulting images through an SDXL model to refine them further.

    It has a great understanding of language and tags and it will almost allways give you something close to what you want, but the style and finer details are generally worse than something like Illustrious.

    If you use this method you end up with the best of both worlds.

    duongve13112002
    Author
    Oct 15, 2025· 2 reactions

    Hi, the model still needs improvement. I’ll do my best to make it better, but I need some time to test and experiment since tuning the model is difficult unlike with XL architecture.

    ZootAllures9111Oct 16, 2025· 1 reaction

    @Liviel I personally disagree, the much worse VAE of SDXL is noticeable. I recommend just doing normal hi-res-fix style two-pass upscaling with denoising in ComfyUI.

    nissian202Oct 15, 2025
    CivitAI

    The original Neta Lumina has a resolution of 1024x1024. May I ask how you achieved 2048x2048?

    By simply adjusting the image sizes with code to 2048 and adding them to the database? And it started adjusting the rest of the data to this size?

    duongve13112002
    Author
    Oct 15, 2025

    Hi, not really. Currently, I only train by choosing a random size within the range of 768 to 1536 for each batch. I think this could be the reason why the model can generate images at 2048×2048.

    ZootAllures9111Oct 16, 2025

    The original Neta Lumina was higher than 1024x1024 TBG

    MasterArtOct 16, 2025
    CivitAI

    Good to see something new. I would like to try it in Forge.

    ZootAllures9111Oct 16, 2025· 1 reaction

    Tell one of the Forge variants maintainers to add Lumina 2.0 support then lol. Or use SD Next if you don't like Comfy, it already has Lumina support also.

    jianbijiajia872Oct 17, 2025· 1 reaction
    CivitAI

    A really nice model I can't praise it in only few words. But how to remove the censor?

    ZootAllures9111Oct 17, 2025

    It's not clear what you mean by "the censor"

    jianbijiajia872Oct 18, 2025

    @ZootAllures9111 When I use the prompt like "spreading pussy", it always generates a blurred image.

    duongve13112002
    Author
    Oct 18, 2025

    @jianbijiajia872 During the training process, I contains various the NSFW images, but this problem is related to the text encoder (Gemma2). I think the solution to handle this is to fine-tune it, but some issues may occur.

    reakaakaskyOct 18, 2025

    I;m guessing he means "pixelization" on xxx rated contents.

    It's almost impossible to remove such censor because those contents are not allowed on the normal Internet. So does the open dataset. There are tons of pixelized images.

    Unless someone want to do a some kind of "nsfw master" finetune with a clean dataset...

    reakaakaskyOct 18, 2025

    @duongve13112002 no, not the gemma, TE only needs to "distinguish" your text input. So DiT can generate different images later.

    Censorship on TE does not matter. Fun fact, in order to let LLM reject harmful content accurately, it needs to become a harmful content master (knows every kinds of content), to reject your query later.

    duongve13112002
    Author
    Oct 18, 2025

    @reakaakasky I am not sure because my dataset contains variousN S F W data from multiple sources. This makes up 40% of my dataset.

    reakaakaskyOct 18, 2025· 2 reactions

    @duongve13112002 😱 turns out it is already a ns fwmaster. No wander it's so good at those concepts.

    I mean sometimes there are censors on xxx content. Might because the dataset itself has lots of censors, e.g. >20%.

    reakaakaskyOct 18, 2025
    CivitAI

    Might be a nerd question. Did v3.5 trained with some kind of noise amplifier? Like "noise offset"?

    reakaakaskyOct 18, 2025· 1 reaction

    Follow up nerd question. if it did. what's the offset ratio.

    Noise schedule in LoRA needs to be aligned with the base model. otherwise it would be a big problem when stacking LoRAs. Cause every LoRA is gonna try to "cancel" the offset and modify the noise schedule, towards same direction.

    duongve13112002
    Author
    Oct 18, 2025

    @reakaakasky yub i used it but it is a dynamic value from 0-0.075

    reakaakaskyOct 18, 2025· 2 reactions

    oh..sh*t, that's a horrible news...for LoRA...

    reakaakaskyOct 18, 2025· 1 reaction

    Did precious versions also had noise offset? I kind of want to do some comparing tests.

    duongve13112002
    Author
    Oct 18, 2025

    @reakaakasky The previous i didnt use

    reakaakaskyOct 19, 2025· 5 reactions

    @duongve13112002 Thanks for the confirmation.

    Actually this question first came to my mind because I noticed when I stacking 2 v3.5 LoRAs (trained without offset), the image feels like washed out. v3 didn't not have such problem.

    I've seen this effects in SDXL, when the base model was trained with noise offset, but the LoRA wasn't. It becomes a problem when stacking more than one LoRAs.

    Illustrious LoRAs don't have such problem and you can stack tons of it's LoRAs because illustrious v0.1 did not use noise offset. So the LoRA won't modify the noise schedule by default. Even you apply it on a noise offset base model.

    But most of later finetunes used noise offset, so the compatibility of their LoRAs is noticably worse.

    reakaakaskyOct 19, 2025· 1 reaction

    bad news +1

    diffusers-pipe doesn't support noise offset. only sd-scripts does doesn't support.

    Did some searching. feel like it's an old trick only for old SD models. do we really need noise offset in newer model? 🤔

    duongve13112002
    Author
    Oct 19, 2025

    @reakaakasky Oh i chose this because i think it is a regulazation for model

    reakaakaskyOct 19, 2025

    update: I checked sd-scripts code, and there is no noise offset in Lumina 2.

    The argument is for old SD models.

    reakaakaskyOct 19, 2025

    @duongve13112002 

    "regulazation for model"

    I think that's the "input perturbing noise".🤔

    duongve13112002
    Author
    Oct 19, 2025

    @reakaakasky Basically, I want to prevent the model from overfitting. Moreover, I am training the model using my custom training script.

    JigenOct 19, 2025

    @duongve13112002 Are you talking about an actual noise offset or timestep shift?

    duongve13112002
    Author
    Oct 20, 2025

    @Jigen I’m talking about the noise offset. @reakaakasky I’ll try turning it off and on to see if there’s any difference in the next version.

    reakaakaskyOct 20, 2025

    When stacking LoRAs, not big difference, but noticeable enough for me to guess and bring up the question. Because I've tested and seen this on SDXL before.

    reakaakaskyOct 20, 2025· 1 reaction

    here are my personal thoughts from LoRA, I've never do huge finetune so those might be completely nonsense for the model.

    - i don't know if noise offset can make images better, because I've never used it. I do know it is a hard to notice problem for LoRA, if you can't align the noise schedule.

    - So far non of the training tool supports the notice offset. So no LoRA can align the noise schedule (unless added 2 lines code). Other question is, is the noise offset still useful for newer model.

    - iirc the model can learn and modify the noise offset very quickly (<1K steps), even when training a LoRA. It's just a noise factor. So maybe serval thousands steps on v3.5 without noise offset, the model will completely forget about it. (?)

    duongve13112002
    Author
    Oct 20, 2025

    @reakaakasky I agree with you to some extent. When I fine-tune using it, the model seems more stable compared to when I don’t use it (though I’m not sure why).

    reakaakaskyOct 20, 2025

    Found an example LoRA in SDXL repo to "enable" noise offset. Metadata said it was trained with 7k image and 7k steps.

    https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/tree/main

    idea: do this as well? Then we have both noise offset (maybe even better, it is adjustable) and a standard noise schedule in base model, so no more LoRA compatibility issues.

    reakaakaskyOct 20, 2025· 1 reaction

    "This is an example LoRA for SDXL 1.0 (Base) that adds Offset Noise to the model, trained by KaliYuga for StabilityAI. When applied, it will extend the image's contrast (range of brightness to darkness), which is particularly popular for producing very dark or nighttime images. At low percentages, it improves contrast and perceived image quality; at higher percentages it can be a powerful tool to produce perfect inky black. This small file (50 megabytes) demonstrates the power of LoRA on SDXL and produces a clear visual upgrade to the base model without needing to replace the 6.5 gigabyte full model. The LoRA was heavily trained with the keyword contrasts, which can be used alter the high-contrast effect of offset noise.

    Recommended strength: 50% (0.5). The keyword contrasts may alter the effect."

    duongve13112002
    Author
    Oct 20, 2025

    @reakaakasky ok 👌

    GusisoOct 20, 2025· 5 reactions
    CivitAI

    Great model, being able to have a choice or combination between natural language and danbooru tags is awesome, surprised no one has made this yet, given enough time and some marketing this could go mainstream.

    But i have a question, would SDXL based loras work? (Illustrious, Pony, Base SDXL, etc.)?

    duongve13112002
    Author
    Oct 21, 2025· 2 reactions

    Hi the loras trained on these base models can not be used on lumina image v2. Because they have different architecture

    NTR_BLACKOct 22, 2025· 1 reaction
    CivitAI

    I want to try to use your base model to train a character Lora. Are the tags used in the training compatible with the previous SDxL anime model? Or do I need to manually organize the tags again? Are there any good references? I have been using the noob vpred SDL model before.

    duongve13112002
    Author
    Oct 23, 2025· 1 reaction

    Hi, you can use tag captions to train the LoRA character for this model. However, I recommend training LoRAs using both tags and natural language captions for each image, for example, during each epoch, you can randomly choose between a tag caption and a natural language caption.

    AIZorOct 22, 2025
    CivitAI

    How can i use controlnet in comfyui with this model?

    duongve13112002
    Author
    Oct 23, 2025

    Hi, at the moment this architecture havent supported controlnet yet

    SansenskiyOct 23, 2025· 1 reaction
    CivitAI

    I decided to try training the LoRa model on your checkpoint, but I ran into the following problem. I'm using the OSTRIS AI-Toolkit. It's supposedly better tuned than the standard Kohh and learns well (using the stock Lumina-Image-2.0 model), but the style is, to put it mildly, far from the desired result on intermediate samples.
    I can't use your model directly for training because it requires a full folder like Alpha-VLLM\Lumina-Image-2.0 with all the attachments like VAE, etc. Slipping in boring VAEs and endors isn't a problem, but model_index.json is a problem; it clearly shouldn't be stock, just like the configs in other folders that match your checkpoint.
    Is there any way I can contact you to put together a similar folder and try teaching directly on your Chepoint, rather than on a bare Lumin?

    duongve13112002
    Author
    Oct 23, 2025

    Hi, you can use this repo to train lora (this is a diffusers format of my model): duongve/NetaYume-Lumina-Image-2.0 · Hugging Face

    SansenskiyOct 23, 2025

    @duongve13112002 That's not what I asked about. I don't need an All-in-One file or just separate VAEs, encoders, etc. They won't run without a JSON config, like model_index.json and other config files in that repository. Yours doesn't have those.

    SansenskiyOct 23, 2025

    @reakaakasky Thank you)

    duongve13112002
    Author
    Oct 23, 2025

    @reakaakasky oops my bad ...

    reakaakaskyOct 23, 2025

    iirc, last time I tried AI-Toolkit, comfyui does not support its lora, it's also in "diffusers" format. Need to be converted first.

    I end up using diffusion-pipe, supports both comfyui format safetensors in and out, so life is easier...

    SansenskiyOct 23, 2025

    @reakaakasky Comfi accepted my LoRa model after using AI-Toolkit, but the diffusion folder is something new for me.

    I'm used to the regular kohh script, but it can't train a LoRa model for Lumen (I don't know why, but I don't have any points that even hint at its support). I decided to test this new thing after NoobAI and Illustrious. They're good, but if I want to add a lot of detail (and I really do, and often :) ), then output is a mess/artifacts. I understand the problem is the SDXL-based models' 75-token limit, which I often overflowed and couldn't fit everything I wanted.

    SansenskiyOct 23, 2025

    @reakaakasky You were right... Comfi doesn't accept lore from the AI ​​Toolkit. I didn't think to disable it and see what would happen without the dummy lore. Can you tell me what this pipeline is?

    reakaakaskyOct 24, 2025

    "Can you tell me what this pipeline is?"
    You mean "diffusion-pipe"? just another training tool.

    ZootAllures9111Oct 24, 2025

    @Sansenskiy Lumina 2.0 support is in the "SD3" branch of Kohya. You'd also want to use this PR: https://github.com/kohya-ss/sd-scripts/pull/2225

    Shuffle_V2Oct 25, 2025· 2 reactions
    CivitAI

    I want to train a LoRA for this model, but I have no idea what to use. I've only done 3 SDXL LoRAs. So I know basically nothing, so what do I need?

    duongve13112002
    Author
    Oct 25, 2025

    Hi, there isn’t any difference between training LoRA on Lumina and XL. The main difference is that you should use a higher learning rate, and your training dataset should include both natural language and corresponding tags to make it easier for the model to learn.

    reakaakaskyOct 25, 2025

    12gb Nvidia gpu

    diffusion-pipe

    I use same training settings (lr, steps, batch size...) as SDXL

    Shuffle_V2Oct 25, 2025

    @duongve13112002  dose Kohya work?

    Shuffle_V2Oct 25, 2025

    @reakaakasky can you send your config with paths? I can't get it to work, and I think I'm doing paths wrong or something. I don't know.

    duongve13112002
    Author
    Oct 26, 2025

    @Shuffle_V2 Yes it work or you can use aitoolkit or simple tunner

    ikekph5Oct 26, 2025· 3 reactions
    CivitAI

    I was so glad that I found this model. it's ... insanely good. How can such a model be hidden away and still not popular???

    It should be pinned on the main homepage!

    silentzcOct 26, 2025
    CivitAI

    为什么出现六根手指的概率很高呢?

    duongve13112002
    Author
    Oct 26, 2025

    你好,这个问题我觉得一部分是由于 image encoder 的架构,另一部分是因为我还没有充分训练,所以这个问题可能会出现。我会尽力把它改善到最好。你可以尝试使用 negative prompt 来减少这个问题。

    reakaakaskyOct 27, 2025

    Needs a "hand" dataset. I'm on it.

    EternalCyclicOct 26, 2025· 6 reactions
    CivitAI

    It's good. Especially with the instructions here I was able to create an exact image I had in mind with all 5 fingers intact. You gotta be descriptive and it's quite good. Pony V7 didn't work out for me with styling. But this looks good so far. Also use https://gumgum10.github.io/gumgum.github.io/

    to get a style of an artist. I picked decent ones. Good so far.

    duongve13112002
    Author
    Oct 28, 2025· 6 reactions
    CivitAI

    Hi, at the moment forgeui supports lumina Image v2 base model. Here is the repo: Haoming02/sd-webui-forge-classic: The "classic" version of the Forge WebUI

    ReignShad0Oct 28, 2025
    CivitAI

    When using this all in one model (v3.5 pretrained) in comfyUI, I can see the preview of the image as it's generated but the resulting image is always completely black. Does anyone have any ideas on how to fix this? This happens even when using a tiled VAE decode, so I'm fairly certain it's not a memory issue (although likely related to the VAE in some way). Using an NVIDIA RTX 4070 with 12gb vram and 32gb system ram.

    The standard lumina model also worked for me, just not the all in one.

    munchkinOct 28, 2025· 2 reactions

    If you have Sage Attention enabled, that's the likely cause.

    ReignShad0Oct 28, 2025

    @munchkin Seems this was the case. Is sage attention just not compatible?

    munchkinOct 28, 2025

    @ReignShad0 Did you use --use-sage-attention? Right now I tested with a patch from KJNodes and it seems to work. I think Qwen Image also has an issue with this, but that patch didn't help.

    ReignShad0Oct 28, 2025

    @munchkin Yes I tried launching without sage attention and it worked just fine. It's just curious that Sage attention isn't compatible with this model in particular on my machine at the very least. I was able to run the default lumina just fine with sage attention.

    munchkinOct 29, 2025

    @ReignShad0 That's what I am saying. I had issue with --use-sage-attention specifically too. But now I tested it with the specific node "Patch Sage Attention KJ" and it worked.

    chimpedoutOct 28, 2025· 1 reaction
    CivitAI

    it's really quite impressive, way better prompt adherance than sdxl/illustrious/pony, but it's not quite there yet, feels just slightly off, be it contrast / finer details like jewelry / clothing / hands / eyes.

    BloxxxeyOct 28, 2025
    CivitAI

    Sorry, I might be stupid but how do I use that all in one model? You said it has the necessary weights for the VAE, text encoder, and image backbone. Do i just load this model in the load vae, load clip and load diffusion model nodes?

    BloxxxeyOct 28, 2025

    Oh, got it. It's the workflow from your hugging face profile and not from the example images here.

    chimpedoutOct 29, 2025· 1 reaction
    CivitAI

    Using the stabilizer and lowering the shift increases image quality like crazy, the model can actually compete with illustrious, I dare say I like some of the generations more.

    NomadSterbenOct 29, 2025
    CivitAI

    Mmm, seems like a really interesting model. Too bad it doesn’t work on Forge 😭

    duongve13112002
    Author
    Oct 30, 2025· 1 reaction

    Hi, there is a forgeui support supports it. Here is the link of this repo : Haoming02/sd-webui-forge-classic: The "classic" version of the Forge WebUI

    NomadSterbenOct 30, 2025

    @duongve13112002 Oh, I didn’t know that! Thank you so much — I’m really glad to try out your work!

    kou_123Nov 1, 2025

    @duongve13112002 How? I can't get it to work.

    duongve13112002
    Author
    Nov 1, 2025

    @kou_123 Hi according to this: https://github.com/Haoming02/sd-webui-forge-classic/pull/321 Currently forgeui havent support AIO model, so you need to download seperate part of this model instead.

    hiben40387Nov 3, 2025· 1 reaction

    @duongve13112002 No it should work with the AIO version too, it's what I used.
    @kou_123 Not sure if 100% necessary but what I had to do to get it to work was install the forge Neo fresh, just doing a git pull didn't work.

    duongve13112002
    Author
    Nov 1, 2025· 26 reactions
    CivitAI

    Hi, my current workflow has completed 60% of the dataset annotation using natural language. In this version, I’ll use both structured and unstructured caption formats for the entire dataset. I’ve also finished implementing a new timestep method, which helps the model achieve better coverage, along with several methods to handle Lumina’s unwanted shifts. The new version will take longer to complete.

    TOTALANTIWOKENov 2, 2025

    so if i want to train a lora, i caption the images with only natural language? (with say joycaption) or mix with booru tags?

    duongve13112002
    Author
    Nov 2, 2025· 1 reaction

    @TOTALANTIWOKE Yes, moreover, with lumina i suggest you a higher learning rate

    ShinwohNov 3, 2025

    Will there be more character knowledge for the new version? The model doesn't recognize many gacha games new characters

    duongve13112002
    Author
    Nov 3, 2025

    @Shinwoh Yes I will add some new characters in my dataset, but i am not sure it can learn everything. I hope the new trainscript can help model learn better

    souleojNov 5, 2025

    its really nice to see youre still working on this model. i really appreciate all the documentation and how open you are about the process, which is something other authors seem to neglect. i am eager to try this new version when its released!

    Zaya17Nov 4, 2025
    CivitAI

    伟大

    CosmicElementNov 4, 2025
    CivitAI

    This is turning out to be quite the promising model. I just have a few questions, is v3.5 supposed to be the one in "continuous/ongoing" training, with the other versions being trained on snapshots of it? If so, does 3.5 have e621 data in it's training, or is that only for the other versions?

    duongve13112002
    Author
    Nov 5, 2025· 1 reaction

    Yes, this version was trained on both e621 and danbooru

    gannibalNov 9, 2025
    CivitAI

    Is there a way to run this model in fp16 mode for older GPU cards that can't do bf16?

    https://huggingface.co/neta-art/Neta-Lumina/discussions/4

    https://github.com/comfyanonymous/ComfyUI/issues/8528

    duongve13112002
    Author
    Nov 9, 2025

    Hi, you can use the image encoder with dtype FP8, which can be used on older GPUs. Here is the link for this: https://civitai.com/models/2023440/neta-lumina-fp8-scaled

    reakaakaskyNov 9, 2025

    emmm... I don't think fp8 model gonna help. The fast way to fix this is using fp32 mm. Comfyui has a node "ModelComputeDtype" for this. Although it is really slow.
    Seems activation values overflowed. I noticed this when I making the fp8 model and forcing it using full fp8 mm. So my fp8 model is just a weight storage file, no real fp8 mm involved.

    It's strange that it's still overflow (must be alot) in fp16, to a point that the whole model collapsed. Does the original Lumina 2 have this issue too?

    duongve13112002
    Author
    Nov 9, 2025

    @reakaakasky Yes, the original lumina model meets this problem too. i havent tested on the low gpus but quatalization like nf4 or nf8 can solve this problem?

    reakaakaskyNov 9, 2025· 1 reaction

    Don't think so. It's the activations overflowed. Not the accuracy of the weights.
    I'm trying to find where the overflow happened. Seems a very early layer. It overflowed before reaching the main 27 Transformer layers.

    reakaakaskyNov 9, 2025

    gave up...I thought it might be a embedding layer or something. Noop. The original Lumina 2 model was trained using bf16. Overflows are everywhere, if using fp16...

    reakaakaskyNov 9, 2025

    Overflows are everywhere

    Ok, not everywhere, worse cases are ~10 linear layers. Activation values overflowed in fp16.
    @duongve13112002 there is a quick dirty fix: check the output of the JointTransformerBlock, because the output is normalized so unlikely overflow. If there is nan in output then recompute the block in fp32 autocast mode. worse cases, recompute 10 blocks in fp32, but still 3x faster than full fp32.

    reakaakaskyNov 9, 2025

    comfy/ldm/lumina/model.py
    I changed the JointTransformerBlock.forword() to this.
    _forword() is the original forward func.
    ```txt

    def forward(

    self,

    x: torch.Tensor,

    x_mask: torch.Tensor,

    freqs_cis: torch.Tensor,

    adaln_input: Optional[torch.Tensor] = None,

    transformer_options={},

    ):

    dtype = x.dtype

    out = self._forward(x, x_mask, freqs_cis, adaln_input, transformer_options)

    isinf,isnan = out.isinf().any(),out.isnan().any()

    if isinf or isnan:

    print(f"inf {isinf}, nan {isnan}")

    with torch.amp.autocast_mode.autocast("cuda", torch.float):

    out = self._forward(

    x, x_mask, freqs_cis, adaln_input, transformer_options

    )

    print(f"fixed out: dtype {out.dtype}, max {out.abs().max().item()}")

    out = out.to(dtype).nan_to_num()

    return out
    ```

    I think a comfyui node can hot patch this. I don't know much about comfyui node...If you want to have a try

    duongve13112002
    Author
    Nov 9, 2025

    @reakaakasky Thanks i will try later

    reakaakaskyNov 10, 2025· 6 reactions
    CivitAI

    Lightning LoRA for v3.5. 2x faster

    https://civitai.com/models/2115586

    duongve13112002
    Author
    Nov 10, 2025· 1 reaction

    Whoa, I was planning to do that when I finished the next version, but you’ve already done it! Thank you

    eliontNov 10, 2025· 1 reaction
    CivitAI

    How to prompt to draw for with specific artist style from https://gumgum10.github.io/gumgum.github.io/ or merging several styles?

    chimpedoutNov 10, 2025

    @artistname usually does the trick, the model is also smart enough to use by artist name and also just straight up the artist name.

    eliontNov 11, 2025

    @chimpedout Thank you.

    necrophagism777Nov 11, 2025· 4 reactions
    CivitAI

    Thank you for the efforts. As a pure Japanese anime model, this is probably the best as open source. The only thing it lacks is community supports, like controlnet models.

    navimixuNov 11, 2025· 3 reactions
    CivitAI

    this is definitely a step in the right direction, its ease of prompting , styles knowlage , and memory budget .. I love it all 💚 cant wait to see what this will end up like after community fin-tuning

    ORANGESTARNov 11, 2025· 2 reactions
    CivitAI

    实话说这个模型真的是太棒了,给我的感觉甚至超越了nai4.5,我想使用这个模型训练lora但是我找不到相关的教程。你可以告诉一些关系这个模型lora相关的教程吗?我知道很希望这个模型能热门起来,它完全有这个实力和资格。

    duongve13112002
    Author
    Nov 11, 2025
    wbx123Nov 12, 2025· 1 reaction

    就是肢体太容易出错了

    ElaineRayNov 20, 2025

    青龙大佬的训练脚本应该是目前最方便获取的了,然而效果并不是一般人能轻松驾驭的,并且训练所需要耗费的时间很长,也许是sdxl模型的几倍

    reakaakaskyNov 20, 2025

    I prefer simpletuner, or diffuser-pipe. sd-scripts is, in fact, deprecated.

    As for training time, lumina 2 takes 3.5x time than sdxl with proper optimizations applied. (torch.compile, etc.), and sd-scripts does not officially support those optimizations and is ~70% slower.

    sujie199519dw172Feb 6, 2026

    @duongve13112002 可以在civitai上面训练吗

    zeuss194Nov 12, 2025· 1 reaction
    CivitAI

    Once you get how prompt work

    https://nieta-art.feishu.cn/wiki/RY3GwpT59icIQlkWXEfcCqIMnQd

    it definitely work quit well. But if somebody can make sense of style reference documentation...

    I mean if you search elsewhere the name listed you will see very different art form what you get in that website. my guess is that they are trying to put random mismatched name on another artist name to avoid liability/copyright issue ?

    https://neta-lumina-style.tz03.xyz/

    munchkinNov 16, 2025· 2 reactions

    The gallery of styles that is for NetaYume is far more comprehensive than the official one:
    https://gumgum10.github.io/gumgum.github.io/

    IRedDragonICYNov 23, 2025· 3 reactions
    CivitAI

    Compatible my sampler, it improves aesthetically

    https://github.com/IRedDragonICY/auraflow_rrf

    nanalayaNov 25, 2025

    How do we use it?

    IRedDragonICYNov 26, 2025· 2 reactions

    @nanalaya if you use ComfyUI, add in /comfy/samplers.py

    patchworkpantsNov 30, 2025

    @IRedDragonICY Wait... So overwrite samplers.py with your .py file?

    nanalayaNov 24, 2025· 3 reactions
    CivitAI

    This is an insane newfound.

    duongve13112002
    Author
    Nov 26, 2025· 34 reactions
    CivitAI

    Hi everyone! The next version of this model will be released next week. After that release, I will pause updates on this model and focus my time and budget on a new architecture called the Z-Image series, created by Alibaba Group.
    From what I’ve heard, this model is very promising, it’s rumored to use the same architecture as the Lumina model, but with more hyperparameters while still running faster. It also reportedly solves many issues, such as generating both anime-style and realistic images, producing accurate text within images, and being easier to train.
    I hope you’ll continue to support me as I work on this new architecture! :D

    reakaakaskyNov 26, 2025

    I'm waiting for the editing version. always want to train a editing model to be good at art style transferring. But qwen editing is so fkn big...

    Also prepping Z-Image, though the main focus right now is on photorealistic styles (internal use only, won't be posted here). I plan to rush the NSFW fine-tuning early next month. I'm trying to nail down the nudity and anatomy details—not to make a porn model per se, but for commercial utility.

    I need the fine-tuning to go well to unlock more funding. I’ve been dying to try full fine-tuning, but I'm not allowed to until I show some results. I've moved from Qwen3-VL 235BA22B to Grok 4.1 for instructions and reasoning, using scripts and manual tweaks to improve the prompts.

    In a commercial context, NSFW usually just means stripping or partial nudity (like nipples); anything explicit like penetration is totally unnecessary...

    Honestly, I don't even know what I'm doing...

    duongve13112002
    Author
    Nov 27, 2025

    @reakaakasky If i have enough resource i will train on edit version too. However, i will tune on base model after that i will distill on turbo :v

    duongve13112002
    Author
    Nov 27, 2025

    @hentai_Researcher_blue Hi if you want to train/ tune it i sugeest you have a good quality dataset both images and annotation. Basically, it has the similar architecture of lumina so this is very important

    ShinwohNov 27, 2025

    Wich Z-Image model are you going to finetune?

    duongve13112002
    Author
    Nov 28, 2025

    @Shinwoh May be i will try on base first. But i have heard about alibaba has a plan for finetune z-image for anime

    reakaakaskyNov 29, 2025

    z-image + NoobAI, o m g...

    duongve13112002
    Author
    Nov 29, 2025

    @reakaakasky I am not sure but on X, some users asked one of their developer tune it on anime and they not sure about that and mentioned "the anime dataset should be used for training variants or LoRA."

    ShinwohJan 29, 2026

    @duongve13112002 Z-Image base is finally out :D

    duongve13112002
    Author
    Jan 30, 2026

    @Shinwoh Yeah i know it, i am training it but the results are not good as i expected :<

    ShinwohFeb 6, 2026

    @duongve13112002 As I expected, Qwen-Image anime-style gens were also not great :( just like it happens with Flux

    The best model for anime-style gens is probably Lumina-DiMOO

    zhu87731518Nov 28, 2025
    CivitAI

    请问是因为模型架构问题吗?为什么iPad上的draw things软件无法导入这个模型呢?

    eikukigaku7Dec 31, 2025

    我去,经典draw things,很久没玩这个app了,自从有了电脑之后

    ShinwohNov 29, 2025
    CivitAI

    Is it possible to upload this model on PixAI?

    zhaiguoaiDec 7, 2025
    CivitAI

    不知道为啥,更新comfy新版本后,不能生成色色的东西了

    empek17Dec 7, 2025· 2 reactions
    CivitAI

    My favorite model on the whole site so far, it can do really complicated prompts and didnt find anything it'd struggle with outside VERY specific character-exclusive things.
    Once you understand how to prompt it feels like you can pretty much generate anything.
    It does come at cost of being really wonky when it comes to short prompts.
    For anyone starting with this model natural language goes a long way in this one so i'd say start with that.

    Checkpoint
    Lumina

    Details

    Downloads
    2,794
    Platform
    CivitAI
    Platform Status
    Available
    Created
    10/10/2025
    Updated
    6/23/2026
    Deleted
    -