    Anima 2B - Qwen 3.5 4B Text Encoder - BASE
    NSFW

    Qwen 3.5 4B Text Encoder for Anima 2B

    NEW → Now supported on Forge Neo (sd-webui-forge-neo) as a native extension! See the Forge Neo install instructions below.

    Installation

    ComfyUI

    Clone the repo into your ComfyUI custom_nodes folder:

    cd ComfyUI/custom_nodes
    git clone https://github.com/GumGum10/comfyui-qwen35-anima.git
    

    Then restart ComfyUI.

    Forge Neo

    Clone or copy the extension into your Forge Neo extensions folder:

    cd sd-webui-forge-neo/extensions
    git clone https://github.com/GumGum10/sd-forge-qwen-35-encoder.git
    

    Then restart Forge Neo. Dependencies (transformers, safetensors) install automatically on first launch.


    What Is This?

    A drop-in upgrade for Anima 2B's text encoder. The stock Anima ships with a tiny 0.6B-parameter text encoder — it works, but it struggles with complex prompts. This replaces it with a 4B-parameter encoder that understands your prompts significantly better.

    The trade-off: the larger encoder needs alignment work to "speak the same language" as the diffusion model. We've done that work and ship the alignment files with this release. You just need to place files in the right folders and toggle a couple of settings.
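    For intuition, here is a minimal sketch of what that alignment step does, in PyTorch. Every name in it is illustrative; this is not the node's actual code, and the real rotation, calibration, and target norms live in the shipped safetensors files:

    import torch

    def align(emb_4b: torch.Tensor, rotation: torch.Tensor,
              strength: float = 0.5, calib: torch.Tensor | None = None) -> torch.Tensor:
        # Direction fix (use_alignment): rotate 4B embeddings into the
        # space the diffusion model's adapter expects.
        rotated = emb_4b @ rotation
        if calib is not None:
            # Optional per-dimension magnitude rescale (use_calibration).
            rotated = rotated * calib
        # Magnitude blend (alignment_strength): 0.0 keeps the 4B's norms,
        # 1.0 fully matches a hypothetical 0.6B-like target norm.
        norm = rotated.norm(dim=-1, keepdim=True)
        target_norm = torch.ones_like(norm)  # placeholder, not the real target
        blended = (1.0 - strength) * norm + strength * target_norm
        return rotated * (blended / norm.clamp_min(1e-8))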


    What You Get

    Pros:

    • Much better understanding of complex/long prompts (roughly 7× more parameters dedicated to reading your text)

    • Better handling of detailed scene descriptions, multiple subjects, and nuanced instructions

    • Alignment controls let you blend between raw 4B output and 0.6B-compatible output

    Cons:

    • Uses more VRAM than the stock 0.6B encoder (~4GB vs ~0.6GB for the text encoder portion)

    • Slightly slower encoding (more parameters to run)

    • Alignment is an approximation — the diffusion model was trained against the 0.6B, so we're rotating the 4B's output to match. It's very good (0.96 cosine similarity) but not identical

    • This is a reverse-engineered implementation — the original author's private code may differ in subtle ways


    File Placement

    All files are available at: lylogummy/anima2b-qwen-3.5-4b

    ComfyUI

    You'll download four items (the tokenizer folder counts as one):

    ComfyUI/
    ├── models/
    │   └── text_encoders/
    │       └── qwen35_4b.safetensors          ← THE TEXT ENCODER WEIGHTS
    │
    └── custom_nodes/
        └── comfyui-qwen35-anima/              ← THIS CUSTOM NODE FOLDER
            ├── __init__.py                     ← (comes with the node)
            ├── calibration_params.safetensors  ← MAGNITUDE CALIBRATION
            ├── rotation_matrix.safetensors     ← ALIGNMENT ROTATION
            └── qwen35_tokenizer/              ← TOKENIZER FILES
                ├── tokenizer.json
                ├── vocab.json
                └── merges.txt
    

    Forge Neo

    You only need to download one file — the calibration params, rotation matrix, and tokenizer are already bundled with the extension:

    sd-webui-forge-neo/
    ├── models/
    │   └── text_encoder/
    │       ├── qwen_3_06b_base.safetensors     ← STOCK 0.6B (you already have this)
    │       └── qwen35_4b.safetensors            ← DOWNLOAD THIS
    │
    └── extensions/
        └── sd_forge_qwen35_encoder/             ← THIS EXTENSION
            ├── scripts/                         ← (comes with extension)
            ├── lib_qwen35/                      ← (comes with extension)
            ├── calibration_params.safetensors   ← (bundled)
            ├── rotation_matrix.safetensors      ← (bundled)
            └── qwen35_tokenizer/               ← (bundled)
    

    Forge Neo note: Keep qwen_3_06b_base.safetensors selected in the top VAE/Text Encoder dropdown — its LLM adapter is still required. Do not put qwen35_4b.safetensors in that top dropdown.

    Where to download each file:

    qwen35_4b.safetensors (both ComfyUI and Forge Neo)

    • Download from: text_encoders/

    • Place in: ComfyUI/models/text_encoders/ or sd-webui-forge-neo/models/text_encoder/

    • What it does: The actual 4B text encoder model weights

    calibration_params.safetensors + rotation_matrix.safetensors (ComfyUI only — bundled in Forge Neo)

    • Download from: calibration/

    • Place in: ComfyUI/custom_nodes/comfyui-qwen35-anima/

    • What they do: Calibration scales the 4B output to match the 0.6B's magnitude per dimension. The rotation matrix rotates the 4B's concept directions to match what the adapter expects.

    qwen35_tokenizer/ folder (ComfyUI only — bundled in Forge Neo)

    • Download from: tokenizer/

    • Place in: ComfyUI/custom_nodes/comfyui-qwen35-anima/qwen35_tokenizer/

    • What it does: The correct tokenizer (vocab=248K, NOT the default Qwen3 tokenizer)

    • Note: This will auto-download from HuggingFace on first use if you don't place it manually.
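    If you want to sanity-check that the correct tokenizer ended up in place, here is a quick check. It assumes the transformers package is available and uses the ComfyUI path from the tree above; adjust to your own layout:

    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained(
        "ComfyUI/custom_nodes/comfyui-qwen35-anima/qwen35_tokenizer"
    )
    print(tok.vocab_size)  # expect a vocab in the ~248K range, not a stock Qwen3 size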


    How to Use

    ComfyUI

    1. Add the "Load Qwen3.5 CLIP (Anima)" node (found under loaders → Anima)

    2. Select qwen35_4b.safetensors from the dropdown

    3. Connect the CLIP output to a CLIPTextEncode node

    4. Use with your Anima 2B checkpoint as normal

    Forge Neo

    1. Load an Anima 2B checkpoint

    2. Make sure qwen_3_06b_base.safetensors is in the top VAE/Text Encoder dropdown

    3. In the generation tab, expand "Qwen3.5 Text Encoder (Anima)" and enable it

    4. Select qwen35_4b.safetensors in the extension's Model File dropdown

    5. Generate as normal — the extension intercepts text encoding automatically

    Recommended starting settings:

    use_alignment:      ON
    alignment_strength: 0.5
    use_calibration:    OFF
    output_scale:       1.0
    

    That's it. Generate some images and compare against the stock 0.6B.


    Tuning Guide

    What the settings actually do (plain English):

    use_alignment — Rotates the 4B's internal "compass" so that when it says "from the side" or "looking up", it points in the same direction the diffusion model expects. Without this, the 4B understands your prompt fine — it just communicates it in a way the diffusion model misreads.

    alignment_strength (0.0 – 1.0) — The rotation (direction fix) is always on when alignment is enabled. This slider controls how much the magnitude shifts to match the 0.6B:

    • 0.0 = Directions fixed, but keep the 4B's own signal strength

    • 0.5 = Halfway blend ← start here

    • 1.0 = Fully match the 0.6B's signal strength

    use_calibration — A finer-grained magnitude adjustment (per dimension instead of uniform). Can help, can also over-correct. Try it on and off and compare.

    output_scale — A simple multiplier on the final output. Leave at 1.0 unless you know what you're doing.
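    To make the strength slider concrete, a toy calculation with invented numbers (these are not real model statistics):

    # Suppose the rotated 4B embedding has norm 20 and the 0.6B-matched
    # target norm is 140 (illustrative values only).
    norm_4b, target = 20.0, 140.0
    for s in (0.0, 0.25, 0.5, 1.0):
        print(s, (1 - s) * norm_4b + s * target)
    # 0.0 -> 20.0   0.25 -> 50.0   0.5 -> 80.0   1.0 -> 140.0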

    Recommended tuning workflow:

    1. Generate with alignment OFF first — see what the raw 4B gives you. The text understanding will be better, but poses/viewpoints may be off.

    2. Turn alignment ON, set strength to 0.5 — generate the same prompts again. You should see better pose/viewpoint adherence while keeping the 4B's improved understanding.

    3. Adjust strength — bump it up if spatial stuff is still off, pull it back if quality degrades.

    4. Optionally enable calibration — compare on/off, keep whichever looks better for your use case.


    FAQ

    Q: Do I need both calibration AND alignment files? A: The alignment file (rotation_matrix.safetensors) is the most important one. Calibration is optional and supplementary. You can use alignment without calibration.

    Q: Will this work with any Anima 2B checkpoint? A: Yes — any checkpoint built on Anima 2B that uses the standard text encoder pipeline.

    Q: Does this need extra Python packages? A: For ComfyUI — no, everything ships with ComfyUI already. For Forge Neo — transformers and safetensors install automatically on first launch.

    Q: How much extra VRAM does this use? A: The 4B encoder weights are FP8 quantized, so roughly ~4GB for the text encoder. The stock 0.6B is under 1GB. Your total VRAM usage depends on your diffusion model + VAE + this.
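    The arithmetic behind those numbers, assuming FP8 stores one byte per parameter (weights only; activations and caching add overhead):

    print(f"{4.0e9 * 1 / 1e9:.1f} GB")   # 4B parameters at 1 byte each   -> 4.0 GB
    print(f"{0.6e9 * 1 / 1e9:.1f} GB")   # 0.6B parameters at 1 byte each -> 0.6 GB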

    Q: Why not just scale the output by 10× instead of all this alignment stuff? A: Uniform scaling fixes the magnitude but not the directions. The 4B encodes "from the side" as a vector pointing in a completely different direction than the 0.6B. The rotation matrix fixes that. Scaling alone would be like shouting the wrong directions louder.
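    A two-dimensional toy makes the point (vectors invented purely for illustration):

    import torch

    v_06b = torch.tensor([1.0, 0.0])  # toy: how the 0.6B encodes "from the side"
    v_4b  = torch.tensor([0.0, 2.0])  # toy: how the 4B encodes the same phrase
    # Uniform scaling changes magnitude but not direction: still orthogonal.
    print(torch.cosine_similarity(v_06b, 10 * v_4b, dim=0))  # tensor(0.)
    # A rotation fixes the direction.
    R = torch.tensor([[0.0, -1.0], [1.0, 0.0]])  # 90-degree rotation matrix
    print(torch.cosine_similarity(v_06b, v_4b @ R, dim=0))   # tensor(1.)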

    Q: Is this better than the stock 0.6B? A: For text understanding — yes, meaningfully. For raw out-of-the-box image quality — it depends on your alignment settings and prompts. The 0.6B has the advantage of being exactly what the model was trained against. The 4B has the advantage of actually understanding complex prompts. With alignment at 0.5, most users see comparable or better results, especially on detailed prompts where the 0.6B falls short.

    Q: Can I use this with img2img? A: Yes — works for both txt2img and img2img on both ComfyUI and Forge Neo.

    Q: Why does Forge Neo still need the 0.6B model loaded? A: The Anima pipeline uses a small LLM adapter that lives on the 0.6B model. This adapter converts text embeddings into the format the diffusion model expects. The 4B provides the text understanding, but the adapter (on the 0.6B) still handles the final conversion. Both models are needed.



    Comments (39)

    LyloGummy
    Author
    Mar 10, 2026 · 8 reactions

    Disclaimer: This is more of an adapter than a fully fledged Qwen 3.5 implementation. Are the results better? Not necessarily. Are they worse? Again, not necessarily... it's all subjective. Qwen 3.5 4B is several times bigger than Qwen 3 0.6B, so the model understands a lot more concepts, and it has multilingual support too. Please test and let me know your thoughts; if you see issues with prompt following, set alignment to 1.0 in ComfyUI and that should fix it.

    AnimaXx
    Mar 10, 2026 · 1 reaction

    Great work 👏 this must have taken a very long time so well done 👍

    LyloGummy
    Author
    Mar 10, 2026

    @AnimaXx Thank you! Really appreciate it! The code part was pretty straightforward...the logic behind it was my time sink 😂

    compgamer1337267
    Mar 10, 2026

    Is it possible to use this with Forge?

    LyloGummy
    Author
    Mar 10, 2026 · 5 reactions

    Not at the moment, but I can look into supporting it. Which Forge is being used right now? Neo?

    ujustgotcyberfuuck213
    Mar 10, 2026 · 6 reactions

    @LyloGummy Definitely Neo!

    @LyloGummy Yes Neo would be your best bet.

    LyloGummy
    Author
    Mar 10, 2026 · 4 reactions

    Thanks! Will get Neo shipped today/tomorrow.

    LyloGummy
    Author
    Mar 10, 2026 · 2 reactions

    @ujustgotcyberfuuck213 @sneedingonmyligma420 @compgamer1337267 Forge Neo is now supported:

    https://github.com/GumGum10/sd-forge-qwen-35-encoder

    @LyloGummy Nice. Well, I ran it with the suggested settings on your GitHub page. Overall it crushed prompt adherence; maybe it improved the prompt I ran in a specific way I was looking for, but results varied wildly. The project certainly has potential.

    LyloGummy
    Author
    Mar 10, 2026

    @sneedingonmyligma420 Yep, noticed that as well; if you set alignment to 1 it's gonna work better. This is because Anima was trained with the 0.6B. This will improve as the TE is trained more (if it is).

    compgamer1337267
    Mar 11, 2026

    @LyloGummy Sorry, I can't figure out what to do. Where do I find this extension?

    LyloGummy
    Author
    Mar 12, 2026

    @compgamer1337267 Hey! Just download the archive or clone the repo, and unzip/place it in the sd-forge-neo folder -> extensions: https://github.com/GumGum10/sd-forge-qwen-35-encoder

    compgamer1337267
    Mar 12, 2026

    @LyloGummy Sorry for the stupid question, but I just don't understand how to download the file...

    letme123
    Mar 10, 2026 · 1 reaction

    This is a good job, but my 4050 only has 6GB of video memory, so I won't try it.

    LyloGummy
    Author
    Mar 10, 2026 · 2 reactions

    Hi, thanks for the feedback. I will look into adding the option to offload the TE to CPU/RAM so more people can try it. Also, if the original author of the TE open-sources the 2B variant I can add support for that as well, as it should fit in 6GB: https://huggingface.co/nightknocker/cosmos-qwen3.5

    Seii1
    Mar 10, 2026

    I do use ComfyUI for video generation and some Qwen/Flux, but for anime style I use Forge because it's faster and easier, and inpainting/img2img is easier too. I wish this worked on Forge.

    Seii1
    Mar 10, 2026

    @LyloGummy i got this error

    AttributeError: 'Qwen3_06B' object has no attribute 'llm_adapter'

    LyloGummy
    Author
    Mar 12, 2026

    @Seii1 Hey, please open an issue here and post the full console logs/output, will take a look:
    https://github.com/GumGum10/sd-forge-qwen-35-encoder

    dousuruoribeyasu6391
    Mar 10, 2026 · 1 reaction

    I never got the results I was hoping for, but even on an RTX 3060 it ran quite snappily.

    AnimaXx
    Mar 10, 2026 · 3 reactions

    This looks promising, but is it even worth it? On some of the pictures the Qwen3 0.6B looks better, and the prompt adherence looks similar too.

    LyloGummy
    Author
    Mar 10, 2026 · 3 reactions

    It's all subjective IMO. We make use of Qwen 3.5's 4B parameters and use an adapter to generate the embeddings that Anima was trained on. That being said, this is just the initial release and I'm researching ways to improve it further; if anything, this is a proof of concept to show that Anima is compatible with larger LLMs. I do appreciate the feedback! ❤

    AnimaXx
    Mar 10, 2026

    @LyloGummy It might be worth researching Rouwei-Gemma for Illustrious, as I think it's an LLM/T5 adapter for the Illustrious CLIP. I haven't used it, but I have heard good things about it. https://civitai.com/models/1782437/rouwei-gemma

    It might help with future models that you make, although it does use the older SDXL/Illustrious architecture and T5 rather than Qwen3...

    GPUPoorChad
    Mar 10, 2026 · 1 reaction

    @LyloGummy Wait, isn't that what it already does, but you're adding another layer of it? Qwen 3.5 -> Qwen 3 translator -> I forgot what Cosmos used. Or are you just replacing the Qwen 3 part with your own thing?

    Also, how are you training it? If you're training it on outputs of the 0.6B it might just learn its mistakes. Base training, then some kind of RL with prompt adherence rated somehow, would probably be best. Don't ask me how to do that at scale, though; maybe VLLMs, or smaller models that try to generate tags from images.

    GPUPoorChad
    Mar 10, 2026

    Qwen is probably best for now, but do you think a model outside the Qwen series is doable?

    LyloGummy
    Author
    Mar 11, 2026

    Any LLM should be doable in theory. The real problem, if we wish to avoid re-training, is how to align the embeddings of another LLM so they match the 0.6B embeddings. And are the results enough better to make it worthwhile? Which model did you have in mind, if I may ask?

    Nihongasuki
    Mar 10, 2026

    Seems not to work with natural-language prompts at all. I tried different checkpoints with and without LoRAs, different samplers, etc.
    As soon as I switched to tag-based prompts, both encoders worked similarly fine, but with natural language the encoder basically only focused on a single paragraph. Some images even came out as flat colors (or, in one case, a flat green with two very tiny figures visible xD).

    I uploaded my results here https://civitai.com/posts/27161701 though I wasn't able to tag which images used which encoder. They all contain their workflows though, so you can check that way.

    But for reference, none of the images in that post with ice cream or other people were produced with your encoder. The best image (the one where the character is showing a peace sign) still ignored most of the prompt.

    PS: I also had multiple crashes with dynamic vram enabled, though I'm not sure what exactly caused them.

    richerich
    Mar 10, 2026

    Yeah, I noticed that as well. The majority of the prompt I was using was completely ignored; I'm using about 30% tags and 70% natural language, and only the tags were reflected in the generation.
    Not to mention that generations took about 3 times longer than normal (using Forge Neo).

    GPUPoorChad
    Mar 10, 2026 · 8 reactions

    Really cool proof of concept, and I really want you to experiment with this idea and maybe get something way better than what we have, but in my testing it's just way worse than the normal text encoder. To be super clear, though, I really do not want to discourage you at the end of the day.

    LyloGummy
    Author
    Mar 11, 2026 · 1 reaction

    Appreciate the honest feedback! This is what I lacked during my testing lol. Stay tuned for the next version; I am addressing all the issues.

    zanebe
    Mar 10, 2026 · 3 reactions

    Works very badly: artists don't look like artists, it ignores "from side", limbs go missing. With alignment at 1.0 or without. Turning on calibration generates a grid of noise.

    Hisa_eromiko
    Mar 11, 2026

    Firstly, if you use calibration and alignment at the same time, you get noise. But you can get a picture using either calibration or alignment alone, which is strange.

    This is a great technical attempt, and at the same time I must agree with some comments that this version currently does not perform as well as the original.

    3840536
    Mar 11, 2026

    Anima was trained on the 0.6B; the 4B has different embeddings. Solve this problem first of all.

    LyloGummy
    Author
    Mar 11, 2026 · 14 reactions

    Hi! Thanks for all the great feedback! The community interest is overwhelming; that tells me I have to make this project meet expectations! Posting here as it's easier than responding to each comment, but rest assured I am reading all of them. This is more a proof of concept than anything else, and I am actively working to address all the issues, including natural language, quality, artists, limbs, etc. I still believe that with a few more iterations this can become a good replacement for Qwen3 0.6B. Stay tuned!

    AnimaXx
    Mar 11, 2026

    So I gave it a go; however, unfortunately it didn't work very well. The checkpoint I'm using is AnimaYume, and I'm also using the RDBT - Anima Stability LoRA. With this LLM adapter it completely ignores the prompt and doesn't give me what I want. I don't know if I'm doing it right, but I've tried multiple times and it's not really working.

    Tetsuoo
    Mar 16, 2026 · 1 reaction

    I had hopes, but nope, this is not good for me. We completely lose character knowledge. Instant delete, sorry.

    aldentheron
    Mar 30, 2026

    This has great potential! But it didn't work for me either in its current state. I experimented all night using different alignment strengths. Not just character knowledge was lost, but other things like landmarks too (Tokyo Tower became Eiffel Tower). Looking forward to future improvements!

    compgamer1337267
    Apr 17, 2026

    Hey bro, cool work. I was just thinking: Anima really loves good prompts out of the box, with every detail clearly written. In Illustrious, you could get a cool image by typing a few words; the model, as I understand it, had a built-in prompt enhancer. Is it possible to do something like this for Anima, so that you can get varied, logical images from a single sentence without a lot of careful thought?

    Checkpoint
    Anima

    Details

    Downloads
    1,119
    Platform
    CivitAI
    Platform Status
    Available
    Created
    3/10/2026
    Updated
    5/13/2026
    Deleted
    -

    Files

    anima2BQwen354BText_base.safetensors