Qwen 3.5 4B Text Encoder for Anima 2B
NEW → Now supported on Forge Neo (sd-webui-forge-neo) as a native extension! See the Forge Neo install instructions below.
Installation
ComfyUI
Clone the repo into your ComfyUI custom_nodes folder:
cd ComfyUI/custom_nodes
git clone https://github.com/GumGum10/comfyui-qwen35-anima.git
Then restart ComfyUI.
Forge Neo
Clone or copy the extension into your Forge Neo extensions folder:
cd sd-webui-forge-neo/extensions
git clone https://github.com/GumGum10/sd-forge-qwen-35-encoder.git
Then restart Forge Neo. Dependencies (transformers, safetensors) install automatically on first launch.
What Is This?
A drop-in upgrade for Anima 2B's text encoder. The stock Anima ships with a tiny 0.6B parameter text encoder — it works, but it struggles with complex prompts. This replaces it with a 4B parameter encoder that understands your prompts significantly better.
The trade-off: the larger encoder needs alignment work to "speak the same language" as the diffusion model. We've done that work and ship the alignment files with this release. You just need to place files in the right folders and toggle a couple of settings.
What You Get
Pros:
Much better understanding of complex/long prompts (7× more parameters dedicated to reading your text)
Better handling of detailed scene descriptions, multiple subjects, and nuanced instructions
Alignment controls let you blend between raw 4B output and 0.6B-compatible output
Cons:
Uses more VRAM than the stock 0.6B encoder (~4GB vs ~0.6GB for the text encoder portion)
Slightly slower encoding (more parameters to run)
Alignment is an approximation — the diffusion model was trained against the 0.6B, so we're rotating the 4B's output to match. It's very good (0.96 cosine similarity) but not identical (a sketch of how that number can be measured follows this list)
This is a reverse-engineered implementation — the original author's private code may differ in subtle ways
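For the curious, the 0.96 figure is a mean cosine similarity between rotated 4B embeddings and stock 0.6B embeddings over a set of prompts. Here is a minimal sketch of how such a number can be measured — the tensor shapes and the idea of a single rotation matrix mapping one space to the other are assumptions for illustration, not the repo's actual evaluation code:

```python
import torch
import torch.nn.functional as F

def mean_alignment_similarity(emb_4b, emb_06b, rotation):
    """Mean cosine similarity between rotated 4B and stock 0.6B embeddings.

    emb_4b:   (num_tokens, d_4b) embeddings for some prompts   (assumed shapes)
    emb_06b:  (num_tokens, d_06b) 0.6B embeddings for the same prompts
    rotation: (d_4b, d_06b) alignment rotation matrix
    """
    aligned = emb_4b @ rotation  # rotate the 4B output into the 0.6B's space
    return F.cosine_similarity(aligned, emb_06b, dim=-1).mean().item()
```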
File Placement
All files are available at: lylogummy/anima2b-qwen-3.5-4b
ComfyUI
You'll download 4 files:
ComfyUI/
├── models/
│ └── text_encoders/
│ └── qwen35_4b.safetensors ← THE TEXT ENCODER WEIGHTS
│
└── custom_nodes/
└── comfyui-qwen35-anima/ ← THIS CUSTOM NODE FOLDER
├── __init__.py ← (comes with the node)
├── calibration_params.safetensors ← MAGNITUDE CALIBRATION
├── rotation_matrix.safetensors ← ALIGNMENT ROTATION
└── qwen35_tokenizer/ ← TOKENIZER FILES
├── tokenizer.json
├── vocab.json
└── merges.txt
Forge Neo
You only need to download 1 file — the calibration params, rotation matrix, and tokenizer are already bundled with the extension:
sd-webui-forge-neo/
├── models/
│ └── text_encoder/
│ ├── qwen_3_06b_base.safetensors ← STOCK 0.6B (you already have this)
│ └── qwen35_4b.safetensors ← DOWNLOAD THIS
│
└── extensions/
└── sd_forge_qwen35_encoder/ ← THIS EXTENSION
├── scripts/ ← (comes with extension)
├── lib_qwen35/ ← (comes with extension)
├── calibration_params.safetensors ← (bundled)
├── rotation_matrix.safetensors ← (bundled)
└── qwen35_tokenizer/ ← (bundled)
Forge Neo note: Keep qwen_3_06b_base.safetensors selected in the top VAE/Text Encoder dropdown — its LLM adapter is still required. Do not put qwen35_4b.safetensors in that top dropdown.
Where to download each file:
qwen35_4b.safetensors (both ComfyUI and Forge Neo)
→ Download from: text_encoders/
→ Place in: ComfyUI/models/text_encoders/ or sd-webui-forge-neo/models/text_encoder/
→ What it does: the actual 4B text encoder model weights

calibration_params.safetensors + rotation_matrix.safetensors (ComfyUI only — bundled in Forge Neo)
→ Download from: calibration/
→ Place in: ComfyUI/custom_nodes/comfyui-qwen35-anima/
→ What they do: calibration scales the 4B output to match the 0.6B's magnitude per dimension; the rotation matrix rotates the 4B's concept directions to match what the adapter expects. A sketch of how the two files are applied follows this list.

qwen35_tokenizer/ folder (ComfyUI only — bundled in Forge Neo)
→ Download from: tokenizer/
→ Place in: ComfyUI/custom_nodes/comfyui-qwen35-anima/qwen35_tokenizer/
→ What it does: the correct tokenizer (vocab=248K, NOT the default Qwen3 tokenizer)
→ Note: this will auto-download from HuggingFace on first use if you don't place it manually.
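As promised above, a minimal sketch of what the two alignment files do when applied to the encoder's output. The tensor key names ("rotation", "scale") and shapes are assumptions; the actual node/extension may store and apply them differently:

```python
import torch
from safetensors.torch import load_file

rotation = load_file("rotation_matrix.safetensors")["rotation"]  # assumed key, (d_4b, d_06b)
scale = load_file("calibration_params.safetensors")["scale"]     # assumed key, (d_06b,)

def align(emb_4b: torch.Tensor, use_calibration: bool = False) -> torch.Tensor:
    out = emb_4b @ rotation   # rotate 4B concept directions into the space the adapter expects
    if use_calibration:
        out = out * scale     # per-dimension magnitude match to the 0.6B
    return out
```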
How to Use
ComfyUI
Add the "Load Qwen3.5 CLIP (Anima)" node (found under
loaders → Anima)Select
qwen35_4b.safetensorsfrom the dropdownConnect the CLIP output to a CLIPTextEncode node
Use with your Anima 2B checkpoint as normal
Forge Neo
Load an Anima 2B checkpoint
Make sure qwen_3_06b_base.safetensors is in the top VAE/Text Encoder dropdown
In the generation tab, expand "Qwen3.5 Text Encoder (Anima)" and enable it
Select qwen35_4b.safetensors in the extension's Model File dropdown
Generate as normal — the extension intercepts text encoding automatically
Recommended Settings to Start (both)
use_alignment: ON
alignment_strength: 0.5
use_calibration: OFF
output_scale: 1.0
That's it. Generate some images and compare against the stock 0.6B.
Tuning Guide
What the settings actually do (plain English):
use_alignment — Rotates the 4B's internal "compass" so that when it says "from the side" or "looking up", it points in the same direction the diffusion model expects. Without this, the 4B understands your prompt fine — it just communicates it in a way the diffusion model misreads.
alignment_strength (0.0 – 1.0) — The rotation (direction fix) is always on when alignment is enabled. This slider controls how much the magnitude shifts to match the 0.6B:
0.0 = Directions fixed, but keep the 4B's own signal strength
0.5 = Halfway blend ← start here
1.0 = Fully match the 0.6B's signal strength
use_calibration — A finer-grained magnitude adjustment (per dimension instead of uniform). Can help, can also over-correct. Try it on and off and compare.
output_scale — A simple multiplier on the final output. Leave at 1.0 unless you know what you're doing.
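Putting the four settings together, here is a plausible sketch of the encode path. The blend formula is an assumption based on the descriptions above: rotation is always applied when alignment is on, and alignment_strength interpolates each token's magnitude toward a precomputed 0.6B target norm (target_norm here is a hypothetical stat, not a real parameter name):

```python
import torch

def encode_aligned(emb_4b, rotation, scale, target_norm,
                   use_alignment=True, alignment_strength=0.5,
                   use_calibration=False, output_scale=1.0):
    out = emb_4b
    if use_alignment:
        out = out @ rotation                 # direction fix: always on when alignment is enabled
        norm = out.norm(dim=-1, keepdim=True)
        blended = (1 - alignment_strength) * norm + alignment_strength * target_norm
        out = out / norm * blended           # 0.0 keeps the 4B's magnitude, 1.0 fully matches the 0.6B
    if use_calibration:
        out = out * scale                    # finer, per-dimension magnitude adjustment
    return out * output_scale                # simple final multiplier
```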
Recommended workflow:
Generate with alignment OFF first — see what the raw 4B gives you. The text understanding will be better, but poses/viewpoints may be off.
Turn alignment ON, set strength to 0.5 — generate the same prompts again. You should see better pose/viewpoint adherence while keeping the 4B's improved understanding.
Adjust strength — bump it up if spatial stuff is still off, pull it back if quality degrades.
Optionally enable calibration — compare on/off, keep whichever looks better for your use case.
FAQ
Q: Do I need both calibration AND alignment files? A: The alignment file (rotation_matrix.safetensors) is the most important one. Calibration is optional and supplementary. You can use alignment without calibration.
Q: Will this work with any Anima 2B checkpoint? A: Yes — any checkpoint built on Anima 2B that uses the standard text encoder pipeline.
Q: Does this need extra Python packages? A: For ComfyUI — no, everything ships with ComfyUI already. For Forge Neo — transformers and safetensors install automatically on first launch.
Q: How much extra VRAM does this use? A: The 4B encoder weights are FP8 quantized, so roughly ~4GB for the text encoder. The stock 0.6B is under 1GB. Your total VRAM usage depends on your diffusion model + VAE + this.
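The arithmetic behind that estimate is simple: FP8 stores one byte per parameter, so the weights alone scale linearly with parameter count (activations add a bit on top):

```python
params_4b = 4e9
print(params_4b * 1 / 1e9)    # 1 byte/param at FP8 → ~4.0 GB

params_06b = 0.6e9
print(params_06b * 1 / 1e9)   # ~0.6 GB at the same precision
```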
Q: Why not just scale the output by 10× instead of all this alignment stuff? A: Uniform scaling fixes the magnitude but not the directions. The 4B encodes "from the side" as a vector pointing in a completely different direction than the 0.6B. The rotation matrix fixes that. Scaling alone would be like shouting the wrong directions louder.
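A toy demonstration of that point: cosine similarity (direction agreement) is invariant to uniform scaling, so amplifying a wrong direction leaves it just as wrong. The vectors here are made up purely for illustration:

```python
import torch
import torch.nn.functional as F

expected = torch.tensor([1.0, 0.0])  # direction the adapter expects for "from the side"
raw_4b = torch.tensor([0.0, 0.1])    # hypothetical raw 4B direction: orthogonal, and quiet

print(F.cosine_similarity(expected, raw_4b, dim=0))       # tensor(0.) → wrong direction
print(F.cosine_similarity(expected, raw_4b * 10, dim=0))  # tensor(0.) → louder, still wrong
```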
Q: Is this better than the stock 0.6B? A: For text understanding — yes, meaningfully. For raw out-of-the-box image quality — it depends on your alignment settings and prompts. The 0.6B has the advantage of being exactly what the model was trained against. The 4B has the advantage of actually understanding complex prompts. With alignment at 0.5, most users see comparable or better results, especially on detailed prompts where the 0.6B falls short.
Q: Can I use this with img2img? A: Yes — works for both txt2img and img2img on both ComfyUI and Forge Neo.
Q: Why does Forge Neo still need the 0.6B model loaded? A: The Anima pipeline uses a small LLM adapter that lives on the 0.6B model. This adapter converts text embeddings into the format the diffusion model expects. The 4B provides the text understanding, but the adapter (on the 0.6B) still handles the final conversion. Both models are needed.
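In pseudocode terms, the flow looks roughly like this (function and attribute names are placeholders, not the extension's real API):

```python
def encode_prompt(prompt: str):
    tokens = qwen35_tokenizer(prompt)  # 248K-vocab tokenizer bundled with the extension
    emb = qwen35_4b.encode(tokens)     # text understanding: the 4B model
    emb = align(emb)                   # rotation (+ optional calibration), as sketched earlier
    return qwen3_06b.llm_adapter(emb)  # adapter on the 0.6B produces what the diffusion model expects
```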
Credits
Anima 2B: circlestone-labs
Qwen 3.5 4B for Anima: nightknocker/cosmos-qwen3.5
Custom Node, Alignment & Forge Neo Port: GumGum10
Comments
Disclaimer: This is more of an adapter than a fully-fledged Qwen 3.5 implementation. Are the results better? Not necessarily. Are they worse? Again, not necessarily... it's all subjective. Qwen 3.5 4B is several times bigger than Qwen 3 0.6B, so the model understands a lot more concepts, and it has multilingual support too. Please test and let me know your thoughts; if you see issues with prompt following, set alignment to 1.0 in ComfyUI and that should fix it.
Is it possible to use this with Forge?
Not at the moment, but I can look into supporting it. Which Forge is being used right now? Neo?
@LyloGummy Definitely Neo!
@LyloGummy Yes Neo would be your best bet.
Thanks! Will get Neo shipped today/tomorrow
@ujustgotcyberfuuck213 @sneedingonmyligma420 @compgamer1337267 Forge Neo is now supported:
@LyloGummy Nice. Well, I ran it with the suggested settings on your GitHub page. Overall it crushed prompt adherence, though it maybe improved the prompt I ran in a specific way I was looking for; results varied wildly. The project certainly has potential.
@sneedingonmyligma420 Yep, noticed that as well; if you set alignment to 1 it's going to work better. This is due to the fact that Anima was trained with the 0.6B. This will improve as the TE is trained more (if it is)
@LyloGummy sorry, i cant get what to do, where do i find this extension?
@compgamer1337267 Hey! Just download this archive/clone the repo, and unzip/place in sd-forge-neo folder -> extensions https://github.com/GumGum10/sd-forge-qwen-35-encoder
@LyloGummy Sorry for the stupid question, but I just don't understand how to download the file...
This is a good job, but my 4050 only has 6GB of video memory, so I won't try it.
Hi, thanks for the feedback. I will look into adding the option to offload the TE to CPU/RAM so more people can try it. Also, if the original author of the TE open-sources the 2B variant I can add support for that as well, as it should fit within 6 GB https://huggingface.co/nightknocker/cosmos-qwen3.5
I do use ComfyUI for video generation and some Qwen/Flux, but for anime style I use Forge because it's faster and easier, and inpainting/img2img is easier too. I wish this worked on Forge.
Forge is now supported!
@LyloGummy I got this error:
AttributeError: 'Qwen3_06B' object has no attribute 'llm_adapter'
@Seii1 Hey, please open an issue here and post the full console logs/output, will take a look:
https://github.com/GumGum10/sd-forge-qwen-35-encoder
I never got the results I was hoping for, but even on an RTX 3060 it ran quite briskly.
This looks promising, but is it even worth it? On some of the pictures the Qwen3 0.6B looks better, and the prompt adherence looks similar too.
It's all subjective, imo. We make use of Qwen 3.5's 4B parameters and use an adapter to generate the embeddings that Anima was trained on. That being said, this is just the initial release and I'm researching ways to improve it further; if anything, this is a proof of concept to show that Anima is compatible with larger LLMs. I do appreciate the feedback! ❤
@LyloGummy It might be worth researching Rouwei-Gemma for Illustrious, as I think it's an LLM/T5 adapter for the Illustrious CLIP. I haven't used it, but I have heard good things about it. https://civitai.com/models/1782437/rouwei-gemma
It might help with future models that you make, although it does use the older SDXL/Illustrious architecture and T5 rather than Qwen3...
@LyloGummy Wait, isn't that what it already does, but you're adding another layer of it? Qwen 3.5 -> 3 translator -> I forgot what cosmos used. Or are you just replacing the 3 part with your own thing?
Also, how are you training it? If you're training it on outputs of the 0.6B, it might just learn its mistakes. Base training, then some kind of RL with prompt adherence rated somehow, would probably be best. Don't ask me how to do that at scale, though; maybe VLMs or smaller models that try to generate tags from images.
Qwen is probably best for now, but do you think a model outside the Qwen series would be doable?
Any LLM should be doable in theory. The real problem, if we wish to avoid re-training, is how we align the embeddings of another LLM so they match the 0.6B embeddings. And are the results enough better to make it worthwhile? Which model did you have in mind, if I may ask?
Seems to not work with natural language prompts at all. I tried with different checkpoints with and without lora, with different samplers etc.
As soon as I switched to tag based prompts both encoders worked similarly fine, but with natural language the encoder basically only focused on a single paragraph. Some images even came out as flat colors (or in one case, a green flat color with two very tiny figures visible xD).
I uploaded my results here https://civitai.com/posts/27161701 though I wasn't able to tag which images used which encoder. They all contain their workflows though, so you can check that way.
But for reference, none of the images in that post with ice cream or other people were produced with your encoder. The best image (the one where the character is showing a peace sign) still ignored most of the prompt.
PS: I also had multiple crashes with dynamic vram enabled, though I'm not sure what exactly caused them.
Yeah, I noticed that as well. The majority of the prompt I was using was completely ignored; I'm using about 30% tags and 70% natural language, and only the tags came through in the generation.
Not to mention that generations took about 3 times longer than normal (using Forge Neo).
Really cool proof of concept, and I really want you to experiment with this idea and maybe get something way better than what we have, but in my testing it's just way worse than the normal text encoder. To be super clear though, I really do not want to discourage you at the end of the day.
Appreciate the honest feedback! This is what I lacked during my testing, lol. Stay tuned for the next version; I am addressing all the issues.
Works very badly: artists don't look like artists, it ignores "from side", limbs go missing. With alignment at 1.0 or without. Turning on calibration generates a grid of noise.
First, if you use calibration and alignment at the same time, you will get noise. But you can get a picture by using either calibration or alignment alone. So strange.
This is a great technical attempt, and at the same time I must agree with some comments that the performance of this version is currently not as good as the original.
Anima was trained on the 0.6B; the 4B has different embeddings. First of all, solve this problem.
Hi! Thanks for all the great feedback! The community interest is overwhelming; that tells me I have to make this project meet expectations! I'm posting here as it's easier than responding to each comment, but rest assured I am reading all of them. This is more a proof of concept than anything else, and I am actively working to address all the issues, including NL, quality, artists, limbs, etc. I still believe that with some more iterations this can become a good replacement for Qwen 3 0.6B. Stay tuned!
So I gave it a go; unfortunately, it didn't work very well. The checkpoint I'm using is AnimaYume, and I'm also using the RDBT - Anima stability LoRA. With this LLM adapter it completely ignores the prompt and doesn't give me what I want. I don't know if I'm doing it right, but I've tried multiple times and it's not really working.
I had hopes, but nope, this is not good for me. We completely lose character knowledge, and that's not good. Instant delete, sorry.
This has great potential! But it didn't work for me either in its current state. I experimented all night using different alignment strengths. Not just character knowledge was lost, but other things like landmarks too (Tokyo Tower became Eiffel Tower). Looking forward to future improvements!
Hey bro, cool work. I was just thinking: Anima really loves good prompts out of the box, with every detail clearly written. In Illustrious, you could get a cool image by typing a few words. The model, as I understand it, had a built-in prompt enhancer. Is it possible to do something like this for Anima, so that you can get varied, logical images from a single sentence without requiring a lot of careful thought?