Qwen_3_4B Trained Text Encoder
For use in Z-Image or FLUX Klein4B
Without LoRAs, the LLM had text errors similar to the BF16 base Qwen model, but better adherence to nude prompts.
See the comparison article here
Comments
That's super nice! Was the base model used for training abliterated to remove its refusal safeguards?
I would need to test it in think mode to answer that. While training did shift the gradient by a large amount, I had to be careful not to burn (overtrain) it, since the base model has a 32k-token context length.
@Felldude I did a test using my own quantization of your model and it works great. I love that it better understands my poor prompting ^^;
@n_Arno Thank you! It took 2 days of web crawling, 5 days of Llama captioning, and a few days of building custom tools to make sure the metadata was preserved while still producing accurate captions of 500-1000 length. Training the LLM itself took less time than the data prep.
Did you train this with Z-Image? As part of the diffusion pipeline? Or did you train this separately as a standalone Qwen LM?
It was trained standalone, as an LLM.
I really like what you did here.
There's (almost) no need to use the seed variance node anymore, as it noticeably improves character composition.
Nudity is also great here; this one has a better grasp of physical differences like volume or bust size.
Of course, sexual intercourse doesn't work here; Z-Image isn't made for that. If you want to try it, stick with the original encoder.
Thank you for sharing. ♥♥♥♥
Thank you
Very good on human body details and textures.
But everyone comes out Asian now; sadly, this side effect is too strong.
It's still useful for inpainting, though.
👍
Write a more intelligent and longer prompt, then! Explain the intent, scene, required image framing, mood, and techniques; include reference character profiles; explain what to focus on and why. Try not to micromanage the model with detail. You can also precede it with a "system prompt" addressed to the text encoder LLM that shapes its attitudes and biases.
I would also suggest trying different models, as their level of image understanding varies significantly.
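A minimal sketch of the structured prompt described above, assuming your workflow accepts one plain-text prompt string. `PromptSpec` and `build_prompt` are hypothetical helpers, not part of Z-Image, FLUX, or any node pack. The "system prompt" here is just a plain-text preamble placed before the prompt body, because some pipelines may wrap the prompt in the encoder's chat template themselves; whether a preamble like this actually shifts the encoder's biases (for example, the ethnicity drift mentioned above) is untested.

```python
from dataclasses import dataclass, field


@dataclass
class PromptSpec:
    """Hypothetical container for the prompt elements suggested in the comment."""
    intent: str
    scene: str
    framing: str
    mood: str
    techniques: str = ""
    characters: list[str] = field(default_factory=list)  # reference character profiles
    focus: str = ""  # what to focus on and why


def build_prompt(spec: PromptSpec, system_preamble: str = "") -> str:
    """Join the elements into one labeled prompt, optionally led by a plain-text
    preamble addressed to the text encoder LLM (assumption: plain text, no chat tokens)."""
    lines = [
        f"{label}: {text}"
        for label, text in [
            ("Intent", spec.intent),
            ("Scene", spec.scene),
            ("Framing", spec.framing),
            ("Mood", spec.mood),
            ("Techniques", spec.techniques),
        ]
        if text  # skip empty optional fields
    ]
    # Character profiles go after the scene description, before the focus note.
    for i, profile in enumerate(spec.characters, start=1):
        lines.append(f"Character {i}: {profile}")
    if spec.focus:
        lines.append(f"Focus: {spec.focus}")
    body = "\n".join(lines)
    return f"{system_preamble}\n\n{body}" if system_preamble else body


if __name__ == "__main__":
    spec = PromptSpec(
        intent="A quiet character portrait for a book cover",
        scene="A rainy night market lit by paper lanterns",
        framing="Vertical, medium close-up, subject left of center",
        mood="Melancholic but warm",
        techniques="Shallow depth of field, 85mm lens look",
        characters=["Mara: late 30s, Scandinavian, short grey-blonde hair, wool coat"],
        focus="Her expression; keep the background soft so it does not compete",
    )
    preamble = (
        "You are describing an image for a painter. "
        "Follow the stated ethnicity, age, and framing exactly."
    )
    print(build_prompt(spec, system_preamble=preamble))
```

Keeping each element on its own labeled line states intent and framing explicitly without micromanaging every detail; paste the printed text into your usual prompt node and compare results against your original short prompt.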


