
    This is a full restoration of Zer0Int and AbstractPila's work on the CLIP-L model.

    • In thousands of test images, the FP32 model corrected bad anatomy in cases where the FP16 model failed (seed to seed).

    • In most cases the images show little difference, but where differences exist, the FP32 model was reliably more accurate (as it should be, mathematically).

    • Note: the vision models use the ViT-L vision blocks. Zer0Int has a new finetune of the ViT/vision model here.
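    For context on why FP16 and FP32 can diverge at all: FP16 keeps an 11-bit significand versus FP32's 24 bits, so weights stored in half precision are rounded thousands of times more coarsely, and a sampler can amplify those small shifts over many steps. A toy NumPy sketch of the storage-rounding gap (illustrative random values, not the actual CLIP weights):

    ```python
    import numpy as np

    # Illustrative only: measure how much precision FP16 vs. FP32 storage
    # loses for small-magnitude weights (random stand-ins, not CLIP weights).
    rng = np.random.default_rng(0)
    w = rng.normal(scale=1e-2, size=10_000)  # float64 reference values

    w32 = w.astype(np.float32).astype(np.float64)  # round-trip through FP32
    w16 = w.astype(np.float16).astype(np.float64)  # round-trip through FP16

    err32 = np.abs(w - w32).max()
    err16 = np.abs(w - w16).max()
    print(f"FP32 max rounding error: {err32:.2e}")
    print(f"FP16 max rounding error: {err16:.2e}")
    ```

    With a 13-bit difference in significand width, the FP16 rounding error comes out roughly 8000x larger than the FP32 one, which is the mathematical basis for the "as it should be" claim above.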

    MIT License

    Copyright (c) 2021 OpenAI

    Permission is hereby granted, free of charge, to any person obtaining a copy

    of this software and associated documentation files (the "Software"), to deal

    in the Software without restriction, including without limitation the rights

    to use, copy, modify, merge, publish, distribute, sublicense, and/or sell

    copies of the Software, and to permit persons to whom the Software is

    furnished to do so, subject to the following conditions:

    The above copyright notice and this permission notice shall be included in all

    copies or substantial portions of the Software.

    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR

    IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,

    FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE

    AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER

    LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,

    OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE

    SOFTWARE.


    Comments (27)

    fox23vang226 · Feb 6, 2025 · 1 reaction

    I'm using Forge UI and I'm confused: do I load both CLIP-L and CLIP-L vision, or just one?

    Felldude (Author) · Feb 6, 2025 · 2 reactions

    Vision CLIP is for video models like Hunyuan; it should load, but the vision blocks would be ignored by most models.

    az420 · Feb 6, 2025 · 5 reactions

    Lots of versions... which one is most complete?

    Felldude (Author) · Feb 7, 2025 · 6 reactions

    Vision would be for video models. Changing a CLIP will have drastic effects on an output, seed to seed. I have done some write-ups, but it still comes down to personal preference.

    EnragedAntelope · Feb 6, 2025

    When would we use each of these please?
    Do we just switch full time to the latest FP32 "vision" model for day to day Flux use or is that overkill?
    I appreciate your work but don't understand the difference in versions... thank you.

    Felldude (Author) · Feb 7, 2025 · 2 reactions

    I have not tested the vision model with Flux; it should give the same seed-to-seed output as the pruned CLIP-L. To my knowledge, only certain video models use the vision blocks of the full ViT-CLIP.

    ericreator · Feb 27, 2025

    Possible to get a GGUF of this text encoder?

    Felldude (Author) · Feb 28, 2025 · 1 reaction

    I'm not sure if vision is part of the existing GGUF architecture. The other models are not big enough to warrant quantization unless you're trying to fit into a 20+ year old GPU.

    ericreator · Feb 28, 2025 · 1 reaction

    @Felldude I think we might at least get faster load speeds in Comfy, so it might be worth trying; CLIP-G is substantially larger. I'm just running lots of tests on these different text encoders. Thanks for the research, love your TEs!

    ericreator · Mar 1, 2025 · 1 reaction

    Found GGUF versions for testing: https://huggingface.co/chatpig/t5xxl/tree/main

    Mescalamba · Mar 2, 2025

    In theory it shouldn't be a problem; there are tools that let you GGUF anything. Anything lower than full quantization will also lower quality, though.

    7175655 · Mar 7, 2025 · 1 reaction

    Are you considering doing a fine tune of the LongCLIP-L (also provided by zer0int)?

    Thank you for your work =)

    Felldude (Author) · Apr 22, 2025

    I believe the latest ViT model they put out is FP32.

    littlefluffyball · Apr 13, 2025 · 3 reactions

    The only combination I found that works for some SDXL checkpoints in ComfyUI:

    - clipLCLIPGFullFP32_simulacrumCLIPGFP32 2.7Gb
    - clipLCLIPGFullFP32_simulacrumCLIPLFP32 494Mb

    Load both in the Dual CLIP loader and set the --fp32-text-enc argument.
    This will produce more detailed and realistic images, and it understands the prompt better than the built-in CLIP.

    Sad reality: it will produce garbage with most Pony mixes. Some mixes output a low-quality image, others just noise. For some strange reason it works fine with many Illustrious checkpoints (but not all of them).
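    For reference, the --fp32-text-enc flag mentioned above is a ComfyUI launch argument, not a node setting; assuming a standard ComfyUI checkout, the launch would look something like:

    ```shell
    # From the ComfyUI directory: keep text-encoder weights in FP32
    # so the FP32 CLIP files are not silently downcast on load.
    python main.py --fp32-text-enc
    ```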

    Felldude (Author) · Apr 22, 2025

    I have the FP32 CLIPs for Pony also; some Pony models absolutely require the Pony versions of the CLIP.

    Olbanets · May 27, 2025

    I'd like more information. Please give a link to the Simulacrum CLIP-G source.

    Felldude (Author) · Jun 20, 2025

    The Authors of Sim and Zer0Int are linked at the top of the article.

    soyv4 · Jun 5, 2025

    Please tell me: is Zer0Int-Vision_CLIP_FP32 the same file as on https://huggingface.co/zer0int/CLIP-GmP-ViT-L-14/tree/main, just under a different name (model.safetensors)?

    Felldude (Author) · Jun 6, 2025 · 1 reaction

    I do not know about that.
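    One way to settle an "is it the same file?" question like this is to compare checksums: if two safetensors files hash identically, they are byte-identical regardless of filename. A minimal sketch (the paths below are placeholders for the two downloads being compared):

    ```python
    import hashlib

    def sha256_of(path: str, chunk: int = 1 << 20) -> str:
        """Stream a file through SHA-256 so multi-GB safetensors never sit in RAM."""
        h = hashlib.sha256()
        with open(path, "rb") as f:
            while block := f.read(chunk):
                h.update(block)
        return h.hexdigest()

    # Placeholder paths: compare the local rename against the HF download.
    # print(sha256_of("Zer0Int-Vision_CLIP_FP32.safetensors") ==
    #       sha256_of("model.safetensors"))
    ```

    Hugging Face also lists a SHA-256 for each file on the repo's file page, so hashing only the local copy and comparing against that listing works too.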

    amazingbeauty · Jul 13, 2025

    Which of these should I use as CLIP-L for Flux?! And what's even the difference???

    Ponder_Stibbons · Jul 27, 2025 · 3 reactions

    Both CLIP-L and G. The different training info for each is clearly listed, top right. Flux likes lots of encoders. Cram them in there like a clown car. Node up a sextuple CLIP encoder, or at the very least a triple encoder. Stack them up and swap them around until the KSampler stops throwing errors at you. Worked for me.

    amazingbeauty · Jul 27, 2025

    @Ponder_Stibbons Good idea, but I live like it's 50 years ago: I'm on CPU, with no time for this (I mean days). Good noob-aimed info in the description always helps.

    Felldude (Author) · Jul 27, 2025 · 2 reactions

    It would be CLIP-L and T5. For CPU, I would focus on which CLIP works best in Schnell at 4 steps.

    Ponder_Stibbons · Jul 28, 2025 · 1 reaction

    @amazingbeauty Bless you if you've got the patience to run Flux on CPU. Or any diffusion model, for that matter. But if you're serious, go with the other commenter. Or forget this stuff here and go straight to the source; all required models are in the tree. https://huggingface.co/black-forest-labs/FLUX.1-schnell/tree/main

    Felldude (Author) · Jul 29, 2025

    @Ponder_Stibbons If you want base CLIP-L, I would use the FP32 version over the FP16 version that Black Forest posted: https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/tree/main/text_encoder

    Mr_Fei · Feb 12, 2026

    Could you please clarify which base models these CLIPs are specifically adapted for?