PONY CLIP 100k Finetune
JoyCLIP is a further advancement of this CLIP
Note: this is still a great base to start a finetune of PONY CLIP from, as the original gradient was around 50 out; starting with this model you will be in the 4-8 range.
100k is a full finetune of base pony CLIP-L and CLIP-G
100k can be used with any model base: V6, Autism, anime or realistic (even non-Pony SDXL models).
CLIP-G took 68GB and 30 hours to train.
Forge users: you will need to download ComfyUI, as in Forge CLIP replacement is only supported for FLUX (this may apply to Auto1111 as well). Once a model has been saved with the replaced CLIP, it can be used in Forge or Auto1111.
The model can be run in any UI (Forge, Auto1111, ComfyUI); the CLIP will be downcast by default. The settings below improve output complexity but are not required. (Full FP32 is not recommended, but FP32 CLIP is.)
ComfyUI: --fp32-text-enc OR --force-fp32
Forge/Auto1111: --clip-in-fp32 OR --all-in-fp32
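For example, to keep the text encoder in FP32 (illustrative launch commands only, adapt them to your own install; on Forge/Auto1111 flags normally go into COMMANDLINE_ARGS in webui-user.bat or webui-user.sh):

ComfyUI: python main.py --fp32-text-enc
Forge/Auto1111: set COMMANDLINE_ARGS=--clip-in-fp32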
Comments
Sorry, I must be doing something wrong, but it generates only noise for me... Do I need to select CLIP-L as the VAE for the Pony_100k_CLIP-G model, or should I download a CLIP-G VAE somewhere?
Here is the error from console:
Loading Model: {'checkpoint_info': {'filename': '/webui_forge_cu121_torch231/webui/models/Stable-diffusion/xl/ponyCLIP100kFinetune_pony100kCLIPG.safetensors', 'hash': '8d98e610'}, 'additional_modules': ['/webui_forge_cu121_torch231/webui/models/text_encoder/clip_l.safetensors'], 'unet_storage_dtype': None}
Traceback (most recent call last):
File "/webui_forge_cu121_torch231/webui/backend/loader.py", line 274, in forge_loader
state_dicts, estimated_config = split_state_dict(sd, additional_state_dicts=additional_state_dicts)
File "/webui_forge_cu121_torch231/webui/backend/loader.py", line 240, in split_state_dict
guess = huggingface_guess.guess(sd)
File "/webui_forge_cu121_torch231/webui/repositories/huggingface_guess/huggingface_guess/__init__.py", line 7, in guess
result.unet_key_prefix = [unet_key_prefix]
AttributeError: 'NoneType' object has no attribute 'unet_key_prefix'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/webui_forge_cu121_torch231/webui/modules_forge/main_thread.py", line 30, in work
self.result = self.func(*self.args, **self.kwargs)
File "/webui_forge_cu121_torch231/webui/modules/txt2img.py", line 131, in txt2img_function
processed = processing.process_images(p)
File "/webui_forge_cu121_torch231/webui/modules/processing.py", line 836, in process_images
manage_model_and_prompt_cache(p)
File "/webui_forge_cu121_torch231/webui/modules/processing.py", line 804, in manage_model_and_prompt_cache
p.sd_model, just_reloaded = forge_model_reload()
File "/webui_forge_cu121_torch231/webui/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/webui_forge_cu121_torch231/webui/modules/sd_models.py", line 504, in forge_model_reload
sd_model = forge_loader(state_dict, additional_state_dicts=additional_state_dicts)
File "/webui_forge_cu121_torch231/webui/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/webui_forge_cu121_torch231/webui/backend/loader.py", line 276, in forge_loader
raise ValueError('Failed to recognize model type!')
ValueError: Failed to recognize model type!
Failed to recognize model type!
Dual CLIP loader. Not as VAE.
Since I don't use ComfyUI, I gave a shot to the CLIP-L finetune by manually replacing the keys in one of my models, with success, and it is very nice. 😊
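(For reference, that kind of manual replacement boils down to something like the following; a minimal sketch, assuming both the checkpoint and the CLIP-L finetune are safetensors files and that the checkpoint stores CLIP-L under the usual conditioner.embedders.0.transformer. prefix. File names are illustrative.)

from safetensors.torch import load_file, save_file

model = load_file("my_pony_model.safetensors")        # full SDXL/Pony checkpoint
clip_l = load_file("pony_100k_clip_l.safetensors")    # the CLIP-L finetune

# CLIP-L keys match one-to-one; only the prefix differs
prefix = "conditioner.embedders.0.transformer."
replaced = 0
for k, v in clip_l.items():
    if prefix + k in model:
        model[prefix + k] = v.to(model[prefix + k].dtype)
        replaced += 1
print(f"replaced {replaced} tensors")

save_file(model, "my_pony_model_100k_clip_l.safetensors")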
But when I tried to do the same with CLIP-G, I hit a wall, and I need to understand how to adapt the key names:
In an SDXL model:

conditioner.embedders.1.model.transformer.resblocks.0.attn.in_proj_bias
conditioner.embedders.1.model.transformer.resblocks.0.attn.in_proj_weight
conditioner.embedders.1.model.transformer.resblocks.0.attn.out_proj.bias
conditioner.embedders.1.model.transformer.resblocks.0.attn.out_proj.weight
conditioner.embedders.1.model.transformer.resblocks.0.ln_1.bias
conditioner.embedders.1.model.transformer.resblocks.0.ln_1.weight
conditioner.embedders.1.model.transformer.resblocks.0.ln_2.bias
conditioner.embedders.1.model.transformer.resblocks.0.ln_2.weight
conditioner.embedders.1.model.transformer.resblocks.0.mlp.c_fc.bias
conditioner.embedders.1.model.transformer.resblocks.0.mlp.c_fc.weight
conditioner.embedders.1.model.transformer.resblocks.0.mlp.c_proj.bias
conditioner.embedders.1.model.transformer.resblocks.0.mlp.c_proj.weight

In the CLIP-G model:

text_model.encoder.layers.0.layer_norm1.bias
text_model.encoder.layers.0.layer_norm1.weight
text_model.encoder.layers.0.layer_norm2.bias
text_model.encoder.layers.0.layer_norm2.weight
text_model.encoder.layers.0.mlp.fc1.bias
text_model.encoder.layers.0.mlp.fc1.weight
text_model.encoder.layers.0.mlp.fc2.bias
text_model.encoder.layers.0.mlp.fc2.weight
text_model.encoder.layers.0.self_attn.k_proj.bias
text_model.encoder.layers.0.self_attn.k_proj.weight
text_model.encoder.layers.0.self_attn.out_proj.bias
text_model.encoder.layers.0.self_attn.out_proj.weight
text_model.encoder.layers.0.self_attn.q_proj.bias
text_model.encoder.layers.0.self_attn.q_proj.weight
text_model.encoder.layers.0.self_attn.v_proj.bias
text_model.encoder.layers.0.self_attn.v_proj.weight

In the attention head, I need to understand how to combine the query, key, and value proj weights into the "in" proj weights (and figure out which mlp.fc is which).
If you have any pointers it would be great; otherwise, I'll just dig a bit more into the CLIP code from SDXL 😉
Well, I figured it out thanks to the diffusers convert code: https://github.com/huggingface/diffusers/blob/c934720629837257b15fd84d27e8eddaa52b76e6/scripts/convert_diffusers_to_original_stable_diffusion.py#L233
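(For reference, per transformer layer the conversion amounts to roughly the following; a minimal sketch rather than the actual diffusers script, where hf_sd stands for the HF-style CLIP-G state dict and the returned keys go under conditioner.embedders.1.model.transformer.resblocks.{i}. in the SDXL checkpoint.)

import torch

def hf_layer_to_sdxl(hf_sd, i):
    p = f"text_model.encoder.layers.{i}."
    out = {}
    # q/k/v projections are concatenated, in that order, into a single in_proj tensor
    out["attn.in_proj_weight"] = torch.cat([hf_sd[p + "self_attn.q_proj.weight"],
                                            hf_sd[p + "self_attn.k_proj.weight"],
                                            hf_sd[p + "self_attn.v_proj.weight"]], dim=0)
    out["attn.in_proj_bias"] = torch.cat([hf_sd[p + "self_attn.q_proj.bias"],
                                          hf_sd[p + "self_attn.k_proj.bias"],
                                          hf_sd[p + "self_attn.v_proj.bias"]], dim=0)
    out["attn.out_proj.weight"] = hf_sd[p + "self_attn.out_proj.weight"]
    out["attn.out_proj.bias"] = hf_sd[p + "self_attn.out_proj.bias"]
    # straight renames: layer_norm1 -> ln_1, layer_norm2 -> ln_2, mlp.fc1 -> mlp.c_fc, mlp.fc2 -> mlp.c_proj
    for sd_name, hf_name in [("ln_1", "layer_norm1"), ("ln_2", "layer_norm2"),
                             ("mlp.c_fc", "mlp.fc1"), ("mlp.c_proj", "mlp.fc2")]:
        out[sd_name + ".weight"] = hf_sd[p + hf_name + ".weight"]
        out[sd_name + ".bias"] = hf_sd[p + hf_name + ".bias"]
    return out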
Well, I managed to merge it into one of my own models and, even in FP16, it looks marvelous. Now, if I add the source of the CLIP in my model description, would I be allowed to release this version, or would you prefer I keep it to myself?
@n_Arno All CLIP models are subject to the original license and required to be open source. So you're welcome to do what you wish with it, without attribution; thanks for asking though.
Hmm, using the L model for FLUX is improving the output notably; however, you have to use the right text encoder, like T5XXL. Looks better than the pro model, interesting.
I have not tested 100k CLIP-L with T5xxl or FlanT5xxl. Was it an NSFW finetune that showed improvement, or base?
@Felldude Interesting question. Normally I use jibmix, but it holds true for dev as well; it seems to affect models in general. Aside from improving general picture quality, it gets many more details right that are usually dropped in longer prompts.
@androsynth7610 Very interesting findings. I would not have expected any improvement on FLUX, given it was PONY based and then realigned to ViT CLIP. Thanks for posting.
@Felldude Fantastic work.
I tested the CLIP-L in FLUX; it appears to be static with the 100k CLIP for PONY. It's possible that it did not load correctly for you.
No, it's not a loading mistake, but I guess the workflows are as different as they come.
I'm a bit confused, sorry. Is this supposed to be used as CLIP only? As in, I can use CyberRealism Pony for the model and this as the CLIP driving it?
Yes, CLIP-G and CLIP-L are the two CLIPs in SDXL or Pony.
