Pony FinalCut
Note: this model is intended to be run at far higher resolutions than currently offered on-site. Consider petitioning @theally or @chipshajen or @Cosmic_Crafter (or @Justin if you're feeling lucky) to add support for such - maybe they could "steal" from TensorArt for once.
Or maybe ask that they stop making high-res images look like garbage unless you click through them three times. The perfectly sharp, single-pixel hair at 3K took thousands of hours of work.
832x1216 and 1280x1536 are the native generation sizes; even 1280x1792 is possible for highly trained subjects.
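As a quick sanity check on those recommended sizes (my own sketch, not from the model author): SDXL's VAE downsamples by 8x, and dimensions divisible by 64 keep the latent friendly for the UNet. All three sizes above pass both checks.

```python
# Verify the recommended generation sizes against SDXL's constraints:
# dimensions should be divisible by 64, and the VAE downsamples by 8x.
SIZES = [(832, 1216), (1280, 1536), (1280, 1792)]

for w, h in SIZES:
    assert w % 64 == 0 and h % 64 == 0, (w, h)
    lat_w, lat_h = w // 8, h // 8  # latent-space size after VAE encode
    print(f"{w}x{h} pixels -> {lat_w}x{lat_h} latent")
```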
The best quality comes from forcing the UNet to BF16 and the VAE/CLIP to FP32. Use the following launch arguments:
Forge/Auto1111: set COMMANDLINE_ARGS= --unet-in-bf16 --vae-in-fp32 --clip-in-fp32 --cuda-malloc
Comfy: cmd /k python main.py --fp32-text-enc --fp32-vae --bf16-unet --use-flash-attention
This model took over 120 kWh of energy to train, not including the 50 kWh used to train the CLIP models.
Note: Auto1111/Forge has an issue with the BF16 VAE stored in V1.1 BF16 if you are not using the FP32 VAE command.
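Why the VAE wants FP32 while the UNet is fine in BF16: BF16 keeps FP32's exponent range but only 8 mantissa bits (~3 decimal digits), which is enough for UNet weights but can introduce visible banding in the VAE decode. A dependency-free sketch of that precision loss (my own illustration, not part of the model):

```python
import struct

def to_bf16(x: float) -> float:
    """Round an FP32 value to BF16 (keep the top 16 bits, round half up),
    then expand back to FP32 so the precision loss is visible."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = (bits + 0x8000) & 0xFFFF0000  # drop the low 16 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

for v in (1.0, 3.14159, 0.001234):
    print(f"fp32 {v!r} -> bf16 {to_bf16(v)!r}")
```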
Comments (34)
Well, god damn. What is this Joy thingy? And this is really good. When the first image popped up I was a bit taken aback by the detail. It really makes you see how low-res a lot of image generations are when you compare them to much higher resolutions. The nice thing about this is that it removes the upscaling step in a lot of cases, and if you use it with ClownKSamplers then you're going to get some amazing results.
Very good work and thank you!
Thank you.
Joy refers to the JoyCLIP models that I just finished training. I have a few articles showing the improvements they made across multiple Pony models, and mine showed some improvement as well.
The KL optimal and res_multistep have been my go-to, but Clown looks interesting; can't have too many choices.
Great model! Is there any specific guidance on prompting?
score_9 and the like are generally not needed, but all other Pony prompts (characters etc.) work. Use of negatives should be limited to removing something you do not want from an image, such as a table with food when you don't want food showing up.
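To illustrate the "score tags not needed" advice above, a trivial prompt-cleanup helper (a hypothetical sketch of my own, not something shipped with the model):

```python
def strip_score_tags(prompt: str) -> str:
    """Drop Pony quality boosters (score_9, score_8_up, ...) from a
    comma-separated tag prompt; this model doesn't need them."""
    tags = [t.strip() for t in prompt.split(",")]
    kept = [t for t in tags if t and not t.startswith("score_")]
    return ", ".join(kept)

print(strip_score_tags("score_9, score_8_up, 1girl, red hair, outdoors"))
# -> "1girl, red hair, outdoors"
```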
@Felldude Thanks for the reply! Should I just rely on booru tags or write more natural prompts? This CLIP is insane btw. Not sure if it's intended, but both SDXL and Pony LoRAs work really well with this CLIP.
@nickname45 It is, as it was a realignment with OpenCLIP, done very carefully so as not to lose the WD14/booru tags. It still favors Pony-style prompting, but you can use long natural language; it will just pick and choose what to adhere to.
@Felldude That’s amazing. I don’t really fully understand how SD works but if you work the same magic on the CLIP for illus to make use of its prompt adherence, that might be peak SDXL lol
@nickname45 illustrious did some crazy things with the clip.
It should be named "Final Cat".
Lol, well if I think of some owl memes
120 kWh of energy to train :O thanks for sharing. I've just tested it; looks like the best realistic Pony model so far.
Thanks, when LAION trained CLIP they used 1-2 MWh per day : )
You're the GOAT.
I normally don't comment to give praise, but your work is likely going to become a core of the industry standard once it gets stolen and neutered. Everybody else is re-frying beans and you're channeling Mendel.
This is some Promethean craft; you should be at Disney or Sony.
Au revoir
Thank you, and I worked for a large corporation long enough lol
WOW! This model seems to be ideal. Good work, keep it up!
Thank you
All images are either undercooked or overburned, resembling early SDXL "realistic" attempts.
(Judging only by the images posted here.)
At a lower-than-intended latent size many images would look that way, but many users can't generate at a 1024x1536 or 1280x1536 starting latent size.
I have spent hours on a workflow dedicated to segmenting and detailing specific features for PONY model outputs so that they survive upscaling with satisfactory results.
I am blown away by the result hybridjoy produces in roughly the same amount of compute time and probably an order of magnitude fewer nodes.
Thank you for this!
Thank you. I guess with so many people using small-screen devices, they can't perceive the difference.
Felldude not only that, but the model works with a vanilla load of comfy. Loading with the higher precision files and taking advantage of the higher native generation resolution is where the magic really happens. I would imagine a lot of people aren't leveraging those things.
Really great model. I'm torn between this model and all the other models I've used. It handles my complex workflow like it's nothing. Great stuff! (Surely I don't get thumbs down this time, those who know lol.)
Some of those thumbs down disappeared lol - thanks, and I did merge my CLIP training into other popular models at https://huggingface.co/Felldude/LongCLIP_Model_Merges/tree/main
I still need to do the universal CLIP merges for Pony. The UNet is also trained with the same data on my models, so they would still act a bit differently.
For some strange reason, I get overburned images using the CLIP from your models, say FinalCut Pony for example. However, when I use your noMERGEUniversalCLIPFLUX_illustriousBaseCLIPG_2.safetensors along with noMERGEUniversalCLIPFLUX_illustriousBaseCLIPL.safetensors, I get great results: it listens to the prompt and no burns whatsoever. Strange, right? I'm pretty sure it's just because of my weird workflow though. Anyway, I'm looking forward to seeing your other releases!
Mrpopo714 In my testing Universal did visually perform better than Joy - Illustrious is an interesting CLIP for sure.
Felldude Yep even in anime style models it performs well. It truly is a hidden gem. no glazing intended lol
Man, I just found this model 2 days ago and I was getting incredible results from it, now it's locked 😔 hope it comes back around eventually, the results were unmatched.
Thank you. Given the use, which generated 2k, and the cost, which was 10k (part of which was donated), I cannot support the model being on site at an 80% loss.
Granted, given I was model 186 of 200, the model did see substantial use, which I appreciate.
@Felldude totally understandable. Hopefully it gets bid back again soon. Can't wait to try it again.
This checkpoint seems amazing, but there are a couple of questions that need to be clarified regarding it.
Does FinalCut (Pony) FP32_v1.1 have its VAE baked in, or do we need to use a specific VAE (SDXL VAE, something else) when using Forge UI or any A1111-based UI? Also, what clip skip value should be used with this checkpoint?
Standard FP32 SDXL VAE is baked into all my checkpoints - the default clip skip is fine but you certainly can use -2
@Felldude Thanks for the clarification, much appreciated.
Is there a recommended CFG scale, denoising strength, scheduler, sampler, upscaler and upscale factor (x1.5, x2, etc.)?
I'm asking because, sure, I can look at your showcased pics, but better safe than sorry, and most of that info isn't available on showcased pics anyway if using a tool that isn't ComfyUI.
@antarek euler tends to be simple but accurate, dpmpp_2m tends to be the go-to all-rounder. I use kl_optimal and res_multistep for image-to-image at 0.5-0.7 denoise.
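On the 0.5-0.7 denoise advice above: a common rule of thumb (A1111-style behavior, my own sketch) is that image-to-image only runs roughly denoise * total_steps of the sampling steps, so the denoise value controls both fidelity to the input image and effective compute.

```python
def effective_steps(total_steps: int, denoise: float) -> int:
    """Rough img2img rule of thumb: only about denoise * total_steps
    sampling steps actually run; the rest of the schedule is skipped."""
    return max(1, round(total_steps * denoise))

for d in (0.5, 0.7):
    print(f"denoise {d}: ~{effective_steps(30, d)} of 30 steps run")
```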