First test with LTXV23.
Trying to see how LTXV23 handles feet... Not too well, as it turns out.
The lora does work okay, and it definitely improves the model's feets knowledge.
Weight ~1.4
This high a strength seems weird to me, but per LTX's own docs, apparently this is typical.
Trigger is "footjob."
See preview videos for other helpful phrases.
Focused on I2V. Cock knowledge is very limited.
Works okay when feet are already in position, but otherwise... good luck.
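If you want to try it outside ComfyUI, here's a rough sketch of what loading it at that strength might look like with diffusers' LTX image-to-video pipeline. This is not my workflow (I'm on the 2-stage distilled one), and the repo ID, file paths, and settings below are placeholders, so treat it as an illustration only:

```python
# Rough sketch only: assumes diffusers' LTXImageToVideoPipeline and its
# standard LoRA loading; repo ID, file paths, and settings are placeholders.
import torch
from diffusers import LTXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

pipe = LTXImageToVideoPipeline.from_pretrained(
    "Lightricks/LTX-Video", torch_dtype=torch.bfloat16
).to("cuda")

# Load the lora and set the strength to ~1.4, per the note above.
pipe.load_lora_weights("footjob_lora.safetensors", adapter_name="footjob")
pipe.set_adapters(["footjob"], adapter_weights=[1.4])

image = load_image("start_frame.png")  # I2V: feet already in position helps a lot
prompt = "footjob, close-up of feet, smooth motion"  # lead with the trigger word

frames = pipe(
    image=image,
    prompt=prompt,
    width=512,
    height=512,
    num_frames=65,
    num_inference_steps=30,
).frames[0]
export_to_video(frames, "out.mp4", fps=24)
```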
Please don't ask for tech support. I have no idea what I'm even doing. I'm using the 2-stage distilled workflow from LTX; that's all I know.
Do feel free to leave comments about how shitty this is so I can try to target improvements.
technobullshit:
Trained to 1000 steps. Beyond that, things got weird fast, with hands turning into feet and all kinds of horror shit.
The training clips were:
Five close-ups (just feet and cock): 512x512, 24 fps, 65 frames.
Four full-body clips: 448x576, 24 fps, 89 frames.
I found that 'context' training data (the full-body clips) was extremely important. This is different from what I'm used to with Wan, which seems to tolerate having less context.
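For the curious, here's that clip mix written out as a plain Python structure, just to make the totals obvious. This is purely illustrative and not any trainer's actual config format:

```python
# Illustrative only: the training clip mix described above as a plain Python
# structure. Not any specific trainer's config format.
training_clips = [
    # "Subject" clips: tight close-ups of just feet and cock.
    *[{"kind": "close-up", "width": 512, "height": 512, "fps": 24, "frames": 65}
      for _ in range(5)],
    # "Context" clips: full-body shots so the model keeps some anatomy sense.
    *[{"kind": "full-body", "width": 448, "height": 576, "fps": 24, "frames": 89}
      for _ in range(4)],
]

total_frames = sum(c["frames"] for c in training_clips)
total_seconds = sum(c["frames"] / c["fps"] for c in training_clips)
print(f"{len(training_clips)} clips, {total_frames} frames, ~{total_seconds:.1f}s of video")
# -> 9 clips, 681 frames, ~28.4s of video
```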
I plan to continue working on this, but this seems workable enough for a first version.
Comments (2)
Wow, the lora is so small! Do you think a larger lora could improve the foot understanding?
This appears to be the typical size for a rank 16, video-only, bf16 lora for LTX.
Targeting video, audio, and cross-modal a/v modules resulted in ~200 MB, but since my training data had no relevant audio, I targeted only the video weights, and that's what reduced it to ~100 MB.
So to answer your question... maybe?
Going rank 32 would certainly increase the file size, but would it improve the understanding? Or just increase the capacity to apply what it learned from the training data? I dunno. I think the bigger lever would be a lot of well-curated feets training data, but that wouldn't affect the lora file size.
I really think the biggest problem is LTX having such a rudimentary understanding of feet. Not sure that's something that can be corrected with a lora trained on a lil 5090, at least not outside specific contexts like 'footjob'.
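For a rough sense of where those file sizes come from (and why rank and the number of targeted modules scale them), here's a back-of-envelope estimate. The layer counts and dimensions below are made up for illustration and are not LTX's actual architecture:

```python
# Back-of-envelope LoRA file size: each targeted linear layer of shape
# (d_out, d_in) contributes rank * (d_in + d_out) parameters (the A and B
# matrices). Layer counts/dims below are illustrative, NOT LTX's real config.
def lora_size_mb(layers, rank, bytes_per_param=2):  # 2 bytes per param for bf16
    params = sum(rank * (d_in + d_out) for d_in, d_out in layers)
    return params * bytes_per_param / 1e6

# Pretend "video-only" targets: 40 blocks x 4 attention projections of 2048x2048.
video_layers = [(2048, 2048)] * 40 * 4
print(lora_size_mb(video_layers, rank=16))  # ~21 MB at rank 16
print(lora_size_mb(video_layers, rank=32))  # ~42 MB at rank 32: doubling rank doubles size
# Targeting extra module groups (audio, cross-modal) adds more layers and
# grows the file the same way, independent of rank.
```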