Introducing g0llum, an LTX-2 Character LoRA.
Trained on 48 five-second clips using Ostris's ai-toolkit.
Seems pretty versatile. Have some fun with it, see what you can create - please post videos to the LoRA, will be looking to tip those that catch my interest!
Comments (22)
It is significant to recall that CGI Gollum competed with CGI Dobby (Harry Potter) as WETA and ILM went against one another in the race to make the world's most convincing talking CGI character. Now we can do the same at home simply, cheaply and quickly. These are extraordinary times!
I am curious how you used ai-toolkit and got the voice to train. Lately ai-toolkit has been bugged and cannot train voices with LTX-2. Was your version updated recently, or is it more than a few weeks old? I installed last month, and voices cannot be trained even after a recent update.
Strange, I’ve not had any issues. I updated last week too.
Yup, just validated, this was on current build.
ai-toolkit$ git status
On branch main
Your branch is up to date with 'origin/main'.
@Goon_69420 very strange, because others and I get no audio trained at all. I have a 5090. I don't suppose you can share the settings you used? Maybe there is something we can change to make it work. Any changes from the default? Most of mine is the default, with some minor changes where needed. I basically followed the official ai-toolkit tutorial on YouTube for LTX-2.
Sir, do you know if it's possible to train voices in other languages?
@alonsogarrote153 A quick search suggests yes, but probably not all languages. The Gemma text encoder is multilingual.
@kronos1959777 I use...
MODEL:
Model Arch: LTX-2
Name or Path: Lightricks/LTX-2
Options: no low vram or layer offloading.
QUANTIZATION:
Transformer: NONE
Text Encoder: NONE
TARGET:
Target Type: LoRA
Linear Rank: 32
TRAINING:
Batch Size: 1
Grad Accu: 1
Steps: 5000
Optimizer: AdamW8Bit
Learning Rate: 0.0001
Weight Decay: 0.0001
Timestep Type: Weighted
Timestep Bias: Balanced
Loss Type: Mean Sq Error.
Use EMA: off
Unload TE: off
Cache Text Embeddings: on
DOP: Off
BPP: Off
ADVANCED:
Do Differential Guidance: ON
Diff Guide Scale: 3
For the dataset, it's important to have the right setup. I use 20-50 five-second clips, pre-processed to 121 frames at 24fps, and set Num Frames accordingly. I turn on Cache Latents in the dataset settings, and ENSURE DO AUDIO is ON.
When preparing the videos, try to prevent cutoff during words.
But basically, it's the default config, with a few tweaks.
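The clip preparation described above (121 frames at 24fps, with audio kept) could be scripted roughly as follows. This is only a sketch: the file names are hypothetical, `libx264` is an assumed video encoder that may not be in every ffmpeg build, and nothing here comes from ai-toolkit itself.

```python
FPS = 24
NUM_FRAMES = 121  # the frame count quoted in the comment above


def clip_duration_seconds(num_frames: int = NUM_FRAMES, fps: int = FPS) -> float:
    """Length in seconds of a clip with the given frame count and frame rate."""
    return num_frames / fps


def ffmpeg_trim_command(src: str, dst: str, start: float = 0.0) -> list[str]:
    """Build an ffmpeg command that cuts one 121-frame, 24fps clip with AAC audio."""
    return [
        "ffmpeg",
        "-ss", str(start),                       # seek to where the clip should begin
        "-i", src,
        "-vf", f"fps={FPS}",                     # resample the video to 24fps
        "-frames:v", str(NUM_FRAMES),            # stop after 121 video frames
        "-t", f"{clip_duration_seconds():.3f}",  # cap the output (audio too) at the same length
        "-c:v", "libx264",                       # assumed encoder, swap as needed
        "-c:a", "aac",
        dst,
    ]


if __name__ == "__main__":
    # Hypothetical file names, purely for illustration.
    print(" ".join(ffmpeg_trim_command("source.mp4", "clip_001.mp4", start=12.0)))
```

Pick each `start` value by ear so the clip does not begin or end in the middle of a word.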
@Goon_69420 how much ram did you have with your build? also how long did it take for the lora training.
@obinna7713 I have 128GB RAM, 96GB VRAM. It took ~11 hours (including sample gen every 250 steps) for the 5k steps. 8.22s/iter
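As a quick sanity check on those numbers, the step count and per-iteration time can be multiplied out. The thread does not say whether the 8.22s/iter figure already includes sample generation, so treat this as a rough estimate only.

```python
def training_hours(steps: int, seconds_per_iter: float) -> float:
    """Wall-clock hours spent on training steps alone."""
    return steps * seconds_per_iter / 3600


def sample_rounds(steps: int, sample_every: int) -> int:
    """How many times samples are generated over the run."""
    return steps // sample_every


if __name__ == "__main__":
    hours = training_hours(5000, 8.22)
    rounds = sample_rounds(5000, 250)
    print(f"{hours:.1f} h of training steps, {rounds} sample rounds")
```

That works out to roughly 11.4 hours and 20 sample rounds, in line with the ~11 hours reported.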
@Goon_69420 OK, that probably explains why the voice worked for you, lol. I have a 5090 with 32GB VRAM and only 96GB RAM, and I could not start the LoRA training without making the obvious changes. The upside is that I got 5s per iteration; the downside is that the audio does not work. Then again, lots of people say audio training is broken, yet you say you are on one of the most recent updates. So confusing. Either it worked for you because you don't need to limit your setup, or it's magic. It does seem like you trained the same way Ostris did in his video, and he probably had no limitations either, since he likely used the same type of setup, especially the VRAM.
@sacrificegoat154
I double checked, I’m running a slightly older commit:
```
ai-toolkit$ git show
commit 50664c2421b6e63dd95ab186b92c9a28e2c7cbe7
```
@kronos1959777 I have had no issues with audio/voices training. I have a couple of character models here and also Donald "Shitler" Trump lora that works well.
@Goon_69420 By any chance do you know what codec you used in your dataset? AAC, FLAC?
@kuroaresjot AAC
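For anyone wanting to check what codec their own clips use, a sketch like the following asks `ffprobe` for the codec name of the first audio stream. It assumes `ffprobe` is installed and on the PATH; the helper names are made up for illustration, and whether the codec actually affects audio training is not established in this thread.

```python
import subprocess


def ffprobe_audio_codec_command(path: str) -> list[str]:
    """Build an ffprobe command that prints only the first audio stream's codec name."""
    return [
        "ffprobe",
        "-v", "error",                         # suppress the banner and info logging
        "-select_streams", "a:0",              # first audio stream only
        "-show_entries", "stream=codec_name",
        "-of", "default=noprint_wrappers=1:nokey=1",
        path,
    ]


def audio_codec(path: str) -> str:
    """Return the codec name (e.g. "aac"), or "" if the file has no audio stream."""
    result = subprocess.run(
        ffprobe_audio_codec_command(path),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

An empty result means the clip has no audio track at all, which is worth ruling out before blaming the trainer.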
@Goon_69420 thanks. In the end I had to wait for this new mod someone posted that makes voice training work guaranteed. So now I can finally do it in AI Toolkit. Glad the official one works for some people somehow.
@Goon_69420 Interesting. I wonder if that matters or not. Oh well. I got it working flawlessly now with someone's mod for the toolkit.
@kronos1959777 care to share with the class about that mod for the toolkit ? LOL
For those who can't run on local and are stuck with runpod, use this version: docker pull ostris/aitoolkit:0.7.19
Unfortunately I cannot get the audio to voice him correctly at all on Civitai, with any slider setting, but the video works well as long as it is image to video with a Gollum image. Any suggestions on audio, or do I need to use a local workflow?
Oh wow, I just tried it and yeah, it's awful on the site. I am using the ComfyUI default workflow here at home.