This LoRA was trained on 56 videos extracted from YouTube. The videos in the dataset are not the best quality, but it works.
You can adjust the LoRA strength from 0.8 to 1.3.
The trigger word is: b4by_d1n0
The dataset is attached in the files.
I trained it using my fork of diffusion-pipe, which has a Gradio interface and a Docker image that make it easy to use: a single command sets up the entire environment with the models already downloaded. If you want to use it, follow the instructions in the README of the repository: https://github.com/alisson-anjos/diffusion-pipe-ui or use the template for Runpod or VastAI.
https://runpod.io/console/deploy?template=t46lnd7p4b&ref=8t518hht
Comments (11)
Looks cool! How do you even train on 56 videos? I have a 3090 and can't train on more than one short video using diffusion-pipe.
Your videos need to have few frames; the videos I used have 33 frames each.
@alissonerdx Ah, okay, thank you!
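For anyone wondering how a 33-frame video fits the `frame_buckets = [ 24,]` and `video_clip_mode = "single_middle"` settings in the config posted further down, here is a minimal sketch of the arithmetic. It assumes "single_middle" keeps one centred window and trims the start and end roughly equally; the function name is hypothetical and the trainer's real implementation (for example any frame-rate resampling) may differ.

```python
def middle_clip_range(total_frames, clip_frames):
    """Return (start, end) frame indices, end exclusive, for a centred clip.

    Illustrative helper only, not part of diffusion-pipe.
    """
    if clip_frames > total_frames:
        raise ValueError("video is shorter than the requested clip length")
    # Split the leftover frames between the start and the end of the video.
    start = (total_frames - clip_frames) // 2
    return start, start + clip_frames


# A 33-frame video cut down to a 24-frame bucket.
start, end = middle_clip_range(33, 24)
print(start, end)  # 4 28
```

Under that assumption, frames 4 through 27 would be used for training and the first 4 and last 5 frames would be dropped.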
Do your caption files contain anything other than the "b4by_d1n0" keyword? Are they all identical?
You can download my dataset, which is attached in the files. The captions don't contain just the trigger word; they include more details.
@alissonerdx Thank you!
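If you build your own dataset in the same style, a quick sanity check on the caption files can save a wasted run. The sketch below assumes each video has a caption in a `.txt` file with the same base name and that every caption should mention the trigger word. The function name and the list of video extensions are hypothetical choices for illustration.

```python
from pathlib import Path

# Assumed set of video extensions; adjust to match your dataset.
VIDEO_EXTS = {".mp4", ".mov", ".mkv", ".webm"}


def check_captions(dataset_dir, trigger="b4by_d1n0"):
    """Return (missing, no_trigger): video names lacking a caption file,
    and video names whose caption does not contain the trigger word."""
    missing, no_trigger = [], []
    for video in sorted(Path(dataset_dir).iterdir()):
        if video.suffix.lower() not in VIDEO_EXTS:
            continue
        caption = video.with_suffix(".txt")  # same base name, .txt extension
        if not caption.is_file():
            missing.append(video.name)
        elif trigger not in caption.read_text(encoding="utf-8"):
            no_trigger.append(video.name)
    return missing, no_trigger
```

Calling `check_captions("/workspace/datasets/b4by_d1n0")` would return two lists; both should be empty before you start training.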
Can you please tell me how many steps/epochs you used, basically your config for this LoRA? Thanks.
output_dir = "/workspace/outputs/b4by_d1n0"
dataset = "/workspace/configs/b4by_d1n0/dataset_config.toml"
epochs = 30
micro_batch_size_per_gpu = 1
gradient_accumulation_steps = 8
gradient_clipping = 1
warmup_steps = 10
eval_every_n_epochs = 1
eval_before_first_step = true
eval_micro_batch_size_per_gpu = 1
eval_gradient_accumulation_steps = 1
save_every_n_epochs = 2
checkpoint_every_n_minutes = 120
activation_checkpointing = true
partition_method = "parameters"
save_dtype = "bfloat16"
caching_batch_size = 1
steps_per_print = 1
video_clip_mode = "single_middle"
[model]
type = "hunyuan-video"
transformer_path = "/workspace/models/hunyuan_video_720_cfgdistill_fp8_e4m3fn.safetensors"
vae_path = "/workspace/models/hunyuan_video_vae_fp32.safetensors"
llm_path = "/workspace/models/llava-llama-3-8b-text-encoder-tokenizer"
clip_path = "/workspace/models/clip-vit-large-patch14"
dtype = "bfloat16"
transformer_dtype = "float8"
timestep_sample_method = "logit_normal"
[adapter]
type = "lora"
rank = 32
dtype = "bfloat16"
[optimizer]
type = "adamw"
lr = 0.0002
betas = [ 0.9, 0.99,]
weight_decay = 0.01
eps = 1e-8
# dataset_config.toml (the separate file referenced by `dataset` above)
resolutions = [ 512,]
enable_ar_bucket = true
min_ar = 0.5
max_ar = 2
num_ar_buckets = 7
frame_buckets = [ 24,]
[[directory]]
path = "/workspace/datasets/b4by_d1n0"
num_repeats = 10
@alissonerdx Thanks so much brother!
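On the "how many steps" part of the question: the config gives epochs rather than steps, but a rough step count can be estimated from the numbers in it. This is a back-of-the-envelope sketch, assuming a single GPU, all 56 videos used for training (no eval split), and no samples dropped by aspect-ratio bucketing. The function name is hypothetical, and the count the trainer actually reports may differ.

```python
import math


def estimate_steps(num_videos, num_repeats, micro_batch_size_per_gpu,
                   gradient_accumulation_steps, epochs, num_gpus=1):
    """Estimate (optimizer steps per epoch, total optimizer steps)."""
    samples_per_epoch = num_videos * num_repeats
    # One optimizer step consumes this many samples.
    effective_batch = (micro_batch_size_per_gpu
                       * gradient_accumulation_steps
                       * num_gpus)
    steps_per_epoch = math.ceil(samples_per_epoch / effective_batch)
    return steps_per_epoch, steps_per_epoch * epochs


# Values taken from the config above; num_gpus=1 is an assumption.
per_epoch, total = estimate_steps(
    num_videos=56,
    num_repeats=10,
    micro_batch_size_per_gpu=1,
    gradient_accumulation_steps=8,
    epochs=30,
)
print(per_epoch, total)  # 70 2100
```

So under these assumptions the run is about 70 optimizer steps per epoch and about 2100 in total, with a checkpoint saved every 2 epochs.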
This is great! What would need to change to make it work for LTX Lora training?