Inference
For inference I used ComfyUI.
I uploaded an example workflow here: https://github.com/cseti007/ComfyUI-Workflows/blob/main/wan2_2-lightning1_1-gguf-nfjinx.json
Using the example workflow you can recreate this video: https://civarchive.com/images/93274242
The optimal LoRA strength can differ from prompt to prompt. As a best practice, I suggest always checking the high-model inference and adjusting the high-noise LoRA strength or the step count accordingly. It is usually optimal when the character's features are just beginning to appear in the high-model output but are not yet prominent.
Trigger words: Nfj1nx, blue hair
Strength: 0.6-1.2
You can find example prompts in the video examples!
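If you drive ComfyUI through its HTTP API instead of the UI, the strength check above is easy to automate. Below is a minimal sketch that sweeps the high-noise LoRA strength over the recommended range. It assumes the workflow has been exported in API format ("Save (API Format)"), and the node id, filename, and input name (`strength_model`, as used by ComfyUI's LoraLoaderModelOnly node) are placeholders you would need to match to your own JSON:

```python
# Minimal sketch: sweep the high-noise LoRA strength via ComfyUI's HTTP API.
# Assumes a local ComfyUI server and a workflow exported in API format.
import json
import urllib.request

LORA_NODE_ID = "10"  # placeholder: look up the high-noise LoRA loader's id in your JSON

with open("wan2_2_nfjinx_api.json") as f:  # placeholder filename for the exported workflow
    workflow = json.load(f)

for strength in (0.6, 0.8, 1.0, 1.2):
    workflow[LORA_NODE_ID]["inputs"]["strength_model"] = strength
    payload = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(
        "http://127.0.0.1:8188/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(f"queued strength {strength}: {resp.read().decode()}")
```

Comparing the queued outputs side by side makes it easy to spot the point where the character features start to appear in the high-model pass.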
Training details
Trained only on videos.
HIGH noise LoRA
dataset: 30 videos at 480x270, with frame counts of 25, 33, 65, and 81
steps: 2130
LR: 5e-5
optimizer: AdamW Optimi
rank: 32
batch size: 1
gradient accumulation steps: 1
min_t = 0.875
max_t = 1
LOW noise LoRA
dataset: 42 videos at 640x360, with frame counts of 25, 33, and 65
steps: 2730
LR: 5e-5
optimizer: AdamW Optimi
rank: 32
batch size: 1
gradient accumulation steps: 1
min_t = 0
max_t = 0.875
For training I used the diffusion-pipe repo.
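For reference, here is a rough sketch of what the high-noise run could look like as diffusion-pipe TOML configs, reconstructed from the numbers above. Treat the paths as placeholders, and treat any key not listed in this post (epochs, betas, weight_decay, the dataset file layout) as an assumption based on diffusion-pipe's example configs rather than a setting confirmed here:

```toml
# dataset_high.toml -- sketch, reconstructed from the dataset details above
resolutions = [[480, 270]]
frame_buckets = [25, 33, 65, 81]  # video-only dataset, so no image bucket

[[directory]]
path = '/data/nfj1nx/videos_480'  # placeholder path
num_repeats = 1
```

```toml
# train_high.toml -- sketch; keys not given in the post follow diffusion-pipe's examples
output_dir = '/data/output/nfj1nx_high'  # placeholder path
dataset = 'dataset_high.toml'

epochs = 100                      # assumption; the post reports 2130 steps, not epochs
micro_batch_size_per_gpu = 1
gradient_accumulation_steps = 1

[model]
type = 'wan'
ckpt_path = '/models/Wan2.2-T2V-A14B'  # placeholder path
dtype = 'bfloat16'
min_t = 0.875  # high-noise expert: train only on the noisiest timesteps
max_t = 1.0

[adapter]
type = 'lora'
rank = 32
dtype = 'bfloat16'

[optimizer]
type = 'adamw_optimi'
lr = 5e-5
betas = [0.9, 0.99]   # assumption; not stated in the post
weight_decay = 0.01   # assumption; not stated in the post
```

The low-noise run would mirror this with min_t = 0, max_t = 0.875, the 640x360 dataset, and frame buckets [25, 33, 65].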
Important Notes: This LoRA was created as part of a fan project for research purposes only and is not intended for commercial use. It is based on movies that are protected by copyright. Users utilize the model at their own risk and are obligated to comply with copyright law and all applicable regulations. The model was developed for non-commercial purposes, and it is not my intention to infringe any copyright. I assume no responsibility for any damages or legal consequences arising from use of the model.
Comments
This is an extremely impressive LoRA; it looks so good! Thank you so much for this amazing work. I noticed the video resolution in your showcase is 640x360, which is quite large. How much VRAM does your graphics card have? Additionally, could you share how you labeled the data in your training dataset?
Hi, thanks! I trained them on an RTX 4090 using diffusion-pipe. You can offload some of the load to CPU RAM so training uses less VRAM, but it will obviously be slower. I have some video and image captioner repos on my GitHub page that I use. I recommend trying my newest one, which uses Qwen 2.5 Omni 7B AWQ: https://github.com/cseti007/qwen2.5-omni-captioning/tree/main
To caption the dataset for this particular LoRA, I used the older Qwen 2.5-VL based repo: https://github.com/cseti007/Qwen2.5-VL-Video-Captioning
That said, I feel the caption quality of the Omni model is much better.
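For anyone curious what such a captioner roughly does under the hood, here is a bare-bones sketch of video captioning with Qwen2.5-VL through Hugging Face transformers. This is not the linked repo's actual code; the model name, fps, clip filename, and prompt are illustrative choices:

```python
# Bare-bones Qwen2.5-VL video captioning sketch (illustrative, not the repo's code).
# pip install transformers accelerate qwen-vl-utils
import torch
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model_id = "Qwen/Qwen2.5-VL-7B-Instruct"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

messages = [{
    "role": "user",
    "content": [
        {"type": "video", "video": "clip_001.mp4", "fps": 1.0},  # placeholder clip
        {"type": "text", "text": "Describe this clip in one detailed paragraph "
                                 "suitable as a text-to-video training caption."},
    ],
}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)

out = model.generate(**inputs, max_new_tokens=256)
caption = processor.batch_decode(
    out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(caption)  # save next to the clip, e.g. clip_001.txt, for training
```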
@Cseti That's fantastic! Thank you so much for your answer—it's really helpful.