V2!
V1 had some issues with blocky images, so I removed low-res images from the training set, which made the end result slightly blurry. I also trained with a lower learning rate of 2e-4 for 100 epochs.
NOTE: there was a bug in the training script, so for now use this code pinned to the specific commit below. I will update with v3 soon.
Use "Video of a transgender woman" at the beginning of the prompt to trigger it.
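The trigger phrase only works when it leads the prompt. A minimal sketch of a helper that enforces this (the function name is mine, not part of musubi-tuner):

```python
TRIGGER = "Video of a transgender woman"

def with_trigger(prompt: str) -> str:
    """Prepend the LoRA trigger phrase unless the prompt already starts with it."""
    if prompt.startswith(TRIGGER):
        return prompt
    return f"{TRIGGER} {prompt}"

print(with_trigger("dancing in a dimly lit room"))
# -> Video of a transgender woman dancing in a dimly lit room
```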
git clone https://github.com/kohya-ss/musubi-tuner.git
cd musubi-tuner
git checkout fd70762
pip install -r requirements.txt
cd ..
python ./musubi-tuner/hv_generate_video.py --fp8 --video_size 1280 720 --video_length 120 --infer_steps 30 --prompt "Video of a transgender woman with fair skin and long, straight white hair, styled with white cat ears. She is dressed in a revealing, white lingerie set, featuring a frilly, off-shoulder crop top that exposes her midriff and a matching ruffled mini skirt. She is also wearing white fishnet stockings that reach just below her knees. Her makeup is bold, with dark eyeliner, mascara, and pink lipstick, complementing her cat-themed costume. She has several tattoos visible on her arms, including a script tattoo on her left arm and a circular tattoo on her right forearm. Her miniskirt is lifted to reveal her erect penis. The background is dimly lit with a purple hue. The setting appears to be indoors, likely a bedroom or a private space, with some indistinct furniture and decor visible. The overall atmosphere of the image is playful and provocative, enhanced by the cat ears and lingerie. The woman's pose is confident and slightly provocative, with one leg raised, adding to the overall seductive tone." --save_path "./videos/" --output_type video --dit ./hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt --attn_mode sdpa --vae ./hunyuan-video-t2v-720p/vae/pytorch_model.pt --vae_chunk_size 32 --vae_spatial_tile_sample_min_size 128 --text_encoder1 ./split_files/text_encoders/llava_llama3_fp16.safetensors --text_encoder2 ./split_files/text_encoders/clip_l.safetensors --seed 69 --lora_multiplier 0.8 --lora_weight ./lora.safetensors
See https://github.com/kohya-ss/musubi-tuner?tab=readme-ov-file#inference for more info. The repo also has a converter to convert the LoRA to diffusion-pipe/ComfyUI format.
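The one-liner above is easier to tweak if you assemble it programmatically. A hedged sketch, standard library only, using the key settings from the command above (the dict of flags is my own convenience, not part of musubi-tuner):

```python
import shlex

# Key generation settings lifted from the command above; edit to taste.
args = {
    "--video_size": "1280 720",
    "--video_length": "120",
    "--infer_steps": "30",
    "--seed": "69",
    "--lora_multiplier": "0.8",
    "--lora_weight": "./lora.safetensors",
    "--save_path": "./videos/",
}

cmd = ["python", "./musubi-tuner/hv_generate_video.py", "--fp8"]
for flag, value in args.items():
    cmd.append(flag)
    cmd.extend(value.split())  # --video_size takes two separate values

# Print a shell-safe version of the full command.
print(shlex.join(cmd))
```

You can then pass `cmd` straight to `subprocess.run(cmd)` instead of copy-pasting the long one-liner.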
Comments (3)
Getting lora key not loaded in comfy on this one when trying hunyuan fastvideo fp8. Can you confirm what model it is known to work with?
Yeah, it's done with the normal fp8 model, not the fast one.
all this code crap is hella lame jfyi just post the prompt nobody cares about pytorch commands