Optimised Hunyuan/Skyreels/Wan 2.1 GGUF I2V + Upscale (Hunyuan LORA Compatible) (3060 12GBVRAM + 32gbRAM)

NSFW

If you run into any problems, feel free to PM me on Civitai or Discord.

Hunyuan 720p I2V

1316.72s for 73F at 688x800, 22 steps, dpmpp_2m sampler, simple scheduler

Hunyuan720pI2V Q6_K gguf (adjust as needed)
https://huggingface.co/city96/HunyuanVideo-I2V-gguf/tree/main

llava_llama3_vision
https://huggingface.co/Comfy-Org/HunyuanVideo_repackaged/blob/main/split_files/clip_vision/llava_llama3_vision.safetensors

clip_l (renamed to clip_hunyuan)
https://huggingface.co/Comfy-Org/HunyuanVideo_repackaged/tree/main/split_files/text_encoders

hunyuan_video_vae_bf16
https://huggingface.co/Comfy-Org/HunyuanVideo_repackaged/tree/main/split_files/vae

Python 3.12.7, CUDA 12.6, Torch 2.6.0+cu126
Triton for Windows: https://github.com/woct0rdho/triton-windows/releases
Once you have downloaded the wheel that matches your Python version, open a command prompt, navigate to the folder containing the downloaded file, and run the following commands with the python.exe from ComfyUI's python_embeded folder:

python.exe -m pip install triton-3.2.0-(filename)
python.exe -m pip install sageattention==1.0.6
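
After installing, it can help to confirm what the embedded Python actually sees. This is a minimal sketch using only the standard library. The helper name installed_versions is made up for illustration, and the package names in the list are assumptions (the triton-windows wheel may register under a different distribution name than plain triton).

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version or None} without importing the packages."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None  # not installed in this interpreter
    return found


if __name__ == "__main__":
    # Assumed distribution names; adjust if your wheel registers differently.
    names = ["torch", "triton", "triton-windows", "sageattention"]
    for name, version in installed_versions(names).items():
        print(f"{name}: {version or 'not installed'}")
```

Save it under any name (for example check_env.py, a hypothetical filename) and run it with the python.exe in python_embeded so it reports on the same interpreter ComfyUI uses.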

------------------------------------------------------------------------------------------------------
Wan2.1

33F, 512x512, uni_pc, simple: 562.51s
The 12-step and 8-step splits work as intended
81F: 1018.89s!
81F: 573.99s!
8-step split, 161F/10s (16fps), 512x512, uni_pc, simple: 6760.70 seconds, but it works! (PNG with baked-in metadata posted)
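
To relate frame counts to clip length and render cost, here is a small sketch. It assumes the 16fps figure quoted for the 161F run applies to the other Wan2.1 runs too; the function names are illustrative only.

```python
def clip_seconds(frames: int, fps: float = 16.0) -> float:
    """Length of the finished clip in seconds."""
    return frames / fps


def seconds_per_frame(render_seconds: float, frames: int) -> float:
    """Render time spent per output frame."""
    return render_seconds / frames


if __name__ == "__main__":
    # Figures taken from the 161F run listed above.
    print(clip_seconds(161))                          # 10.0625
    print(round(seconds_per_frame(6760.70, 161), 2))  # 41.99
```

So the 161F run is about 10 seconds of video at roughly 42 seconds of render time per frame on this setup.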

I have Buzz to tip: post your creations to the workflow gallery or add the resource to your posts. Have fun!
Wan2.1 I2V update published!
49F
512x512
12 steps (2 stages, 6+6)
Uni_pc
Simple
Each LoRA I add seems to add 200-400s of inference time
33F 700-900s
49F 1000-1500s
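
As a rough way to budget time before stacking LoRAs, the sketch below adds the observed 200-400s per LoRA to a base range. It assumes the cost is simply additive and that the base range you pass in was measured without LoRAs; neither is confirmed by the timings above, so treat the result as a ballpark only.

```python
def estimate_with_loras(base_range, lora_count, per_lora=(200, 400)):
    """Return a (low, high) runtime estimate in seconds.

    base_range: (low, high) seconds for the run without extra LoRAs (assumed).
    per_lora:   (low, high) extra seconds per LoRA, from the observation above.
    """
    low, high = base_range
    return (low + lora_count * per_lora[0], high + lora_count * per_lora[1])


if __name__ == "__main__":
    # Example: the 33F range with two extra LoRAs.
    print(estimate_with_loras((700, 900), 2))  # (1100, 1700)
```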

Wan2.1 480p I2V /unet (Adjust as needed)
https://huggingface.co/city96/Wan2.1-I2V-14B-480P-gguf/blob/main/wan2.1-i2v-14b-480p-Q6_K.gguf

Clip vision /clip_vision
https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/clip_vision/clip_vision_h.safetensors

Vae /vae
https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/vae/wan_2.1_vae.safetensors

Text encoder /clip or /text_encoders
https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors

(Optional) Upscale /upscale_models

https://huggingface.co/lokCX/4x-Ultrasharp/blob/main/4x-UltraSharp.pth
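
Before loading the workflow, a quick check that every file landed in the right subfolder can save a failed run. This is a sketch only: the filenames are taken from the links above, the folder names follow the labels above (it checks text_encoders, not the alternative clip folder), and missing_models is a made-up helper name.

```python
from pathlib import Path

# Subfolder under ComfyUI/models -> filename from the download links above.
WAN_FILES = {
    "unet": "wan2.1-i2v-14b-480p-Q6_K.gguf",
    "clip_vision": "clip_vision_h.safetensors",
    "vae": "wan_2.1_vae.safetensors",
    "text_encoders": "umt5_xxl_fp8_e4m3fn_scaled.safetensors",
    "upscale_models": "4x-UltraSharp.pth",  # optional
}


def missing_models(models_dir, expected=WAN_FILES):
    """List 'subfolder/filename' entries that are not present on disk."""
    root = Path(models_dir)
    return [
        f"{sub}/{name}"
        for sub, name in expected.items()
        if not (root / sub / name).is_file()
    ]


if __name__ == "__main__":
    # Assumed path; point this at your own ComfyUI models folder.
    for entry in missing_models("ComfyUI/models"):
        print("missing:", entry)
```

If you picked a different quant than Q6_K, change the unet entry to match the file you downloaded.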

-----------------------------------------------------------------------------------------------------

Skyreels

Final barebones + text-weighted Hunyuan LoRA compatibility update published
831.61 seconds (no upscale)
932.07 seconds (no upscale)
Published vids in showcase
Could potentially work on 8GB VRAM or lower if you tinker with virtual_vram_gb on the UnetLoaderGGUFDisTorchMultiGPU custom node (provided you have enough system RAM)

Stage 1: 415.369s, Stage 2: 315.937s, VAE: 70.838s, total: 837.93 seconds. Q6 + 6-step LoRA + Smooth LoRA + Dolly LoRA
(I now default to DPM++ 2M / Beta and always use the Smooth LoRA, except for human-centric clips, where I leave it out. Average runtime: 700-900s for 73F, no upscale)
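
The Stage 1 / Stage 2 / VAE figures do not add up to the reported total, so some time goes elsewhere (model loading, text encoding and so on; that attribution is my assumption). A small sketch of the arithmetic, with an illustrative function name:

```python
def unaccounted_seconds(stage_times, total):
    """Time in the reported total that the listed stages do not cover."""
    return total - sum(stage_times.values())


# Figures from the run above.
stages = {"stage_1": 415.369, "stage_2": 315.937, "vae_decode": 70.838}
reported_total = 837.93

overhead = unaccounted_seconds(stages, reported_total)

if __name__ == "__main__":
    print(f"{sum(stages.values()):.3f}s in stages, {overhead:.3f}s elsewhere")
```

That works out to about 802s in the listed stages and roughly 36s outside them.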

Comfyui_MultiGPU = UnetLoaderGGUFDisTorchMultiGPU (image latent batch 4 flux-finetune Q8, replace gguf loader in txt2img workflow)
Comfyui_KJNodes = TorchCompileModelHyVideo, Patch Sage Attention KJ, Patch Model Patcher Order (Add nodes>KJNodes>Experimental)

∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨
https://huggingface.co/spacepxl/skyreels-i2v-smooth-lora
∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧

Fine-tune virtual_vram_gb to fit your requirements (I suggest checking the ComfyUI cmd window for the DisTorch allocation values that appear after the model is loaded into SamplerCustom), or use the normal Unet Loader (GGUF) with skyreels-hunyuan-I2V-Q?_
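
For a starting value, one rough heuristic is to offload whatever part of the model does not fit once some VRAM is set aside for latents, activations and the VAE. This is not how DisTorch computes anything; it is only a guess to begin tuning from. The 4GB reserve and the 10GB model size in the example are assumptions, and suggest_virtual_vram_gb is a made-up helper.

```python
def suggest_virtual_vram_gb(model_gb, vram_gb, reserve_gb=4.0):
    """Rough number of GB of model weights to push to system RAM.

    model_gb:   size of the GGUF file on disk (assumed close to its VRAM use).
    vram_gb:    total VRAM on the card.
    reserve_gb: VRAM kept free for everything that is not model weights (a guess).
    """
    budget = vram_gb - reserve_gb
    return max(0.0, round(model_gb - budget, 1))


if __name__ == "__main__":
    # Hypothetical 10GB model file on 12GB, 16GB and 8GB cards.
    for vram in (12.0, 16.0, 8.0):
        print(vram, "GB VRAM ->", suggest_virtual_vram_gb(10.0, vram))
```

Start from the suggested number, then adjust it against the allocation values printed in the ComfyUI cmd window.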

1st load
Prompt executed in 1662.22 seconds; minus 587.365 seconds for the upscale, that is about 1075 seconds of generation
640x864
73 frames (stable/generation time)
Steps: 6-12 (Stage 1 6 steps + Stage 2 6 steps)
cfg: 4.0
Sampler: Euler
Scheduler: Simple

(Original Kijai WF https://huggingface.co/Kijai/SkyReels-V1-Hunyuan_comfy/blob/main/skyreels_hunyuan_I2V_native_example_01.json)

Barebones I2V workflow with Upscaler, optimised on a 3060 12GB VRAM + 32GB RAM
Make sure you update ComfyUI, Torch & CUDA

Run the update_comfyui.bat from the update folder

Go back to your python_embeded folder

Click on the file directory bar at the top, type cmd then hit enter

In cmd type "python.exe -m pip install --upgrade torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu126"

∨∨ May ruin older workflows ∨∨

Run the other update .bat if it still isn't working: update_comfyui_and_python_dependencies.bat

∧∧ May ruin older workflows ∧∧
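
After updating, you can check that the Torch build carries the CUDA tag you expect (cu126 for the index URL above). A minimal sketch that only parses the version string; the function names are illustrative.

```python
def split_torch_version(version):
    """Split '2.6.0+cu126' into ('2.6.0', 'cu126'); build is None if absent."""
    release, _, build = version.partition("+")
    return release, build or None


def matches_cuda_build(version, expected="cu126"):
    """True when the build tag after '+' equals the expected CUDA tag."""
    return split_torch_version(version)[1] == expected


if __name__ == "__main__":
    print(split_torch_version("2.6.0+cu126"))  # ('2.6.0', 'cu126')
    print(matches_cuda_build("2.6.0+cpu"))     # False
```

In the embedded Python you would pass in torch.__version__ after importing torch; a +cpu tag means the CPU-only build got installed.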

Workflow Resources:
Fast_Hunyuan LoRA (models/loras): https://huggingface.co/Kijai/HunyuanVideo_comfy/blob/main/hyvideo_FastVideo_LoRA-fp8.safetensors

GGUF Model (Switch the models to fit your requirements) (models/unet):

https://huggingface.co/Kijai/SkyReels-V1-Hunyuan_comfy/blob/main/skyreels-hunyuan-I2V-Q6_K.gguf

VAE model (models/vae): https://huggingface.co/Kijai/HunyuanVideo_comfy/blob/main/hunyuan_video_vae_bf16.safetensors

Clip_l model (I renamed it to clip_hunyuan) (models/clip):

https://huggingface.co/Comfy-Org/HunyuanVideo_repackaged/blob/main/split_files/text_encoders/clip_l.safetensors

llava_llama3 model (models/clip):

https://huggingface.co/calcuis/hunyuan-gguf/blob/main/llava_llama3_fp8_scaled.safetensors

Upscale Model (models/upscale_models):

https://huggingface.co/uwg/upscaler/blob/main/ESRGAN/4x-UltraSharp.pth

Personal Generation Times

Base generation runtimes after 1st load (2 stages + VAE decode):
758.173 seconds
704.589 seconds

With suggested LoRA, after 1st load:
779.494 seconds

169F tests after 1st (No Load Test):
OOM

121F test after 1st+6stepLORA+smoothLORA (No Load Test):
1st stage
525.14s 1st iteration
729.66s 2nd
736.19s 3rd
645.15s 4th
665.55s 5th
764.12s 6th/Average
2nd stage
81.90s 1st+2nd iteration
OOM
An instant requeue after the OOM resumes from the 2nd stage
6.17s 1st Iteration
113.74s 2nd+3rd
222.92s 4th
327.62s 5th
282.29s 6th/Average
VAE 128.309s

97F tests I2V+6stepLora (posted in gallery) (no oom yet)
1123s
1013s

Description

Optimised Hunyuan 720p GGUF I2V + Upscale (3060 12GBVRAM + 32gbRAM)

Comments (4)

dirtysem, Mar 28, 2025

Most LoRAs don't work. Why is that? It says that ad blocks are not loaded.

This is the fastest project I've ever seen. It renders very quickly in very good quality, but the fact that LoRAs do not work is very, very bad. Can I fix this somehow?

EKKIVOK, May 1, 2025

Hi! Amazing work! But I2V means image-to-video, right? So why is my video NOT the image I uploaded to my workflow? xD

Manuel_Kidney, May 10, 2025

Gotta give a shout-out to this workflow. After trying dozens of different ones to get WAN running on my 12gb card, this is the first one that A) generated usable results and B) didn't take five hours to run. Until I can scrape up a few grand for one of those 24gb beauties, this is a pretty good substitute. Thanks!

wapitawg, Jun 7, 2025

The first workflow that actually worked for me. How can I make it better with 16GB of VRAM?

Workflows

Hunyuan Video

by tsolful
