This is my take on organizing and building an I2V workflow that runs on a "smaller" graphics card (specifically 12 GB). I use Q5 GGUF models, though I have also tested the smaller Q4 and Q3 quantizations; each step down costs a little prompt adherence and quality, but they are still usable. The workflow uses the Lightning LoRA, and notes on where all models / LoRAs can be downloaded are included.
If you have limited RAM or VRAM, I suggest running ComfyUI with the argument: --cache-none
While this makes repeated batches of the same video slower, you get a much more consistent overall generation speed (3-4 minutes for a 5-6 second clip on moderate home PC configurations).
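As a minimal sketch, launching ComfyUI with that flag looks like the following (the install path and the `python` command name are assumptions; adjust for your own setup):

```shell
# Start ComfyUI with model caching disabled, trading repeat-run speed
# for lower and more predictable RAM/VRAM usage.
cd ~/ComfyUI
python main.py --cache-none
```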
This also includes using Florence2 (a vision-language model) for image captioning and auto-prompt assistance. You only need to add your desired action to the manual prompt (if wanted).
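The idea of combining the Florence2 auto-caption with your manual action text amounts to simple string concatenation; a sketch (the function name and punctuation handling are my own assumptions, not the workflow's exact node logic):

```python
def build_prompt(auto_caption: str, action: str) -> str:
    """Combine a Florence2 auto-caption with a user-supplied action.

    The caption describes the still image; the action describes the
    motion you want Wan to generate from it.
    """
    caption = auto_caption.strip().rstrip(".")
    action = action.strip()
    if not action:
        return caption
    return f"{caption}. {action}"


print(build_prompt(
    "A woman standing on a beach at sunset.",
    "She turns and waves at the camera.",
))
```

In the workflow itself this joining happens in a text-concatenation node, so you type only the action while Florence2 supplies the scene description.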
Many workflows I have seen include a lot of extra nodes, but in Wan 2.2 I2V, at least, they tend to have no effect and only add overhead.
I typically generate videos at 480p (480 x 832), and this workflow then upscales them 2x to 960 x 1664.
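If you want to try other resolutions, note that Wan-style video models generally work best with dimensions that are multiples of 16; a small helper sketch for planning the base and upscaled sizes (the divisibility rule is common guidance, not something stated in this workflow, and the function names are mine):

```python
def snap16(x: int) -> int:
    """Round a dimension to the nearest multiple of 16 (minimum 16)."""
    return max(16, round(x / 16) * 16)


def plan_resolution(width: int, height: int, upscale: int = 2):
    """Return (base, upscaled) (w, h) pairs for a generation run."""
    w, h = snap16(width), snap16(height)
    return (w, h), (w * upscale, h * upscale)


print(plan_resolution(480, 832))  # the sizes used in this workflow
```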
Custom Nodes Used:
ComfyUI-GGUF (https://github.com/city96/ComfyUI-GGUF)
rgthree-comfy (https://github.com/rgthree/rgthree-comfy)
ComfyUI-KJNodes (https://github.com/kijai/ComfyUI-KJNodes)
ComfyUI-Florence2 (https://github.com/kijai/ComfyUI-Florence2)
ComfyUI-VideoHelperSuite (https://github.com/Kosinkadink/ComfyUI-VideoHelperSuite)
WAS Node Suite (https://github.com/WASasquatch/was-node-suite-comfyui)
Minor Updates
Updated and refreshed nodes (including Wan updates for improved coloring)
Removed obscure node for file naming
Comments
When I run this workflow, all it's doing is resizing the input image. What am I missing?
It starts by resizing your image to the width / height you set. Then, if you have downloaded and set the GGUF model and LoRAs (Lightning plus whatever else you want) and have downloaded the Florence2 model (to "read" your image), it should go through the WanImageToVideo node into the two sampler nodes (high and low), which create the final video output. The purpose of the workflow is to let more "average" computers run Wan video creation using GGUF; it should typically take 3-5 minutes for a 5 second clip (depending on your specs and the image size you choose).
If all you are getting is a preview of the cropped image, confirm there are no other errors appearing. I just downloaded and ran the 1.3 version of this workflow without any edits and had no issues getting a final saved video, upscaled by 2x (started at 480 x 720, ended at 960 x 1440), in the ComfyUI/outputs/ folder (i2v sub-folder).
@logos011 Thanks for your response. There was no input connection into the Florence block, so it was just resizing the images and nothing else. I'm very new to video and ComfyUI, so I didn't know what to look for, but I was able to figure it out thanks to your explanation of the intended process. I generated my first video and I'm uploading it to my profile.
@SomeRando013 Glad to hear that helped!
Works great. There were a few nodes not connected, but I managed to figure it out. Good work!
