NO SUPPORT WILL BE PROVIDED. THERE ARE NOTES IN THE WORKFLOW THAT EXPLAIN EVERYTHING. SEE INSTALLATION INSTRUCTIONS BELOW:
This workflow uses:
CogVideoX-Fun 2B/5B - Uses text and an input image to create a video.
Florence2 - Generates a caption. You can provide your own caption, use the Florence2 caption, or combine your caption with the Florence2 caption.
DepthFlow - Creates an end image for your video. You can provide your own end image or let DepthFlow create one for you.
NOTE: THIS WORKFLOW USES THE LEGACY VERSION OF https://github.com/kijai/ComfyUI-CogVideoXWrapper/tree/1.0_legacy
YOU WILL NEED TO RUN: git checkout origin/1.0_legacy to get the legacy version of the CogVideoXWrapper.
Installation Instructions
Download and install ComfyUI. https://github.com/comfyanonymous/ComfyUI
Download and install ComfyUI manager. https://github.com/ltdrdata/ComfyUI-Manager
Run the ComfyUI server from your terminal/command prompt.
Open ComfyUI in your browser.
Download this workflow.
Drag and drop the JSON file that was downloaded into your browser window.
You will get an error message saying that some nodes are missing.
Open the ComfyUI manager.
Click on Install Missing Custom Nodes.
Wait for the installation to finish.
Open ComfyUI\custom_nodes\ComfyUI-Manager\config.ini and set: bypass_ssl = True
Open ComfyUI\custom_nodes\ComfyUI-CogVideoXWrapper in your terminal/command prompt.
Run: git checkout origin/1.0_legacy
A popup will appear saying that the ComfyUI server needs to be restarted. Restart it with the button.
When the server has restarted, refresh the page.
Delete the workflow.
Drag and drop the JSON file that was downloaded into your browser window.
Run the workflow by pressing Queue.
You will get a popup error saying that the T5 clip is missing. Download it by following the instructions in the Note node next to the Load Clip node.
The CogVideoX-Fun model will be downloaded automatically.
READ THE NOTES FOR EACH CORRESPONDING NODE FOR HELP.
Add an Input image.
Update the prompt.
Run the workflow by pressing Queue.
Wait for the video to be generated. This should take around 100-400 seconds depending on your GPU.
On a 4090, it takes 100 seconds to generate a video.
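The config.ini edit and the legacy-branch checkout above can also be done from a small script. This is only a sketch; the helper names are mine, and the paths in the usage comments assume a default install layout (adjust them to your setup):

```shell
# Flip bypass_ssl to True in ComfyUI-Manager's config.ini.
# Takes the path to config.ini; edits the file in place.
enable_bypass_ssl() {
  sed -i 's/^bypass_ssl *=.*/bypass_ssl = True/' "$1"
}

# Switch a ComfyUI-CogVideoXWrapper checkout to the 1.0_legacy branch.
# Takes the path to the wrapper's git checkout.
checkout_legacy() {
  git -C "$1" fetch origin && git -C "$1" checkout origin/1.0_legacy
}

# Example usage (run from the folder that contains ComfyUI):
# enable_bypass_ssl ComfyUI/custom_nodes/ComfyUI-Manager/config.ini
# checkout_legacy   ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper
```

Note that checking out origin/1.0_legacy leaves the repository in a detached-HEAD state, which is fine for this workflow; git will print a notice about it.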
Description
Moved the input nodes closer together.
FAQ
Comments
Any plans for the latest version of cogvideo nodes?
I tried the latest update but there's some issues:
1. The lower 1/6 of the video has distortions. Even when using CogVideoX-Fun, not all resolutions are supported.
2. The last 8-10 frames have incorrect brightness and contrast.
I'm waiting until the issues are fixed. Then I'll update the workflow to use the latest changes.
@screamlouder Ah, ty for the info-- Good luck with your endeavors--
What did you prompt to get the example movements above?
The workflow looks fantastic! Unfortunately even after reverting back to the legacy version of the ComfyUI-CogVideoXWrapper, there is still an error with workflowPrompts - CogVideo TextEncode
I really would love to use this!
That is just a group node made with ComfyUI that combines two CogVideo TextEncode nodes.
If you look at the i2v workflow here: https://github.com/kijai/ComfyUI-CogVideoXWrapper/blob/main/examples/cogvideox_1_5_5b_I2V_01.json
All you need to do is create two cogvideo text encode nodes.
The first will be your positive prompt.
The second is the negative prompt.
Then just link the positive output to the positive input on the CogVideo Sampler, and the negative to the negative.
@screamlouder I managed to replace the node with a working version of CogVideo TextEncode and that resolved the issue.
For those running a 4090, you may want to lower:
Width - Simple Math Int to 448
Height - Simple Math Int to 720
Otherwise it tends to freeze on CogVideo Decode (likely running out of memory).
--
I find when using an end frame, the blending in the middle of the animation is a bit rough. Often it abruptly fades into the end frame, or it struggles to move the character into position.
Example 1
Example 2
Any advice on how to improve this?
Hey, I really appreciate you sharing this awesome project of yours with everyone. I don't quite understand the DepthFlow group. Is it just generating an end image based on my first image and that's it? Or does it further impact the video generation?
It only generates an end image. No impact on the video generation.
The reason I added it is because I don't usually have a good AI image for both the start and the end of the animation. So I let the depthflow group make the last frame for me.
@screamlouder I see. In your experience, what exactly is the end frame doing? Does it have to be slightly different, or is the goal to give it my desired outcome and pray that it generates it?
@AbsoluteBussin It just has to be slightly different so that there's a "target" for the animation to move towards.
If you provide the same image for both the starting and ending frames, the animation is unlikely to have any movement. But if you use the DepthFlow group to make a slight change to the starting frame by moving the "camera" around, it gives the animation movement.
I suggest you forget about cogvideox and move onto Hunyuan i2v with leapfusion: https://civitai.com/models/1180764/hunyuan-img2vid-leapfusion-lora?modelVersionId=1328798
If you look at the gallery, you can see my latest gen and how much better it is compared to cogvideox.