I found two workflows (WF) that I liked and decided to put them together.
I don't know which settings are right to use (CFG / steps / LoRA strength), but it seems to be working as is for now.
I use Sage Attention - LINK
UPDATE COMFYUI BEFORE USE
==========
v.1.0 (WAN 2.2 FLF2V)
WAN 2.2 has native FLF2V (First-Last-Frame-to-Video) capability, so I adjusted my workflow to use it, and it seems to work. Hope you like it.
Enjoy.
==========
v.1.0 (WAN 2.2 I2V)
I just rearranged the nodes so the workflow works with the WAN 2.2 GGUF models:
https://huggingface.co/bullerwins/Wan2.2-I2V-A14B-GGUF/tree/main
Note that for the 14B model you need both files: the HIGH noise and the LOW noise model.
==========
v.1.1 (WAN 2.1 I2V)
Instead of the CausVid LoRA I used the FusionX LoRA, which already includes CausVid.
I set the LoRA strength to 0.4 in the first KSampler (HIGH CFG START) and 0.8 in the second one (LOW CFG END).
In version 1.0 you can just swap the LoRAs in the Dual Samplers group.
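The per-sampler strength split above can be written down as a tiny helper. This is just an illustration of the settings, not a workflow node; the function name is hypothetical:

```python
def lora_strength(sampler: str) -> float:
    """FusionX LoRA strength per sampler stage, as described above.

    'high' = first KSampler (HIGH CFG START), 'low' = second KSampler
    (LOW CFG END). These values are a starting point, not tuned optima.
    """
    strengths = {"high": 0.4, "low": 0.8}
    return strengths[sampler]
```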
==========
v.1.0 (WAN 2.1 I2V)
1 - CausVid 2-Sampler Workflow for WAN 480p/720p I2V (main part)
I used this LoRA: Wan21_CausVid_14B_T2V_lora_rank32.safetensors
2 - WAN 2.1 IMAGE to VIDEO with Caption and Postprocessing (Florence caption, last frame, color match)
==========
I saw suggestions that dpmpp_2m with the normal scheduler works well.
This WF uses dpmpp_2m with the simple scheduler.
First Frame - Last Frame Workflow Edition
==========
Comments
I tried to make a loop animation, but for some reason it darkens the last frame.
Hi.
It may be because the final frame is darker. You could try playing with the ColorMatch node settings, for example:
1. Use the last frame, not the first, as the color reference
2. Increase/decrease the strength
I'll take a look too; maybe I'll find something.
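For intuition, the "reference" and "strength" settings behave roughly like a per-channel mean/std color transfer. Below is a minimal NumPy sketch of that idea, assuming 8-bit RGB frames; it is not the ColorMatch node's actual algorithm:

```python
import numpy as np

def color_match(frame, reference, strength=1.0):
    """Shift frame's per-channel mean/std toward the reference's.

    strength=0 leaves the frame unchanged; strength=1 fully matches
    the reference statistics.
    """
    out = frame.astype(np.float64)
    ref = reference.astype(np.float64)
    for c in range(out.shape[-1]):
        f_mean, f_std = out[..., c].mean(), out[..., c].std() + 1e-8
        r_mean, r_std = ref[..., c].mean(), ref[..., c].std() + 1e-8
        matched = (out[..., c] - f_mean) / f_std * r_std + r_mean
        out[..., c] = (1 - strength) * out[..., c] + strength * matched
    return np.clip(out, 0, 255).astype(np.uint8)
```

With strength below 1.0 the correction is blended with the original frame, which is why lowering it can soften the color flash between the last and first frame of a loop.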
GFrost The trick is that the same image is fed into all three inputs, so you get a logical loop, but one frame breaks it.
It turns out the first and last frames differ slightly in color, so there is a small flash.
Gensh That's possible. Plus, the LoRA itself distorts the colors slightly.
Gensh I made a couple of animations using the same frame at the start and the end and didn't notice much degradation. Check whether yours behaves the same. If not, you may not be using a reference for the ColorMatch node.
Florence screws up this whole workflow for me. The toggles don't even work to disable it.
Hi.
1. I will add a version without Florence to the archive soon. (Don't turn it off in the "Fast Group Muter" node.)
2. For me, "switch to own prompt" works fine; I get different results when it's on and off.
Make sure the switch in the "Switch to own prompt = TRUE" node in the "Prompting" group is set to "true", and that you put your prompt in the "Your Own Prompt" node.
I tried to install a different auto-prompter, but it fails to import (LLM). I'm still searching for something better and compatible.
GFrost Thank you for your effort
Thanks for this I2V workflow - best out there!
One question: Any advice on keeping the face consistent? It seems to generate a different face from the source image.
Hi, thanks.
Yeah, it may change the face. The LoRAs seem to cause it. Try playing with the LoRA strengths.
GFrost Thanks for the quick reply! I'll play around with it.
Throw in a ReActor node. My FLF2V WF has a restore node; take a look at it if you're not familiar with ReActor. It can be fiddly, but your only other choice is ControlNet. I'll be updating my WF to 2.2 soon as well. Hope that helps.
@Ponder_Stibbons Thanks! I'll give it a shot
I find this workflow pretty slow compared to t2v or i2v, like 5 times slower. Is it me, or is that normal? I am using Sage Attention. Also, my ComfyUI exited roughly ("killed") at the color match part, though I can already see the effect...
Try different steps/resolution/model.
Maybe 15-20 steps overall, with the first sampler ending at step 5.
You can also use a different LoRA instead of FusionX.
I use the Q5_K_M model for now, and it takes nearly 20 min to render 5 seconds on my 3080 Ti (20 steps, end step at 7).
When I used Q4_K_M it took around 14 min (20 steps, end step at 5).
I also use Sage. I avoid further speedups because they lower the quality; TeaCache, for example, makes hands worse.
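The "end step" above is the handoff point between the two KSamplers: the first (high-noise) sampler runs the early steps, the second (low-noise) sampler finishes the schedule. A rough sketch of that split, with a hypothetical helper rather than an actual workflow node:

```python
def split_steps(total_steps: int, switch_step: int):
    """Split a denoising schedule between the two KSamplers.

    The first (HIGH noise / high CFG) sampler runs steps [0, switch_step);
    the second (LOW noise / low CFG) runs [switch_step, total_steps).
    """
    if not 0 < switch_step < total_steps:
        raise ValueError("switch_step must lie strictly inside the schedule")
    return (0, switch_step), (switch_step, total_steps)
```

For example, `split_steps(20, 5)` corresponds to the "20 steps, end step at 5" settings mentioned above.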
By the way, is there a picture2text part in your workflow? Do I need an OpenAI key or Ollama? If yes, would you mind elaborating a little on what I need? Edit: I remember now; I have a problem with the node.
M14w
* There is a "Florence caption" group (you can see it in the screenshot). I did almost all my generations with its help. It doesn't need any key; it works locally.
* You can change how detailed the caption is by changing the "task" in the "Florence2Run" node. The caption models download automatically, but you only need the "Florence-2-large" one; it's fine.
* You can add text before or after the generated caption by entering text in the "Pre text" and "After text" nodes in the "Prompting" group.
* Or you can use your own prompt: switch the "switch to own prompt" node to TRUE and enter text in the "Your own prompt" node. Or copy the text Florence generated, paste it into "Your own prompt", and tweak it slightly.
Here is an example caption generated from one of my videos (Running Big Guy):
The video shows a man in a brown outfit standing in the woods with his hands on his hips, surrounded by trees and plants, with a full moon in the background.
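The prompting logic described above (pre text + generated caption + after text, with an own-prompt override) can be sketched in a few lines. Function and argument names here are illustrative, not the actual node inputs:

```python
def build_prompt(caption: str, pre_text: str = "", after_text: str = "",
                 own_prompt: str = "", use_own_prompt: bool = False) -> str:
    """Mimic the 'Prompting' group: either pass the user's own prompt
    through unchanged, or wrap the Florence-generated caption with the
    optional pre/after text."""
    if use_own_prompt:
        return own_prompt
    parts = [pre_text, caption, after_text]
    return " ".join(p for p in parts if p)
```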
GFrost I am editing your workflow because I have a node incompatibility. I am trying to use Joy Caption (though I am not done). What specifications do you recommend for the caption(s)?
M14w I used the ones that were in the original workflow. I created mine by merging two that I found, so I didn't change anything in the captioning and didn't research it.
Is Joy Caption better?
GFrost For the moment, I am trying to solve some out-of-memory errors; they seem to be related to the long prompt.
Also, there is this line: "Мужчина переминается с ноги на ногу с удивлением смотря на зрителя." ("A man shifts from foot to foot, looking at the viewer with surprise."). It is probably better to have it in English, because most users won't understand the issue if they keep it.
M14w It is just part of the prompt in my native language. I didn't think someone would use my prompt to generate their videos. But noted; next time I will keep demonstration prompts in English.
- I was able to solve my initial problem by using a Q3_K_M quantization: it went flawlessly, up to the end. It was a RAM or VRAM issue. Note that I checked the "use other vram" option.
- I was able to replace the "Florence caption" with Joy Caption. Replacing the node is easy; however, pay attention to the download time and the hard-drive space for this model.
- The default memory mode of Joy Caption is typically not good for me (3090). The only one that worked was the "Maximum saving (4 bits)" mode; the 8-bit mode gave another error.
This behavior suggests that the caption model stays loaded in memory after captioning.
The length of the caption doesn't seem to matter at first, but it can create problems after the second "KSampler low cfg" part: ComfyUI then gets "killed", a sign of a memory problem. It is probably wise to play with the max number of tokens to prevent this.
- If you see a man dancing, it is because the line "Мужчина переминается с ноги на ногу с удивлением смотря на зрителя." is still somewhere in the prompt. And since it speaks about a "man", I suspect it can change the face of the character (if you try to render a woman).
M14w Did you try putting a "clean VRAM used" node after the caption group to offload the caption model?
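In plain Python, a "clean VRAM" step after captioning amounts to dropping all references to the caption model and then asking the allocator to release its cache. A sketch of that idea (the torch import is guarded, since this is optional):

```python
import gc

def free_vram() -> None:
    """Release cached GPU memory once the caption model's references are gone.

    Call this *after* setting the model variable(s) to None: gc.collect()
    only frees objects that are no longer referenced anywhere, and
    empty_cache() then returns the freed blocks to the driver.
    """
    gc.collect()
    try:
        import torch
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
    except ImportError:
        pass  # torch not installed; nothing GPU-side to free
```

Usage would be along the lines of `caption_model = None; free_vram()`.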
GFrost I am not sure it helps (edited)
GFrost I also get decent results by changing the method to LCM with cfg=1. The total number of steps changes how blurry the transitions are. In any case, it saves time.
Hello, I made an updated version of your workflow. I kept the credit inside it. I still need to update the description page and add some other info, but the workflow is there.
M14w Great, happy that my WF was useful.
This is a great workflow. I am using the Q8 GGUF with an RTX 5090 and it's fairly fast. I just wish it could upscale too (hint hint) :)
Thanks
Also, is there a way to change from Landscape to Portrait?
Glad you like it =)
AttributeError: 'NoneType' object has no attribute 'get_model_object'
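This error usually means an upstream loader produced None (a missing or incompatible model file), and a downstream node then called `.get_model_object` on nothing. A hypothetical guard that fails early with a clearer message instead:

```python
def require_model(model, name: str = "model"):
    """Raise a readable error instead of letting a later node hit
    "'NoneType' object has no attribute 'get_model_object'"."""
    if model is None:
        raise ValueError(
            f"{name} failed to load; check the loader node's file path "
            "and that the checkpoint matches the expected format"
        )
    return model
```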
