Generates Wan 2.1 videos in a fraction of the time.
Available in 720p and 480p versions.
Recommended settings (sketched below as a ComfyUI fragment):
Sampler/Scheduler: Euler/Simple
Steps: 4
CFG: 1
Sigma Shift: 5
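For ComfyUI users, a minimal sketch of these settings as an API-format workflow fragment, written here as Python dicts. This is an assumption-laden illustration, not the official workflow: node wiring, IDs, and the seed are placeholders, and the sigma shift assumes the ModelSamplingSD3 node that standard Wan workflows use.

```python
# Hypothetical fragment of a ComfyUI API-format workflow (as Python dicts).
# Node links (model/positive/negative/latent_image) are omitted for brevity.

ksampler = {
    "class_type": "KSampler",
    "inputs": {
        "sampler_name": "euler",  # Sampler: Euler
        "scheduler": "simple",    # Scheduler: Simple
        "steps": 4,               # the distilled model only needs 4 steps
        "cfg": 1.0,               # CFG 1: guidance is baked into the distill
        "denoise": 1.0,
        "seed": 0,                # placeholder seed
    },
}

# Sigma shift is applied to the model rather than the sampler, e.g. via
# the ModelSamplingSD3 node used in typical Wan workflows:
model_sampling = {
    "class_type": "ModelSamplingSD3",
    "inputs": {"shift": 5.0},     # Sigma Shift: 5
}
```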
Original model from Lightx2v, converted to FP8 quantisation.
☠️ Do not use any extra speed-up tricks or speed-up LoRAs, or they may mess up your generations ... 🤬
⚠️ Hint: Most of the time the model takes you at your word. If you write "white", it is white. "Translucent" is translucent... like for the fluids. 💦 Now you know! 🫵 translucent whitish 🤫
⬇️⬇️⬇️⬇️⬇️⬇️⬇️⬇️⬇️⬇️⬇️
Recommended specs:
8 GB VRAM, 32 GB RAM
Sample times: under 2 minutes for 81 frames at 4 steps on an RTX 4070 Ti Super.
Compatible with 14B LoRAs.
I normally use 0-2 LoRAs, strength at 0.4-1 depending on how strong the effect should be. 0.7-0.9 works best most of the time without overwriting the style of the image.
With multiple LoRAs it seems best to tune the strength down a bit, to 0.3-0.6; see the sketch below.
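In SwarmUI these strengths just go into the LoRA fields; in a ComfyUI workflow the same numbers would sit on chained LoraLoader nodes. A minimal hypothetical fragment, with placeholder filenames:

```python
# Hypothetical fragment: two chained LoraLoader nodes with strengths
# tuned down to the 0.3-0.6 range suggested above.

lora_a = {
    "class_type": "LoraLoader",
    "inputs": {
        "lora_name": "effect_lora_a.safetensors",  # placeholder name
        "strength_model": 0.5,
        "strength_clip": 0.5,
        # "model"/"clip" inputs link to the checkpoint loader
    },
}

lora_b = {
    "class_type": "LoraLoader",
    "inputs": {
        "lora_name": "effect_lora_b.safetensors",  # placeholder name
        "strength_model": 0.4,
        "strength_clip": 0.4,
        # "model"/"clip" inputs link to lora_a's outputs
    },
}
```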
Basic workflow example:
Here: https://civarchive.com/models/1811161?modelVersionId=2049602
My favourite UI:
SwarmUI https://github.com/mcmonkeyprojects/SwarmUI
Testing (my specs):
I can go wild on settings with this full checkpoint, even with added LoRAs:
121 frames possible: ~3 minutes
121 frames at 24 fps possible (more motion): ~3 minutes
128 frames at 24 fps possible (more motion and extended): ~3.5 minutes
Dependencies:
YOU are responsible for your outputs, as always! If you make ToS-violating content and I become aware of it, I WILL report it.
Disclaimer
This model is shared without warranties and on the condition that it is used in a lawful and responsible way. I do not support or take responsibility for illegal, harmful, or harassing uses. By downloading or using it, you accept that you are solely responsible for how it is used.
Description
Just the clip_vision_h file for Wan 2.1 FP8
Comments (56)
Dude! Fantastic! Faces don't change at all with this either. The next round's on me!
Never swapped the default CLIP Vision H before, so I feel a bit ignorant - does this work with the 720p model as well, or does it work with any Wan version?
I think you do not have to switch the CLIP Vision if you already have one. It's just here for completeness, and in case you have none.
darksidewalker What's the "Lightspeed" version compared to the regular Vision H version?
thanks so much, this is awesome
YOU ARE MY HERO BRO
I'll get your first born (video), right? ... RIGHT?! 😈 ... 😆
Can we reduce the FILM VFI node execution time? It needs 63 seconds on a 4090.
You cannot. It has nothing to do with this model; it is its own model. Just do not use it if it bothers you.
darksidewalker Please don't take this the wrong way. I'm just curious whether I set the FILM VFI node up wrong; I think the execution time could be reduced a bit by changing the right parameter or adding some node.
sekaiwlc07860 I fear I cannot help here. SwarmUI, which I use, handles this automatically once set up.
it never finishes loading the model
What does that even mean?
darksidewalker I'm assuming he can't load the model, probably running out of VRAM.
Wow! This is amazingly fast and the quality is great!
What is the point of using this instead of the Lightx2v LoRA? I checked them against each other and there seems to be no difference.
"The point is that Wan 2.1 Lightspeed is a pre-optimized model with FP8 quantization and built-in attention mechanisms, offering faster video generation and lower hardware requirements without needing LoRA setup, while maintaining similar quality to Lightx2v LoRA with less complexity and consistent style."
This answer is not mine; it is the response I received from Grok 3 when I asked it your question. I'm not an expert, so I don't know whether this answer is accurate.
Grrrr, it makes no sense! "Without needing LoRA setup", yeah... the same description would be better applied to the LoRA: "without needing a 15 GB file and clips"...
And what does "with less complexity and consistent style" mean? Is less complexity good or bad? Is the style less consistent, or are we meant to interpret it as meaning more consistent?
sibdwqwawmndbdolrf674 I discovered this model while searching for ways to generate Wan videos faster. After creating my first video, I learned about other options like the FusionX model, the LightX2v you mentioned, and CausVid. As you pointed out, adding a single LoRA to an existing workflow isn't complicated, so I can understand why you'd question the point of this model's existence.
But consider another perspective. As I mentioned, I initially found this model when looking for faster ways to generate Wan videos. I'm not a native English speaker, and while I use translation tools, acquiring technical knowledge isn't as easy for me as it would be for native speakers. So for people like me who are first encountering this technology, something more accessible and straightforward is needed. I actually found this model helpful when I used it, and that alone is sufficient justification for its existence. You might feel this model is redundant and unnecessary, but that could be because you're already an experienced user.
The goal is to create a single, user-friendly checkpoint with all features integrated. I find it to be more stable and capable of rendering more frames, as using a LoRA is slightly less efficient than a full checkpoint. You’re not required to use the checkpoint though.
Great. By which I mean really really good.
Now do T2V and FLFV! By which I mean pretty please!
Thx!
But I do not plan to do a T2V or an extra FLFV version of this.
darksidewalker the brightest day in my life... followed by the saddest one. Well, let's hope somebody else will. I mean, who needs T2V, but FLFV would be really nice. Thanks anyway!
TheodorSid Added the T2V
darksidewalker And you did it anyway! Outstanding job on this
This was a very good model and workflow, everything ready made for fast video production. It was my first contact with Lightx2v, I took some ideas from this and got to adapt it to Wan 2.2 with the separate loras FastWan + Lightx2v and it worked flawlessly as well. It would be nice to see something similar for Wan 2.2 for convenience.
This.. Is.. INSANE!
I can't believe it can literally animate any image. This is nuts. The gooners will never leave their basements now. I'm buying stock in hand lotion and toilet paper companies right now.
I am getting black-screen videos generated ;/ any ideas why it's like this?
Same. SwarmUI, can't get it to work :-/
If you have a SageAttention node in your workflow, turn it off, and if it's in the startup options of ComfyUI, try removing it.
@DoroArmy Thanks for the tip. That was my fault. After turning off SageAttention the workflow runs.
HELP! Is it only for a 5090? I tried the basic workflow and got an out-of-VRAM issue.
Works fine on my 4070 Ti.
Save the checkpoint into diffusion models, not stable checkpoints; change the Load Checkpoint node to Load Diffusion Model; set dtype to fp8_e4fmn (or whatever that is, the one with 4, not 5) and it should work.
fp8_e4m3fn
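In ComfyUI API format, that fix looks roughly like this. A minimal sketch, assuming the standard UNETLoader node behind "Load Diffusion Model"; the filename is a placeholder:

```python
# Hypothetical fragment: loading the file through "Load Diffusion Model"
# (UNETLoader) from the diffusion_models folder, with the fp8 weight
# dtype set to the "4, not 5" variant (fp8_e4m3fn, not fp8_e5m2).

unet_loader = {
    "class_type": "UNETLoader",
    "inputs": {
        "unet_name": "wan2.1_lightspeed_480p_fp8.safetensors",  # placeholder
        "weight_dtype": "fp8_e4m3fn",
    },
}
```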
I don't know what black magic you've performed to create this, but this is incredible. Great job!
Hoping to see a Wan 2.2 version of this.
On my setup (RTX 3090), generating a 5 second clip at 512x512, same seed and prompt:
- Wan 2.1 (no optimizations) = approx 6 minutes
- Wan 2.1 (TeaCache) = approx 4-5 minutes
- Wan 2.1 (Lightspeed) = approx 2-3 minutes
Worked perfectly with my existing Wan 2.1 LoRAs and workflow (just had to disable TeaCache).
Thank you, but the major work goes to lightx2v team. I just made a single checkpoint quant for convenience 😊
darksidewalker You already helped save us a big amount of effort & time :) I have a machine armed with an RTX 4080 as the main GPU and also an RTX 3090 as a second GPU fed by a separate 1000W PSU. Your custom Lightspeed model was what I needed, but I only downloaded the 480p version. The 720p version looks tempting, but honestly I got tired of the never-ending downloads for everything (checkpoints, LoRAs, etc.), so I wanted to just choose & GO.
hazzoom82659 Thank you! Glad that the checkpoint helped. The 720p version just dropped and my generations look promising so far :)
Will you be converting this for FP8?
https://huggingface.co/lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill-Lightx2v
It would be awesome!
I'm not into T2V. 😔
darksidewalker would you be willing to share your process for doing this single-checkpoint conversion with all speed-up tools included, so I can try to do the same for T2V?
Griphen116 Added the T2V
Awesome, you da man
Edit: see comment, but pretty sure this was just due to making the AI zoom too much on non-optimal init images.
Well, the speed is incredible, but I'm doing something dumb, I guess. Even with no LoRA, settings as recommended as best I can tell, and your CLIP files, I'm getting terrible distortions of faces and really anything that moves in frame. I didn't have the same issue with CausVid, the bare model, etc. Any ideas what I can try to tweak?
Tested another image used in a previous run with "regular" Wan 2.1 I2V 480p + CausVid and it worked fine with this model. Hmm. Same image has rendered with no changes with other models though. Is there anything particular about image resolution, dpi, etc. I should be aware of?
Tried a bunch of different images, almost all have the same thing. Checked every setting, but I'm VERY new to this, so I'm probably just missing something. I'm using SwarmUI - should I be using a custom workflow (like the one you have here)? As far as I know everything's just default other than the settings I've changed as you recommended. Thanks!
whatsthisaithing Did you make sure not to apply any extra "tweak/speedup" LoRAs or speed-up tricks like TeaCache? They will probably mess up your video. I do not use any custom workflow, just the SwarmUI settings I recommend. Maybe your ComfyUI backend is messed up with tweaks? I use a really clean ComfyUI backend and only the mentioned settings.
darksidewalker Well, I figured it out (I think). I was using init images that were zoomed way too far out from the subject(s), thus making it infer greater and greater detail as the camera zoomed in. Combine that with less-than-tack-sharp photos and more complex camera and actor movements, and you get face mush, changing hairstyles, etc. I did confirm almost identical issues when using your other GGUF + CausVid, and even the regular FP8 safetensors file + CausVid had issues, if maybe SLIGHTLY fewer.
Also: I REALLY need to listen to your recommendations on LoRA strength. :D
I was just asking too much of our AI overlords. Thanks for the great work man!
Do you have a version that's a little smaller? I only have 16GB RAM and it's not enough, I tried.
This absolutely works on 12 GB cards.
It even runs on 8 GB cards. I think your setup or backend may have some flaw if it OOMs.
Just tested on 16 GB VRAM, works perfectly: 2:45-2:55 per video at 16 fps, 81 frames. I just used the sample workflow, switched Load Checkpoint to Load Diffusion Model, and it loaded.
Oh, sorry. My RAM is 16GB but my VRAM is only 8GB.
newsatrandom22785 16 GB RAM is a bit low for video gen no matter the checkpoint. 32 GB is recommended; you could get it to work if you lower the resolution or frame count.
newsatrandom22785 16 GB of RAM has been obsolete for at least 8 years now. 32 GB is the minimum recommended RAM for any computer.