Generates WAN 2.1 videos in a fraction of the time.
720p and 480p versions
Recommended settings:
Sampler/Scheduler: Euler/Simple
Steps: 4
CFG: 1
Sigma-Shift: 5
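For reference, here is a minimal sketch of those settings expressed through the diffusers WanPipeline. The repo id is the stock Wan 2.1 checkpoint as a placeholder, not this fp8 merge, and the Euler/Simple mapping is an assumption; in SwarmUI/ComfyUI you simply set these values in the UI.

```python
# A minimal sketch, assuming the diffusers WanPipeline and the stock
# Wan 2.1 repo id as placeholders (not this fp8 merge; swap in your
# local checkpoint).
import torch
from diffusers import FlowMatchEulerDiscreteScheduler, WanPipeline
from diffusers.utils import export_to_video

pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-14B-Diffusers", torch_dtype=torch.bfloat16
)
# Euler/Simple with Sigma-Shift 5 maps roughly to a flow-match Euler
# scheduler with shift=5.0 here.
pipe.scheduler = FlowMatchEulerDiscreteScheduler.from_config(
    pipe.scheduler.config, shift=5.0
)
pipe.to("cuda")

frames = pipe(
    prompt="a red fox running through fresh snow",
    num_frames=81,
    num_inference_steps=4,  # distilled model: 4 steps
    guidance_scale=1.0,     # CFG 1 effectively disables guidance
).frames[0]
export_to_video(frames, "out.mp4", fps=16)
```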
Original model from Lightx2v, converted to FP8 quantisation.
☠️ Do not add any extra speed-up tricks or speed-up LoRAs, or they may mess up your generations ... 🤬
⚠️ Hint: Most of the time the model takes you at your word. If you write "white", it is white; "translucent" is translucent... like for the fluids. 💦 Now you know! 🫵 translucent whitish 🤫
⬇️⬇️⬇️⬇️⬇️⬇️⬇️⬇️⬇️⬇️⬇️
Recommended specs:
8 GB VRAM, 32 GB RAM
Sample times: under 2 minutes for 81 frames at 4 steps on an RTX 4070 Ti Super.
Compatible with 14B LoRAs.
I normally use 0-2 LoRAs at strength 0.4-1.0, depending on how strong the effect should be. 0.7-0.9 works best most of the time without overwriting the style of an image.
With multiple LoRAs it seems best to tune the strengths down a bit, to 0.3-0.6; see the sketch below.
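To illustrate those strength values, a sketch continuing the hypothetical diffusers setup from above (file and adapter names are placeholders):

```python
# Illustrative only: two LoRAs tuned down per the 0.3-0.6 guideline for
# multiple LoRAs. File and adapter names are placeholders; "pipe" is the
# hypothetical diffusers pipeline from the sketch above.
pipe.load_lora_weights("style_lora.safetensors", adapter_name="style")
pipe.load_lora_weights("motion_lora.safetensors", adapter_name="motion")
pipe.set_adapters(["style", "motion"], adapter_weights=[0.5, 0.4])
```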
Basic workflow example:
Here: https://civarchive.com/models/1811161?modelVersionId=2049602
My favourite UI:
SwarmUI https://github.com/mcmonkeyprojects/SwarmUI
Testing (my specs):
I can go wild on settings with this full checkpoint, even with added LoRAs:
121 frames possible: ~3 minutes
121 frames at 24 fps possible (more motion): ~3 minutes
128 frames at 24 fps possible (more motion and extended): ~3.5 minutes
YOU are responsible for your outputs, as always! If you make ToS-violating content and I become aware of it, I WILL report it.
Disclaimer
This model is shared without warranties and on the condition that it is used in a lawful and responsible way. I do not support or take responsibility for illegal, harmful, or harassing uses. By downloading or using it, you accept that you are solely responsible for how it is used.
Description
Just the proper CLIP/T5 Encoder in FP8 for Wan 2.1 Lightspeed
FAQ
Difference between this and just using Lightx2v?
If you mean the LoRAs, the difference is that you do not need that extra LoRA here, making the usage slightly more efficient and possibly more stable with other LoRAs. Please don't hold me to this, though. As mentioned, this is the Lightx2v version, just fp8. Their full single checkpoint is fp16 and requires significantly more RAM and VRAM. And it is for those who simply prefer a checkpoint.
Edit: Maybe sleep on it...
Comments (22)
Don't sleep on this one if you've been using FusionX, or TeaCache and all that. It's playing those games but on another level.
I'm getting a lot better motion out of this. I'm having to rework several default prompts and LoRA weight levels: decreasing LoRA weights, removing some. That also means reducing their unintended side effects.
With more motion you can use a lower fps and get longer videos. Use something like RIFE VFI for frame interpolation to make the video smoother.
You could also use a lower length value (still using interpolation) to significantly lower video generation time; see the sketch below.
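Some back-of-the-envelope math for that trade-off (all numbers illustrative):

```python
# Back-of-the-envelope math for the trick above (numbers illustrative):
# generate fewer frames at a low fps, then let RIFE VFI fill the gaps.
gen_frames = 81    # frames the model actually diffuses (what you pay for)
gen_fps = 16       # Wan 2.1's native rate
rife_mult = 2      # RIFE VFI doubles the frame count

out_frames = gen_frames * rife_mult   # 162 frames after interpolation
out_fps = gen_fps * rife_mult         # play at 32 fps: same clip, smoother
duration = gen_frames / gen_fps       # ~5.06 s either way
print(out_frames, out_fps, round(duration, 2))
```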
@darksidewalker is working on uploading safetensors for text encoders. I'm currently testing with umt5_xxl_fp16 and umt5_xxl_fp8_e4m3fn_scaled, using clip_vision_h. Maybe all that is the best option? I have no idea; I'm a tinkerer, not an expert.
RTX 4070 12 GB
shift - 5
length - 81
steps - 4 (going higher to 6 or 8 doesn't hurt times too much, but might hurt or help the desired results)
cfg - 1
sampler - lcm
scheduler - simple or sgmuniform (78 seconds)
RIFE VFI - 3x multiplier (88 seconds)
fps on output - 40
Video length = 5.5 seconds
Generation time = 182 seconds, or 3 minutes.
I typically don't check metrics; I just shoot for between 200-300 seconds with the best results, switching other values to make that happen. I do know that I can now have longer videos or significantly shorter execution times using this checkpoint.
I'm going to work with this for another day or two to nail down my workflow, and I'll upload a template for you guys. I don't see myself going back to the FusionX or TeaCache/Sage... setups. Smarter people than I will probably poo on it and tell me better things to use, hopefully.
Sadly, I can't seem to get a high enough rate of usable output to make it worthwhile. It doesn't seem to understand physics very well, but it can allow for more abstract things, I guess. It could be that it's a fine model, just not LoRA-friendly. If you don't need LoRAs for what you're trying to create, you might be fine. Not sure. It doesn't handle sexual prompts without one.
rando2048 I'm sure you're using the LoRAs wrong. You have to use 480p LoRAs, at a strength of 0.4-0.9. In my testing it understands almost all LoRAs, even some 720p ones. Check my submissions; they are made with LoRAs. Do you have an example image with a motion you want? I could test that... maybe your setup is not configured correctly?
rando2048 The prompting for some LoRAs can be tricky. For example, I have one where, if the view is from the front, you have to say "frontal view". And some LoRAs don't play well together. Another trick: if your video goes to crap after a few seconds, only make 3-second videos and use the last frame to continue (see the sketch below).
I haven't had the need to use it with this model, but... using {} around wording helps to emphasize it, like {fast motion}. Back on LoRAs: I keep a text document in my LoRA folder with the name of each LoRA and its trigger words. It lets me quick-reference them.
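A minimal sketch of that last-frame continuation trick, using OpenCV (file names are hypothetical):

```python
# A sketch of the "continue from the last frame" trick mentioned above;
# file names are hypothetical. Requires opencv-python.
import cv2

cap = cv2.VideoCapture("clip_part1.mp4")
last = int(cap.get(cv2.CAP_PROP_FRAME_COUNT)) - 1
cap.set(cv2.CAP_PROP_POS_FRAMES, last)   # seek to the final frame
ok, frame = cap.read()
cap.release()
if ok:
    cv2.imwrite("seed_for_part2.png", frame)  # feed this to the next i2v run
```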
delta45424155 Also, if you use more than one LoRA you should tune some down so that they don't mess each other up. The clearer and more chronologically you describe the motions, the better the results. Though I also need 1-10 tries to get what I want; video diffusion is more prone to random actions than image diffusion. Plus, make sure not to apply more "speed-up" tricks and LoRAs to this checkpoint; the speed-up is already baked in.
darksidewalker I really only add sage attention to my workflow, and VFI interpolation to go from 16 fps to 32 fps. I did add blockswap to help with my limited 16 GB of VRAM on the 5080. I've found 720x720 resolution works nicely as well.
delta45424155 At 32 fps there has to be a huge amount of motion! It could handle 720p, but the results are not stable IMHO. SwarmUI either doesn't do blockswap or has it built in; I could always generate everything with my 16 GB VRAM.
darksidewalker The way my workflow works: it renders at 16 fps and then smooths the motion by doubling the fps, so the motion doesn't change.
delta45424155 Ah, OK, frame interpolation.
delta45424155 Can you share your workflow? I'm curious which node we can add to increase speed.
sekaiwlc07860 I just added sage attention. It's a single step. Most recommend picking the Triton option, so that's what I use.
delta45424155 Thanks, I do the same, and I added a FILM VFI node, but that node takes more time, 88 seconds; I think it's too long. Maybe I set a wrong parameter.
How do I use this with Automatic 1111 webui?
I've never used a video model before >_<
A1111 is not a good option when it comes to the latest models or video gen. Better to use SwarmUI or ComfyUI.
darksidewalker Ok thank you! :)
I had some trouble with this model, but now that I've fixed it, everything works fine. Very impressive. What used to take over 30 minutes with the original model takes less than 5 minutes with this one. It's like magic. Please upload a 720p model too. 👍
A 720p version is planned as soon as one is available. 😊
Is this a Lora or checkpoint?
As the details say: checkpoint
A LoRA is usually not as big as this one.