V2
V2 is more consistent, has more stable movements, and should produce fewer artifacts. It seems to work very well for 2d inputs as well. All previews were made with a single prompt shared between t2i and i2v; writing separate prompts and picking a good starting image should give even better results.
Use "turbo" lora for high-quality generations in just 4 steps!
The turbo lora is available on huggingface: https://huggingface.co/Kijai/WanVideo_comfy/blob/main/LoRAs/Wan22-Turbo/Wan22_TI2V_5B_Turbo_lora_rank_64_fp16.safetensors
To use it, set steps to 4 and cfg to 1. I'm not sure what the recommended sampler/scheduler is, but I've had great results with multiple samplers and schedulers; I personally use euler or euler a with the beta scheduler.
Using a slightly lower resolution (but not low enough to reduce quality much), I can generate 80 frames in just 2 minutes on a 3060.
This lora is recommended for i2v, but t2v might work decently as well.
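If you're scripting outside ComfyUI/SwarmUI, a minimal sketch of the 4-step turbo setup with Hugging Face diffusers might look like the following. The "Wan-AI/Wan2.2-TI2V-5B-Diffusers" repo id and the use of WanImageToVideoPipeline for the 5B TI2V model are assumptions about diffusers' Wan 2.2 support; only the turbo lora path comes from the link above.

```python
# Hedged sketch: assumes diffusers supports Wan 2.2 TI2V 5B via
# WanImageToVideoPipeline and that the repo id below exists; verify both.
import torch
from diffusers import WanImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

pipe = WanImageToVideoPipeline.from_pretrained(
    "Wan-AI/Wan2.2-TI2V-5B-Diffusers", torch_dtype=torch.bfloat16
).to("cuda")

# The turbo lora linked above (a character/style lora stacks the same way).
pipe.load_lora_weights(
    "Kijai/WanVideo_comfy",
    weight_name="LoRAs/Wan22-Turbo/Wan22_TI2V_5B_Turbo_lora_rank_64_fp16.safetensors",
)

video = pipe(
    image=load_image("start_frame.png"),
    prompt="a 3d animation of ...",
    height=704,
    width=1280,
    num_frames=81,            # ~80 frames, as in the timing above
    num_inference_steps=4,    # turbo: 4 steps
    guidance_scale=1.0,       # turbo: cfg 1
).frames[0]
export_to_video(video, "out.mp4", fps=24)
```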
Trained on my new mixed furry/human dataset with detailed captions; older versions of the dataset were also used for the experimental and semi-stable text-to-video loras.
Prompting
Prompting should use natural language. You need to generate at 720p-class resolutions, so for example 1280x704, 704x1280, or 960x960 are valid. This might matter more for i2v than for t2v; I've noticed artifacts with i2v at other resolutions.
In a prompt you can describe "a 3d animation", "a 2d animation", or "a real video"; this is most useful for t2v but could help i2v as well.
You can also check the prompts on the example videos for reference.
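If you want to pick sizes programmatically, here's a tiny hypothetical helper that snaps an aspect ratio to the nearest ~720p dimensions. The multiple-of-32 constraint is an assumption inferred from the example sizes above (1280, 704, and 960 are all multiples of 32), not an official rule.

```python
# Hypothetical helper: snap an aspect ratio to ~720p dimensions.
# Assumes width/height must be multiples of 32 (inferred from the
# example sizes above, not an official constraint).
def snap_resolution(aspect: float, pixel_budget: int = 1280 * 704, step: int = 32):
    """Return (width, height) near the pixel budget, both multiples of step."""
    height = (pixel_budget / aspect) ** 0.5
    width = round(height * aspect / step) * step
    height = round(height / step) * step
    return width, height

print(snap_resolution(16 / 9))  # (1280, 704)
print(snap_resolution(9 / 16))  # (704, 1280)
print(snap_resolution(1.0))     # (960, 960)
```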
Description
Probably around 10x better than v1.
Comments
How long does your vae decode take? Mine takes about 3 minutes for the vae decode alone while the ksampler takes 8 seconds; not sure how to fix this... at 1280x708.
For 80 frames, 4-step sampling plus decode takes 2 minutes. If you have low vram it'll definitely be slower. I also use a q4 quantized version of wan, just in case. I don't remember which resolution I used exactly; I use a slightly lower res than the official one.
Interesting, I'm using q4 too on my 3080 Ti with the basic image-to-video workflow, and I just checked: the vae decode uses only 9.8 GB out of 12 GB of vram.
@SAVITAR_SPEED_GOD The 2.2 vae is indeed slow by itself for some reason, and it takes the same time with the regular model and the turbo lora. Currently there's no way to speed it up.
@SAVITAR_SPEED_GOD If you use Tiled Decode, don't. It is slower and its seam fix is lossy.
@2P2 Additionally, if you use normal decode and it won't fit in your vram, it switches to tiled automatically, iirc. Tiled decode is only meant to reduce vram usage, and since it enables automatically anyway, there's not really a reason to turn it on manually, as it'll be slower.
@2P2 I get an OOM without the tiled vae decoder though.
@mylo1337 It doesn't do that for some model types for me for some reason; wan will just OOM if I don't use the tiled vae decoder node.
Try the ltxv tiled vae decoder node, way faster than the normal tiled vae decode.
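The same trade-off shows up outside comfy too. Here's a minimal sketch of the "tiled only as a fallback" logic, assuming a diffusers-style vae that exposes enable_tiling() (most diffusers autoencoders do); it's an illustration, not any tool's actual code:

```python
import torch

def decode_latents(vae, latents):
    """Try a full decode first; fall back to tiled decode only on OOM.

    Mirrors the advice above: tiling lowers peak vram but is slower
    and its seam fix is lossy, so it's best kept as a last resort.
    """
    try:
        return vae.decode(latents).sample
    except torch.cuda.OutOfMemoryError:
        torch.cuda.empty_cache()
        vae.enable_tiling()  # standard on diffusers autoencoders
        return vae.decode(latents).sample
```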
Works with humans, good job. Can you make a non-POV version (side view and all)?
Side view should work; I've only tried deepthroat and doggystyle in side view though.
Can someone share a t2v workflow?
Hey bro, I'm still kinda new to this so I'm unable to link it directly, but if you open up comfyui, download one of these videos (the cheetah one, for example), and drag the download onto comfyui, it'll give you the workflow they used.
I am new to video generation and am using SwarmUI. Do you have a template workflow that we could use as a community for this?
If you're using swarmui you can just load the model and type your prompt.
@mylo1337 So I plug "Load Diffusion Model" into a "LoraLoaderModelOnly" node set to furrynsfw v2 e83, plug that model into another "LoraLoaderModelOnly" set to turbo, and then plug turbo into "ModelSamplingSD3"?
@devo80109 If you're using swarmui you shouldn't use comfy directly for basic things; you don't need to wire the nodes manually. Just select the model and enable the lora. You can also use a t2i model and run i2v automatically on the generated images.
Where is the workflow?
I think that was SwarmUI :/ :(
Where's the update?
If you're referring to the recent vids I posted: those were generated using a full wan 2.2 5b fine-tune by basedbase; the model is linked in the description. It has much more, and more consistent, motion at the cost of slightly worse prompt following and occasional face warping.
@mylo1337 basedbase, the troll, has deleted his account.
@2P2 Oh bummer, the 5b finetune is pretty good aside from warping faces sometimes. I don't think I have permission to reupload it though.
I don't think he's a troll, just that he reviewed my model before having a working workflow in the first place. Unless you're referring to something else.
@mylo1337 The reference to Basedbase being a troll is separate. He was releasing "distills" of LLMs and other models and claiming to be an engineer, amongst other things. His distills were checked: they were bit-for-bit and weight-for-weight identical to the originals.
Then, when trying to defend himself or deflect using a sock account on Reddit, he accidentally posted as his main account, outing himself. He has since deleted his Civitai, Reddit, GitHub, and Hugging Face accounts, amongst others.
@open_channels_for_all that explains that then.
Also, with ltx 2 releasing soon, I'll have to prepare a dataset with audio, if I want to train a lora early that is.
I also had an idea for a low-memory training solution that should, in theory, have the memory usage of lora training but the ability to fully fine-tune a model's weights. I still have to write an implementation to confirm that's actually the case.
@mylo1337 Phenomenal. I look forward to it.
I was going to sit down and attempt a 5B lora myself, but this is new territory for me and I'm a touch lost... any advice on where to start?
@Open_channels_for_ALL You could look into diffusion-pipe (https://github.com/tdrussell/diffusion-pipe); it has good support for many models and, imo, the best pre-processing for videos. It can automatically cut a video clip into multiple segments of the max length (with "multiple_overlapping" as the video_clip_mode setting), while other tools like ai-toolkit simply reduce the total frame count, breaking the pacing, so you'd have to manually cut your videos to the right length for those to work.
Might be a bit confusing at first, but there are example configs as well; the sketch below shows what the clipping mode does.
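To illustrate, here's a rough hypothetical re-implementation of the two clipping strategies; this is not diffusion-pipe's actual code, just the idea:

```python
# Hypothetical sketch of the two clipping strategies discussed above;
# not diffusion-pipe's actual implementation.
def truncate(frames: list, max_len: int) -> list[list]:
    """ai-toolkit-style: keep only the first max_len frames."""
    return [frames[:max_len]]

def multiple_overlapping(frames: list, max_len: int) -> list[list]:
    """Cover the whole clip with max_len segments, letting the last
    segment overlap the previous one so no frames are dropped and
    the original pacing is preserved."""
    if len(frames) <= max_len:
        return [frames]
    segments, start = [], 0
    while start + max_len < len(frames):
        segments.append(frames[start:start + max_len])
        start += max_len
    segments.append(frames[-max_len:])  # final, overlapping segment
    return segments

clips = multiple_overlapping(list(range(200)), max_len=81)
print([(c[0], c[-1]) for c in clips])  # [(0, 80), (81, 161), (119, 199)]
```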
Where can I find the turbo lora? The huggingface link gives a 404.
This worked surprisingly well in a customised version of OVI, so video-with-audio i2v generation. How did you train this? The other loras I looked at did not work so well. This is great!
What is OVI?
Hey @mylo1337, it's the first offline open source answer to VEO 3 or Sora 2. OVI has been out for about 10-12 days, and it's already been hacked into ComfyUI (ComfyUI doesn't have lora support for OVI yet, to my knowledge), and a custom webui by SECourses was put out.
It's the SECourses webui I used. I paid for his Patreon for it, which was well worth it. Nice guy, but he doesn't do NSFW so don't ask. He's an engineer and researcher, and he's gotten his version to work on GPUs down to 6 GB of vram.
OVI is a fusion of MMAudio and a customised WAN 5B that does T2V and I2V with the audio baked in at the same time.
@open_channels_for_all From what I gathered, only the audio vae is from mmaudio; mmaudio can't do speech, but OVI can. I'll consider training a lora for audio + video when that becomes available.
@mylo1337 If you make more video loras, even just motion ones, your training method or dataset seems relatively compatible with current OVI. What is your process? I've been considering training for 5B; it's an under-utilised model.
Should work on both, as it keeps character traits consistent.
I'm using ComfyUI and I can't get the workflow from any of these. Would someone be able to share the workflow that's compatible with Comfy? Even just a screenshot of your working workflow is fine. These all look amazing and I'd really like to give it a try.
The gens were made using swarmui, which embeds swarm metadata instead of comfy metadata; additionally, since they're videos, swarm doesn't embed the metadata at all. An easy way to try it is to use swarmui (you can link it with your comfyui), or to use the default comfyui workflow for wan 2.2 5B i2v with a LoraLoaderModelOnly node after the diffusion model loader.
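For reference, here's a rough sketch of that modification in ComfyUI's API prompt format, written as a Python dict. Node ids and file names are placeholders, and the second LoraLoaderModelOnly shows how the turbo lora would chain after the first:

```python
# Hypothetical fragment of a ComfyUI API-format prompt: the diffusion
# model loader feeds chained LoraLoaderModelOnly nodes. Ids and file
# names are placeholders.
workflow_fragment = {
    "1": {
        "class_type": "UNETLoader",
        "inputs": {"unet_name": "wan2.2_ti2v_5B_fp16.safetensors",
                   "weight_dtype": "default"},
    },
    "2": {  # character/style lora
        "class_type": "LoraLoaderModelOnly",
        "inputs": {"model": ["1", 0],
                   "lora_name": "furry_nsfw_wan22_5b_v2_e83.safetensors",
                   "strength_model": 1.0},
    },
    "3": {  # turbo lora, chained after the first
        "class_type": "LoraLoaderModelOnly",
        "inputs": {"model": ["2", 0],
                   "lora_name": "Wan22_TI2V_5B_Turbo_lora_rank_64_fp16.safetensors",
                   "strength_model": 1.0},
    },
    # node "3" then feeds the rest of the default workflow
    # (ModelSamplingSD3 -> KSampler with steps=4, cfg=1).
}
```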
I cannot get the model to load; it seems that SwarmUI doesn't recognize the model architecture. Am I missing a step in setting it up, or missing a precursor model?
This is a lora; it should be in your lora folder, with the wan 2.2 5b model in your diffusion_models folder.
@mylo1337 Yeah, I have it set up like that, but it's saying "LoRA Model Furry_+_nsfw_wan_2-2_5b_-_v2-0_e83.safetensors did not match any of 149 options. Class sorter may need refinement, or you may have a model that is not natively supported in SwarmUI."
@abyssal358 Just set it as a wan 2.2 ti2v 5b lora manually. The classification isn't super important for loras, as most types are treated the same in swarmui.
I used an image with a duck character, but their beak was botched in the video. What happened?