Stop! These models are not for txt2img inference!
Don't put them in your stable-diffusion-webui/models directory and expect to make images!
So what are these?
These are new ModelScope-based models for txt2video, optimized to produce 16:9 video compositions. They've been trained on 9,923 video clips and 29,769 tagged frames at 24 fps, 576x320 resolution.
Note that they can look much better - I had to convert the mp4 outputs to gif for Civitai. We can also upscale these videos using the Zeroscope v2 XL txt2vid models, which I'm currently uploading!
Note: this model is the lighter version of the XL model (available here) which requires a lot more VRAM. If you have >15GB of VRAM, you should be using the XL version.
Where do they go?
Drop them in the \stable-diffusion-webui\models\ModelScope\t2v folder
It's imperative that you rename text2video_pytorch_model.pt to the .pth extension after downloading.
The files must be named open_clip_pytorch_model.bin and text2video_pytorch_model.pth.
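If you prefer scripting that rename, here's a minimal Python sketch (the folder path is just an example — point it at your own install):

```python
from pathlib import Path

def rename_checkpoint(model_dir: str) -> Path:
    """Rename the downloaded text2video_pytorch_model.pt to the .pth
    extension the extension expects. model_dir is your
    models/ModelScope/t2v folder (example path; adjust to your setup)."""
    src = Path(model_dir) / "text2video_pytorch_model.pt"
    dst = src.with_suffix(".pth")  # -> text2video_pytorch_model.pth
    src.rename(dst)
    return dst
```

Running it once after download leaves the folder with the correctly named .pth file.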
Who made them? Original Source?
https://huggingface.co/cerspense/zeroscope_v2_576w
What else do I need?
These models are specifically for use with the txt2video Auto1111 WebUI Extension
Comments
Jeebus. If I squint, I can see the end of human civilization from here.
There's only one file here, though.
Expand the Files section on the left - there are two files.
Now we need some smart cookie to marry this concept with LoRA/LyCORIS :-)
With gifski you can convert your videos to gif almost without loss of quality (open source).
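For reference, one way that workflow can be sketched: ffmpeg extracts PNG frames, then gifski encodes them (both tools assumed installed; the filenames and flags here are illustrative). The helper below only builds the shell commands, it doesn't run them:

```python
import shlex

def gifski_commands(mp4_path: str, gif_path: str, fps: int = 24) -> list[str]:
    """Build the two shell commands for an mp4 -> gif conversion via
    ffmpeg + gifski. ffmpeg dumps numbered PNG frames; gifski encodes
    them into a high-quality gif. Flags assume recent tool versions."""
    return [
        # extract frames as frame0001.png, frame0002.png, ...
        f"ffmpeg -i {shlex.quote(mp4_path)} frame%04d.png",
        # encode the extracted frames at the clip's frame rate
        f"gifski --fps {fps} -o {shlex.quote(gif_path)} frame*.png",
    ]
```

Paste the two commands into a shell in the clip's directory; 24 fps matches the model's native output rate.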
Ah, I'll give it a go, thanks!
Thanks a lot for your help - once all the files are installed, it works like a charm. Never thought videos were this easy. Cheers!
So it's a pickle, and I have to disable the safety check with --disable-safe-unpickle in the command line args before SD will read it. Hmm, not sure I want to do that - can you post a safetensors version?
You're not trying to load it like a model for generating images are you? It doesn't work like that, and it doesn't need any command line args.
@theally No, I placed the models in the \stable-diffusion-webui\models\ModelScope\t2v folder as directed, and I'm using sd-webui-text2video in A1111. I just get this in cmd:
The file may be malicious, so the program is not going to read it.
You can skip this check with --disable-safe-unpickle commandline argument.
No problem, I'll figure it out as always - I'm probably missing something in the setup.
I was expecting porn videos, but meh. Still there are none. :D
I don't think LoRAs or embeds are working at all... so potato. But a moving potato nonetheless!!
No, LoRA and TI won't work with txt2video models.
Fully functional txt2video technology and you just shrug and say "potato" cuz it can't do LoRA or TI yet? Man, I must be getting old, cuz back in my day we were super impressed by little moving black-and-white squares on our computer screens (Pong) LOL
Is it possible to use an RTX2060 for this, or should I upgrade to an H100?
It's all about the VRAM - 2060 is 6GB? You might be able to gen a small/short video on 6GB! You're going to want to upgrade before mid-July anyway, as that's when StabilityAI's new SD XL model releases, which requires 8GB minimum! The more the better!
Some prompts for the videos you attached would be appreciated. Ty
Click the little "i" at the bottom right of the image - prompts are included
@theally thanks, but what about cfg scale and steps?
@noneoofurbusiness all defaults, 30 steps, 17 cfg. Good luck! Lots of trial and error involved to get a good gen in this, I've found!
How is it used?
It works for me only with 4 files:
configuration.json
open_clip_pytorch_model.bin
text2video_pytorch_model.pth
VQGAN_autoencoder.pth
Should I just put it together with these? Should I do something else?
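A quick sanity check for that list can be sketched like this (the directory path is an example — point it at your models\ModelScope\t2v folder):

```python
from pathlib import Path

# The four files listed above, all expected in models/ModelScope/t2v
REQUIRED_FILES = {
    "configuration.json",
    "open_clip_pytorch_model.bin",
    "text2video_pytorch_model.pth",
    "VQGAN_autoencoder.pth",
}

def missing_files(model_dir: str) -> set[str]:
    """Return the names of required files not present in model_dir."""
    present = {p.name for p in Path(model_dir).iterdir()}
    return REQUIRED_FILES - present
```

If the returned set is empty, the folder matches the working setup described above.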