Stop! These models are not for txt2img inference!
Don't put them in your stable-diffusion-webui/models directory and expect to make images!
So what are these?
These are new ModelScope-based models for txt2video, optimized to produce 16:9 video compositions. They've been trained on 9,923 video clips and 29,769 tagged frames at 24 fps, 1024x576 resolution.
Note that these are the bigger brothers to the https://civarchive.com/models/96454/zeroscope-v2-576w-txt2video models. The XL models use 15.3GB of VRAM when rendering 30 frames at 1024x576.
Where do they go?
Drop them in the \stable-diffusion-webui\models\ModelScope\t2v folder.
It's imperative that you rename text2video_pytorch_model.pt to the .pth extension after downloading.
The files must be named open_clip_pytorch_model.bin and text2video_pytorch_model.pth.
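If you'd rather script the rename than do it by hand, here's a minimal sketch (assuming the default WebUI folder layout from the instructions above; the filenames are the required ones listed there):

```python
from pathlib import Path

# Default WebUI layout as described above; adjust if your install differs.
t2v_dir = Path("stable-diffusion-webui/models/ModelScope/t2v")
t2v_dir.mkdir(parents=True, exist_ok=True)

# The extension expects a .pth extension, so rename the downloaded .pt file.
pt_file = t2v_dir / "text2video_pytorch_model.pt"
if pt_file.exists():
    pt_file.rename(pt_file.with_suffix(".pth"))

# Sanity-check that both files are named exactly as required.
for name in ("open_clip_pytorch_model.bin", "text2video_pytorch_model.pth"):
    status = "ok" if (t2v_dir / name).exists() else "MISSING"
    print(f"{name}: {status}")
```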
Who made them? Original Source?
https://huggingface.co/cerspense/zeroscope_v2_XL
What else do I need?
These models are specifically for use with the txt2video Auto1111 WebUI Extension
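That said, if you just want to sanity-check the downloaded weights outside the WebUI, cerspense also publishes the same model on Hugging Face in diffusers format. A minimal sketch, assuming a recent diffusers install (the prompt and step count are just illustrative, and exact output handling varies a bit between diffusers versions):

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

# Load the ZeroScope v2 XL weights straight from the Hugging Face repo.
pipe = DiffusionPipeline.from_pretrained(
    "cerspense/zeroscope_v2_XL", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()  # helps stay inside the ~15.3GB VRAM budget

# 1024x576 is the 16:9 resolution these weights were trained at.
result = pipe(
    "a red panda eating bamboo, cinematic lighting",
    num_inference_steps=30,
    height=576,
    width=1024,
    num_frames=24,
)
print(export_to_video(result.frames))  # path to the rendered video file
```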
Comments
15.3 GB VRAM :cry:
You know you tagged this one "text2video" but the previous one "txt2video"? They show up as separate when searching by either tag.
This brings up the issue that there's no associative tagging on the site, so a lot of related content isn't discoverable via similar tags. For instance, when someone writes the tag "Full Metal Alchemist" but someone else only uses "FMA," they are separate in tag searches.
... I should really post this on the Ideas board, I know, but I also don't know how to elaborate on it, if it hasn't been raised before. Summary of all this: synonym tags should have a way of being linked so everything is properly grouped together, or something to the same effect.
Hi, thanks a lot for sharing!! I was wondering if this needs any other installation besides the two files here. I'm trying to use it with the 1111 extension, and I realize there's also a config file and a VQGAN encoder for ModelScope; are those the same? Also, do you recommend using the small version first? I'm also noticing that this needs xformers running.
You can use ModelScope's VQGAN/config - I'm not even sure they're required, but perhaps they are. If you have at least 15GB of VRAM there's no need to use the small model - go straight for the big leagues! And I don't use xformers; Torch 2.0, sdp attention gang over here.
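(For anyone wondering what "sdp attention" means there: PyTorch 2.0 ships a fused scaled-dot-product attention kernel, which the WebUI can use via --opt-sdp-attention instead of xformers. A minimal standalone check, nothing extension-specific:)

```python
import torch
import torch.nn.functional as F

# Torch 2.0+ includes fused scaled-dot-product attention; this is what
# the WebUI's --opt-sdp-attention flag uses instead of xformers.
print(torch.__version__)

q = k = v = torch.randn(1, 8, 16, 64)  # (batch, heads, seq_len, head_dim)
out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([1, 8, 16, 64])
```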
Cool, I'm gonna try that, thanks!!
it works like a charm!!!!!!!
Tip of the day! For every model you need to edit the config. The more VRAM the better, and stay within the frame size bounds given for each model.
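A quick way to see which fields your config actually exposes before editing anything (the folder is the one from the description above; the filename configuration.json follows the ModelScope convention, so treat it as an assumption for your build):

```python
import json
from pathlib import Path

# Assumed location/name: ModelScope repos ship a configuration.json
# next to the weights; adjust for your install.
cfg_path = Path("stable-diffusion-webui/models/ModelScope/t2v/configuration.json")

with cfg_path.open(encoding="utf-8") as f:
    cfg = json.load(f)

# Dump the whole thing so you can spot resolution/frame-count fields
# before changing anything.
print(json.dumps(cfg, indent=2))
```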
Curious if this would work in widescreen 16:9, or portrait only?
Tested this out and I'm only getting watermarked Shutterstock videos. Under models, only ModelScope is shown. I had to create the two folders ModelScope/t2v in the models dir; the extension shows models/text2video, and the extension is in the extensions folder. Looking at all the JSON and .py files, nothing is calling for the ModelScope directory, so idk.