Stop! These models are not for txt2img inference!
Don't put them in your stable-diffusion-webui/models directory and expect to make images!
So what are these?
These are new ModelScope-based models for txt2video, optimized to produce 16:9 video compositions. They've been trained on 9,923 video clips and 29,769 tagged frames at 24 fps, 1024x576 resolution.
Note that these are the bigger brothers to the https://civarchive.com/models/96454/zeroscope-v2-576w-txt2video models. The XL models use 15.3GB of VRAM when rendering 30 frames at 1024x576.
Where do they go?
Drop them in the \stable-diffusion-webui\models\ModelScope\t2v folder.
It's imperative that you rename text2video_pytorch_model.pt to the .pth extension after downloading.
The files must be named open_clip_pytorch_model.bin and text2video_pytorch_model.pth.
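If you'd rather script the rename than do it by hand, here's a minimal sketch (assuming the default WebUI folder layout from the instructions above; the filenames are the required ones listed there):

```python
from pathlib import Path

# Default WebUI layout as described above; adjust if your install differs.
t2v_dir = Path("stable-diffusion-webui/models/ModelScope/t2v")
t2v_dir.mkdir(parents=True, exist_ok=True)

# The extension expects a .pth extension, so rename the downloaded .pt file.
pt_file = t2v_dir / "text2video_pytorch_model.pt"
if pt_file.exists():
    pt_file.rename(pt_file.with_suffix(".pth"))

# Sanity-check that both files are named exactly as required.
for name in ("open_clip_pytorch_model.bin", "text2video_pytorch_model.pth"):
    status = "ok" if (t2v_dir / name).exists() else "MISSING"
    print(f"{name}: {status}")
```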
Who made them? Original Source?
https://huggingface.co/cerspense/zeroscope_v2_XL
What else do I need?
These models are specifically for use with the txt2video Auto1111 WebUI Extension
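That said, if you just want to sanity-check the downloaded weights outside the WebUI, cerspense also publishes the same model on Hugging Face in diffusers format. A minimal sketch, assuming a recent diffusers install (the prompt and step count are just illustrative, and exact output handling varies a bit between diffusers versions):

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

# Load the ZeroScope v2 XL weights straight from the Hugging Face repo.
pipe = DiffusionPipeline.from_pretrained(
    "cerspense/zeroscope_v2_XL", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()  # helps stay inside the ~15.3GB VRAM budget

# 1024x576 is the 16:9 resolution these weights were trained at.
result = pipe(
    "a red panda eating bamboo, cinematic lighting",
    num_inference_steps=30,
    height=576,
    width=1024,
    num_frames=24,
)
print(export_to_video(result.frames))  # path to the rendered video file
```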
Comments
15.3 GB VRAM :cry:
You know you tagged this one "text2video" but the previous one "txt2video"? They show up as separate when searching by either tag.
This brings up the issue that there's no associative tagging on the site, so a lot of related content isn't discoverable via similar tags. For instance, when someone writes the tag "Full Metal Alchemist" but someone else only uses "FMA," they are separate in tag searches.
... I should really post this on the Ideas board, I know, but I also don't know how to elaborate on it, if it hasn't been raised before. Summary of all this: synonym tags should have a way of being linked so everything is properly grouped together, or something to the same effect.
Hi, thanks a lot for sharing!! I was wondering if this needs any other installation besides the two files here. I'm trying to use it with the 1111 extension, and I realize there's also a config file and a VQGAN encoder for ModelScope; are those the same? Also, do you recommend using the small version first? I'm also noticing that this needs xformers running.
You can use ModelScope's VQGAN/config - I'm not even sure they're required, but perhaps they are. If you have at least 15GB of VRAM there's no need to use the small model - go straight for the big leagues! And I don't use xformers; Torch 2.0, sdp attention gang over here.
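(For anyone wondering what "sdp attention" means there: PyTorch 2.0 ships a fused scaled-dot-product attention kernel, which the WebUI can use via --opt-sdp-attention instead of xformers. A minimal standalone check, nothing extension-specific:)

```python
import torch
import torch.nn.functional as F

# Torch 2.0+ includes fused scaled-dot-product attention; this is what
# the WebUI's --opt-sdp-attention flag uses instead of xformers.
print(torch.__version__)

q = k = v = torch.randn(1, 8, 16, 64)  # (batch, heads, seq_len, head_dim)
out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([1, 8, 16, 64])
```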
Cool, I'm gonna try that, thanks!!
it works like a charm!!!!!!!
Tip of the day! For every model you need to edit the config. The more VRAM the better, and stay within the frame size bounds given for each model.
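A quick way to see which fields your config actually exposes before editing anything (the folder is the one from the description above; the filename configuration.json follows the ModelScope convention, so treat it as an assumption for your build):

```python
import json
from pathlib import Path

# Assumed location/name: ModelScope repos ship a configuration.json
# next to the weights; adjust for your install.
cfg_path = Path("stable-diffusion-webui/models/ModelScope/t2v/configuration.json")

with cfg_path.open(encoding="utf-8") as f:
    cfg = json.load(f)

# Dump the whole thing so you can spot resolution/frame-count fields
# before changing anything.
print(json.dumps(cfg, indent=2))
```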
Curious if this would work in widescreen 16:9, or portrait only?
Tested this out and I'm only getting watermarked Shutterstock videos. Under models, only ModelScope is shown. I had to create the two folders ModelScope/t2v in the models dir; the extension shows models/text2video, and the extension is in the extensions folder. Looking at all the JSON and .py files, nothing is calling for the ModelScope directory, so idk.