CivArchive
    Stable Cascade - base


    Stable Cascade

    This model is built upon the Würstchen architecture. Its main difference from other models such as Stable Diffusion is that it works in a much smaller latent space. Why is this important? The smaller the latent space, the faster you can run inference and the cheaper training becomes. How small is the latent space? Stable Diffusion uses a compression factor of 8, so a 1024x1024 image is encoded to 128x128. Stable Cascade achieves a compression factor of about 42, meaning that a 1024x1024 image can be encoded to 24x24 while maintaining crisp reconstructions. The text-conditional model is then trained in this highly compressed latent space. Previous versions of this architecture achieved a 16x cost reduction over Stable Diffusion 1.5.

    Therefore, this kind of model is well suited to uses where efficiency is important. Furthermore, all known extensions such as finetuning, LoRA, ControlNet, IP-Adapter and LCM are possible with this method as well.
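    The compression figures quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are invented for this example and are not part of any Stable Cascade code, and it counts spatial positions only, ignoring the number of channels per latent position.

```python
def compression_factor(image_side: int, latent_side: int) -> float:
    """Spatial compression factor along one side of a square image."""
    return image_side / latent_side


def latent_positions(latent_side: int) -> int:
    """Number of spatial positions in a square latent grid."""
    return latent_side * latent_side


image_side = 1024

# Stable Diffusion: compression factor 8, as stated in the text.
sd_latent_side = image_side // 8
sd_factor = compression_factor(image_side, sd_latent_side)

# Stable Cascade: 1024x1024 is encoded to 24x24, as stated in the text.
cascade_latent_side = 24
cascade_factor = compression_factor(image_side, cascade_latent_side)

# How many times fewer spatial positions the 24x24 grid has than 128x128.
position_ratio = latent_positions(sd_latent_side) / latent_positions(cascade_latent_side)

print(f"Stable Diffusion latent: {sd_latent_side}x{sd_latent_side}, factor {sd_factor:.2f}")
print(f"Stable Cascade latent: {cascade_latent_side}x{cascade_latent_side}, factor {cascade_factor:.2f}")
print(f"Spatial positions ratio: {position_ratio:.1f}x")
```

    Note that 1024 / 24 is closer to 42.7 than to 42, so the quoted factor is a rounded figure.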

    Model Details

    Model Description

    Stable Cascade is a diffusion model trained to generate images given a text prompt.

    • Developed by: Stability AI

    • Funded by: Stability AI

    • Model type: Generative text-to-image model

    Model Sources

    For research purposes, we recommend our StableCascade Github repository (https://github.com/Stability-AI/StableCascade).

    Model Overview

    Stable Cascade consists of three models: Stage A, Stage B and Stage C, representing a cascade to generate images, hence the name "Stable Cascade".

    Stages A and B are used to compress images, similar to the job of the VAE in Stable Diffusion. However, with this setup a much higher compression of images can be achieved. While the Stable Diffusion models use a spatial compression factor of 8, encoding an image with a resolution of 1024 x 1024 to 128 x 128, Stable Cascade achieves a compression factor of about 42. This encodes a 1024 x 1024 image to 24 x 24 while still being able to decode the image accurately. This brings the great benefit of cheaper training and inference. Stage C, in turn, is responsible for generating the small 24 x 24 latents from a text prompt. The following picture shows this visually.

    For this release, we are providing two checkpoints for Stage C, two for Stage B and one for Stage A. Stage C comes in a 1 billion and a 3.6 billion parameter version, but we highly recommend using the 3.6 billion version, as most work was put into its finetuning. The two versions of Stage B have 700 million and 1.5 billion parameters. Both achieve great results, but the 1.5 billion version excels at reconstructing small and fine details. Therefore, you will achieve the best results if you use the larger variant of each. Lastly, Stage A contains 20 million parameters and is fixed due to its small size.
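    To get a rough feel for what the checkpoint choices mean in practice, the parameter counts above can be summed per combination. This is a back-of-the-envelope sketch: the counts are the rounded figures from the text, the 2 bytes per parameter assumes 16-bit weights, and the estimate covers weights only (no text encoder, activations or framework overhead). All names in the snippet are made up for illustration.

```python
# Rounded parameter counts as quoted in the text above.
STAGE_PARAMS = {
    "A": {"base": 20_000_000},
    "B": {"small": 700_000_000, "large": 1_500_000_000},
    "C": {"small": 1_000_000_000, "large": 3_600_000_000},
}

BYTES_PER_PARAM_16BIT = 2  # assumption: float16 or bfloat16 weights


def total_params(stage_b: str, stage_c: str) -> int:
    """Sum the parameters of Stage A plus the chosen Stage B and Stage C variants."""
    return (
        STAGE_PARAMS["A"]["base"]
        + STAGE_PARAMS["B"][stage_b]
        + STAGE_PARAMS["C"][stage_c]
    )


def weight_gigabytes(n_params: int, bytes_per_param: int = BYTES_PER_PARAM_16BIT) -> float:
    """Approximate size of the weights alone, in gigabytes (10^9 bytes)."""
    return n_params * bytes_per_param / 1e9


largest = total_params("large", "large")
smallest = total_params("small", "small")

print(f"Largest combination:  {largest:,} params, ~{weight_gigabytes(largest):.2f} GB of weights")
print(f"Smallest combination: {smallest:,} params, ~{weight_gigabytes(smallest):.2f} GB of weights")
```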

    Evaluation

    According to our evaluation, Stable Cascade performs best in both prompt alignment and aesthetic quality in almost all comparisons. The above picture shows the results from a human evaluation using a mix of parti-prompts and aesthetic prompts. Specifically, Stable Cascade (30 inference steps) was compared against Playground v2 (50 inference steps), SDXL (50 inference steps), SDXL Turbo (1 inference step) and Würstchen v2 (30 inference steps).

    Code Example

    ⚠️ Important: For the code below to work, you have to install diffusers from this branch while the PR is WIP.

    pip install git+https://github.com/kashif/diffusers.git@wuerstchen-v3

    import torch
    from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline

    device = "cuda"
    num_images_per_prompt = 2

    # Prior pipeline (Stage C): turns the text prompt into compressed image embeddings.
    prior = StableCascadePriorPipeline.from_pretrained("stabilityai/stable-cascade-prior", torch_dtype=torch.bfloat16).to(device)
    # Decoder pipeline (Stages B and A): turns the embeddings into full-resolution images.
    decoder = StableCascadeDecoderPipeline.from_pretrained("stabilityai/stable-cascade", torch_dtype=torch.float16).to(device)

    prompt = "Anthropomorphic cat dressed as a pilot"
    negative_prompt = ""

    prior_output = prior(
        prompt=prompt,
        height=1024,
        width=1024,
        negative_prompt=negative_prompt,
        guidance_scale=4.0,
        num_images_per_prompt=num_images_per_prompt,
        num_inference_steps=20,
    )

    decoder_output = decoder(
        # Cast the bfloat16 embeddings to float16 to match the decoder's dtype.
        image_embeddings=prior_output.image_embeddings.half(),
        prompt=prompt,
        negative_prompt=negative_prompt,
        guidance_scale=0.0,
        output_type="pil",
        num_inference_steps=10,
    ).images

    # Now decoder_output is a list with your PIL images
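    Since the example ends with a list of PIL images, a natural next step is writing them to disk. The helper below is a minimal sketch, not part of diffusers: `save_images`, the `outputs` directory and the `cascade` file prefix are hypothetical names chosen for this example. It relies only on each image having a `save(path)` method, which PIL images provide.

```python
from pathlib import Path


def save_images(images, out_dir="outputs", prefix="cascade"):
    """Save each image as a numbered PNG file and return the written paths."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)

    saved = []
    for index, image in enumerate(images):
        target = out_path / f"{prefix}_{index:02d}.png"
        image.save(target)  # PIL infers the PNG format from the file extension
        saved.append(target)
    return saved
```

    With the code example above, calling `save_images(decoder_output)` would write `outputs/cascade_00.png` and `outputs/cascade_01.png`, one file per generated image.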

    Uses

    Direct Use

    The model is intended for research purposes for now. Possible research areas and tasks include

    • Research on generative models.

    • Safe deployment of models which have the potential to generate harmful content.

    • Probing and understanding the limitations and biases of generative models.

    • Generation of artworks and use in design and other artistic processes.

    • Applications in educational or creative tools.

    Excluded uses are described below.

    Out-of-Scope Use

    The model was not trained to produce factual or true representations of people or events, and therefore using the model to generate such content is out-of-scope for the abilities of this model.

    The model should not be used in any way that violates Stability AI's Acceptable Use Policy.

    Limitations and Bias

    Limitations

    • Faces and people in general may not be generated properly.

    • The autoencoding part of the model is lossy.

    Recommendations

    The model is intended for research purposes only.

    How to Get Started with the Model

    Check out https://github.com/Stability-AI/StableCascade


    Comments (8)

    CuauhtemocI5MAL (Feb 18, 2024)

    I'm tempted to think it will take some months or weeks before we are able to use this, and other non-Stable Diffusion based models, in the image generator of this site XD.

    axicec (Feb 18, 2024)

    Turnaround for new things in AI is exponential, if they are adopted enough. But sadly that's down to people complaining less; people get weird when something comes out that looks better than their well-matured, finetuned 1.5 gens.

    zareltgr (Feb 21, 2024)

    @axicec There are things I would only think of doing in 1.5 models, other things I would go right to XL, and yet other things that look great in either but in different ways. I'm sure this will be just the same... we just gotta find out what roads it wants to travel... as much as we instruct it, so too does it instruct us. The adventure begins!

    steamrick (Feb 18, 2024)

    Is it just me or does Stable Cascade completely fail at relational prompts? Try this:

    "a cat laying on a table and a dog lying under the table"

    SDXL will get it right about 1/4 of the time. DallE-3 has near 100% success rate.

    With Stable Cascade the most probable result (in my testing) is two cats on a wooden surface. I've not once had a dog under a table.

    Daedalus_7 (Feb 19, 2024)

    It's still in "Early Access", so it's not even fully trained. We'll have to wait until it is fully released to see how good it is. So far, it seems to be a cross-over between Turbo and SDXL (it can do 1536x1536 quite well).

    2157333747323 (Feb 21, 2024)

    I agree with you. When I input "cat", it can be recognized, but when I try to input "dog", its position will be wrong. It also takes a long time. Nonetheless, it is still a nice update, because it uses less VRAM.

    aiaicaptain (May 2, 2024)

    model is not very smart contrary to what we have been reading, but the renders are way more artistic than any other model. use img2img. generate using your favorite model, render with cascade for stunning results.

    katashi (Feb 18, 2024)

    reddest model ever

    Checkpoint
    Stable Cascade

    Details

    Downloads: 5,097
    Platform: CivitAI
    Platform Status: Available
    Created: 2/14/2024
    Updated: 5/12/2026
    Deleted: -

    Files

    stableCascade_base.zip

    Mirrors

    CivitAI (1 mirrors)