CivArchive

    Wan-S2V is an AI video generation model that can transform static images and audio into high-quality videos.

    WIP: the description is still being completed with all needed info and tools. Use with some caution 🤪

    Note: S2V is very likely to produce a few "flashy", over-saturated frames at the start of a clip. That seems to be a limitation of all Wan 2.2 S2V models right now.

    Requirements:

    • Lite LoRA for 4/8-step operation (optional)

    • Main model: Wan2.2-S2V-14B (GGUF) in ComfyUI/models/unet

    • Audio encoder: wav2vec2_large_english in ComfyUI/models/audio_encoders

    • Text encoder: umt5-xxl in ComfyUI/models/text_encoders

    • VAE: Wan2.1_VAE.safetensors in ComfyUI/models/vae
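
    The folder layout in the requirements can be checked with a small script before loading a workflow. This is only a sketch: the labels, the function name `find_missing`, and the `ComfyUI` install path are assumptions made for illustration, and it only checks that each folder contains at least one file, not that the file is the right one.

```python
from pathlib import Path

# Folder for each required component, as listed in the requirements.
# The labels are descriptive names chosen for this sketch.
EXPECTED_DIRS = {
    "main model (GGUF)": "models/unet",
    "audio encoder": "models/audio_encoders",
    "text encoder": "models/text_encoders",
    "VAE": "models/vae",
}


def find_missing(comfy_root):
    """Return the labels whose folder is absent or contains no files."""
    root = Path(comfy_root)
    missing = []
    for label, rel in EXPECTED_DIRS.items():
        folder = root / rel
        has_file = folder.is_dir() and any(p.is_file() for p in folder.iterdir())
        if not has_file:
            missing.append(label)
    return missing


if __name__ == "__main__":
    # "ComfyUI" is an assumed install path; adjust it to your setup.
    for label in find_missing("ComfyUI"):
        print(f"missing: {label} -> ComfyUI/{EXPECTED_DIRS[label]}")
```

    An empty result means every listed folder holds at least one file. The optional Lite LoRA is deliberately left out of the check.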

    Usage hints:

    • The audio file should be about the same length, in seconds, as the video you generate
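
    To match the video length to the audio, the frame count can be derived from the audio duration. The sketch below uses only Python's standard `wave` module, so it handles PCM WAV files only. The default of 16 frames per second is an assumption, not something stated on this page, so use whatever frame rate your workflow is set to.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def frames_for_audio(duration_s, fps=16):
    """Return the number of video frames that covers the audio.

    fps=16 is an assumed default; pass your workflow's actual frame rate.
    """
    return round(duration_s * fps)
```

    For example, a 5-second clip at the assumed 16 fps needs `frames_for_audio(5.0)`, which is 80 frames.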

    👂🎶 👉 Hint: Click the sample for full-screen and play from the post with SOUND ON!

    Sources:

    Clip: https://huggingface.co/city96/umt5-xxl-encoder-gguf/

    Model: https://huggingface.co/QuantStack/Wan2.2-S2V-14B-GGUF/

    Lite LoRA: https://huggingface.co/calcuis/wan2-gguf/


    YOU are responsible for your outputs, as always! If you make ToS-violating content and I become aware of it, I WILL report it.

    Description

    umt5-xxl-encoder-Q8_0

    Comments (3)

    Seeker360 · Aug 30, 2025 · 2 reactions
    CivitAI

    I was a bit confused by this at first as I assumed from the description that it was a checkpoint with the TE, VAE, AE etc all bundled into one, but I assume it isn't as I don't know of a GGUF checkpoint loader node?

    It seems to perform similarly to using the GGUFs from QuantStack, but with the added bonus of not needing to load the Lightning LoRA separately. The addition of the FP8 Audio Encoder is greatly appreciated as I think the FP16 AE was causing very long generation times and pushing the VRAM to its limits...

    Unfortunately, the combination of low quant GGUF and the Lightning LoRA replicates the same issue as using the separate files: the lip syncing is blurry and inconsistent and there's next to no motion in the video. I managed to eke out a standard no-GGUF, no-Lightning render yesterday which almost toppled my GPU and took an age to generate. The lip syncing was decent and there was some natural motion that is missing here.

    Not at all a criticism or problem with your model here itself, but a sobering reminder that there just doesn't seem to be any way to get extremely demanding models like S2V to work properly on lower VRAM systems, without compromising about 85% of the quality in the process :(

    haidensd58757 · Aug 30, 2025
    CivitAI

    What's the difference between this and Image2video?

    cocolevi · Aug 31, 2025
    CivitAI

    Is it possible to use this S2V on a 3060 12GB?

    Checkpoint
    Wan Video 2.2 I2V-A14B

    Details

    Downloads: 198
    Platform: CivitAI
    Platform Status: Deleted
    Created: 8/30/2025
    Updated: 6/2/2026
    Deleted: 4/27/2026
