CivArchive

    Wan-S2V is an AI video generation model that can transform static images and audio into high-quality videos.

    WIP: the description is still being filled in with all the needed info and tools. Use with some caution 🤪

    Note: S2V is very likely to produce a few "flashy", over-saturated frames at the start of the video. This seems to be a limitation of all Wan 2.2 S2V models right now.

    Requirements:

    • Lite LoRA for 4/8-step operation (optional)

    • Main model: Wan2.2-S2V-14B (GGUF) → ComfyUI/models/unet

    • Audio encoder: wav2vec2_large_english → ComfyUI/models/audio_encoders

    • Text encoder: Umt5-xxl → ComfyUI/models/text_encoders

    • VAE: Wan2.1_VAE.safetensors → ComfyUI/models/vae
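
    A quick way to confirm the files landed in the folders named in the requirements list is a small script like the one below. It is only a sketch: the function name check_model_dirs and the "ComfyUI" root path are made up for illustration, and it only lists what is in each folder without validating the files themselves.

```python
from pathlib import Path

# Subfolders taken from the requirements list, relative to the ComfyUI root.
EXPECTED_DIRS = {
    "main model (GGUF)": "models/unet",
    "audio encoder": "models/audio_encoders",
    "text encoder": "models/text_encoders",
    "VAE": "models/vae",
}


def check_model_dirs(comfy_root):
    """Return {label: [file names]} per expected folder (empty list if missing or empty)."""
    root = Path(comfy_root)
    found = {}
    for label, rel in EXPECTED_DIRS.items():
        folder = root / rel
        if folder.is_dir():
            found[label] = sorted(p.name for p in folder.iterdir() if p.is_file())
        else:
            found[label] = []
    return found


if __name__ == "__main__":
    # "ComfyUI" is a placeholder; point it at your own install directory.
    for label, files in check_model_dirs("ComfyUI").items():
        status = ", ".join(files) if files else "nothing found"
        print(f"{label}: {status}")
```

    An empty result for a folder usually just means the file was saved somewhere else.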

    Usage hints:

    • The audio file should be about the same length (in seconds) as the video
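
    To check the length hint before rendering, something like the following can compare the audio duration with the planned video duration. This is a rough sketch using Python's standard wave module, so it only handles uncompressed WAV files. The helper names, the frame count, the frame rate and the 0.5 s tolerance are all assumptions for illustration; use the values from your own workflow.

```python
import wave


def wav_duration_seconds(path):
    """Duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def video_duration_seconds(num_frames, fps):
    """Planned video length in seconds for a given frame count and frame rate."""
    return num_frames / fps


def lengths_match(audio_s, video_s, tolerance_s=0.5):
    """True if audio and video lengths differ by at most tolerance_s seconds."""
    return abs(audio_s - video_s) <= tolerance_s
```

    For example, lengths_match(wav_duration_seconds("speech.wav"), video_duration_seconds(80, 16)) checks a hypothetical 80-frame clip at 16 fps against a file called speech.wav.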

    👂🎶 👉 Hint: Click the sample for full-screen and play from the post with SOUND ON!

    Sources:

    Clip: https://huggingface.co/city96/umt5-xxl-encoder-gguf/

    Model: https://huggingface.co/QuantStack/Wan2.2-S2V-14B-GGUF/

    Lite LoRA: https://huggingface.co/calcuis/wan2-gguf/


    YOU are responsible for your outputs, as always! If you make ToS-violating content and I become aware of it, I WILL report it.

    Description

    wav2vec2_large_english_fp8_e4m3fn

    Checkpoint
    Wan Video 2.2 I2V-A14B

    Details

    Downloads: 137
    Platform: CivitAI
    Platform Status: Deleted
    Created: 8/30/2025
    Updated: 9/24/2025
    Deleted: 9/24/2025

    Files

    wan22S2V14BGGUF_wav2vec2LEN.safetensors