    SCAIL-2 GGUF MOTION TRANSFER Reference Image to Video Vid2Vid + MultiGPU - v1.0
    
    Turn a single character image into a fully animated video that copies the motion of any driving clip — at the **full length** of your input video, not just a fixed 5-second window. Built on the **Wan 2.1 SCAIL-2** model in quantized **GGUF** format so it runs on consumer GPUs, with optional **dual-GPU weight offloading** to keep speeds up.
    
    ---
    
     ✨ What this workflow does
    
    Feed it **one reference image** (your character) and **one driving video** (the motion). SAM3.1 automatically tracks and masks the subject, the pose extracted from the driving video drives the animation, and a **chunked sampling loop** generates the entire clip, stitching the sliding windows together with color matching so there are no harsh seams between chunks.
    
    - **Full-length output** — a sliding-window loop (81-frame initial window + 76-frame continuation windows) covers your whole driving video. No 5-second cap.
    - **Motion transfer** — the subject in your reference image performs the exact motion of the driving clip.
    - **Automatic subject masking** — SAM3.1 tracking isolates the character; no manual rotoscoping.
    - **GGUF quantized model** — Q4_K_M weights fit comfortably in consumer VRAM.
    - **Optional 2nd-GPU offload** — push ~10 GB of model weights to a second GPU instead of slow CPU offload.
    - **Built-in side-by-side comparison output** — see reference vs. result in one render.
    - **Organized & documented** — color-coded node groups and an on-canvas README note with every download link.
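
The sliding-window coverage described above can be sketched as follows. This is a minimal illustration, not the workflow's actual node logic: the window sizes 81 and 76 come from the description, while the function name `plan_windows` and the choice to clip the last window to the clip length are assumptions made for the example.

```python
from typing import List, Tuple

FIRST_WINDOW = 81  # frames produced by the initial window (from the description)
NEXT_WINDOW = 76   # new frames added by each continuation window (from the description)


def plan_windows(total_frames: int) -> List[Tuple[int, int]]:
    """Return [start, end) ranges of the new frames each window contributes.

    Assumption: the final window is simply clipped to the clip length.
    The real workflow may pad or trim differently.
    """
    if total_frames <= 0:
        return []
    windows = [(0, min(FIRST_WINDOW, total_frames))]
    start = FIRST_WINDOW
    while start < total_frames:
        end = min(start + NEXT_WINDOW, total_frames)
        windows.append((start, end))
        start = end
    return windows


if __name__ == "__main__":
    for index, (start, end) in enumerate(plan_windows(500), start=1):
        print(f"window {index}: frames {start}..{end - 1}")
```

For a 500-frame driving clip this yields 7 ranges, the last one covering frames 461 to 499. The seams where color matching matters fall at the first frame of each continuation range.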
    
    ---
    
     🎬 How to use it
    
    1. **Reference image** → load your character in the `LoadImage` node (INPUTS group).
    2. **Driving video** → load your motion clip in `VHS_LoadVideo` (INPUTS group). Leave `custom_width = 480` — it keeps system RAM low and matches the working resolution.
    3. **Prompts** → describe the scene in the positive prompt and what to avoid in the negative (PROMPTS group).
    4. **Press Run.** The output appears **only after the loop finishes** — there are no mid-run previews (this is normal, not a freeze). The OUTPUT group holds the final stitched video; the COMPARISON group shows the side-by-side.
    
    **Speed tip:** set `select_every_nth = 2` on `VHS_LoadVideo` to roughly halve render time at half the temporal resolution. You can also lower the sampler steps.
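
As a rough illustration of the speed tip, the sketch below estimates how many frames are loaded and the resulting frame rate when only every nth frame is kept. It assumes the loader keeps frames 0, n, 2n, and so on; the helper name `subsample` is hypothetical and not part of VideoHelperSuite.

```python
import math
from typing import Tuple


def subsample(total_frames: int, fps: float, every_nth: int) -> Tuple[int, float]:
    """Estimate (kept_frames, effective_fps) after keeping every nth frame.

    Assumption: frames with index 0, n, 2n, ... are kept.
    """
    if every_nth < 1:
        raise ValueError("every_nth must be at least 1")
    kept_frames = math.ceil(total_frames / every_nth)
    effective_fps = fps / every_nth
    return kept_frames, effective_fps


if __name__ == "__main__":
    kept, rate = subsample(total_frames=500, fps=30.0, every_nth=2)
    print(f"{kept} frames at {rate} fps")
```

With a 500-frame, 30 fps clip and `select_every_nth = 2`, about 250 frames remain at an effective 15 fps, so the output should be combined at that lower rate if the motion is meant to play back at its original speed.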
    
    ---
    
     🖥️ Single GPU vs. Dual GPU (model switcher built in)
    
    The workflow includes two model loaders feeding an **Any Switch (rgthree)** "Model Switcher":
    
    - **GGUF Loader – MULTI GPU (default)** — offloads ~10 GB of weights to your **second GPU** (`cuda:1`), keeping compute on `cuda:0`. Dramatically faster than CPU offload.
    - **GGUF Loader – SINGLE GPU** — standard single-GPU GGUF loading.
    
    **Switching is manual**: the workflow has no way to pick a loader based on your GPU count. Select the loader you don't want and press **Ctrl+B** to bypass it, keeping **exactly one** active:
    
    - **Two GPUs:** leave as shipped → MultiGPU loader active, single-GPU loader bypassed.
    - **One GPU:** bypass the MultiGPU loader and un-bypass the single-GPU loader. (If you leave the MultiGPU loader active with only one GPU, it will error trying to reach the missing `cuda:1`.)
    
    > Tip: on the MultiGPU loader you can tune `virtual_vram_gb` (default 10) — lower it if your 2nd GPU OOMs, raise it if it has spare room. `donor_device` can also be set to `cpu` for a single-GPU fallback without bypassing.
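
The loader choice above boils down to a simple rule, sketched here as a small helper. The function name `choose_loader` and the dictionary keys are hypothetical labels for this example; nothing in the workflow runs this code, and the bypass itself still has to be toggled by hand.

```python
from typing import Dict


def choose_loader(gpu_count: int) -> Dict[str, str]:
    """Say which of the two model loaders should be active for a GPU count.

    Keys are illustrative labels for the two loader nodes, not node IDs.
    """
    if gpu_count < 1:
        raise ValueError("at least one CUDA GPU is required")
    use_multi = gpu_count >= 2  # the MultiGPU loader needs cuda:1 to exist
    return {
        "multi_gpu_loader": "active" if use_multi else "bypassed",
        "single_gpu_loader": "bypassed" if use_multi else "active",
    }


if __name__ == "__main__":
    print(choose_loader(1))
    print(choose_loader(2))
```

If PyTorch is available in your ComfyUI environment, `torch.cuda.device_count()` is one way to obtain the value to pass in.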
    
    ---
    
     📦 Required models & paths
    
    Place these under your ComfyUI `models/` folder:
    
    ```
    ComfyUI/
    └── models/
        ├── unet/  (or diffusion_models/)
        │   └── SCAIL-2-Q4_K_M.gguf            ← supply your own GGUF
        ├── text_encoders/
        │   └── umt5_xxl_fp8_e4m3fn_scaled.safetensors
        ├── clip_vision/
        │   └── clip_vision_h.safetensors
        ├── vae/
        │   └── wan_2.1_vae.safetensors
        ├── loras/
        │   └── Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors
        └── checkpoints/
            └── sam3.1_multiplex_fp16.safetensors
    ```
    
     Download links
    
    1. **Text encoder (UMT5 XXL fp8)** → `models/text_encoders`
       https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors?download=true
    2. **CLIP Vision H** → `models/clip_vision`
       https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/clip_vision/clip_vision_h.safetensors?download=true
    3. **Wan 2.1 VAE** → `models/vae`
       https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/vae/wan_2.1_vae.safetensors?download=true
    4. **LightX2V I2V rank64 step-distill LoRA** → `models/loras`
       https://huggingface.co/lgylgy/Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64/resolve/main/Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors?download=true
    5. **SAM3.1 multiplex checkpoint** → `models/checkpoints`
       https://huggingface.co/Comfy-Org/sam3.1/resolve/main/checkpoints/sam3.1_multiplex_fp16.safetensors?download=true
    6. **SCAIL-2 GGUF diffusion model** → `models/unet`
       https://huggingface.co/realrebelai/SCAIL-2_GGUF/resolve/main/SCAIL-2-Q4_K_M.gguf?download=true
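
Before the first run it can help to confirm that every file landed in the right folder. The script below is a minimal sketch of such a check: the file and folder names are taken from the tree above, while the function name `missing_models` and the example path are assumptions for illustration.

```python
from pathlib import Path
from typing import Dict, List

# File name -> folders (under models/) where it may live, per the tree above.
REQUIRED: Dict[str, List[str]] = {
    "SCAIL-2-Q4_K_M.gguf": ["unet", "diffusion_models"],
    "umt5_xxl_fp8_e4m3fn_scaled.safetensors": ["text_encoders"],
    "clip_vision_h.safetensors": ["clip_vision"],
    "wan_2.1_vae.safetensors": ["vae"],
    "Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors": ["loras"],
    "sam3.1_multiplex_fp16.safetensors": ["checkpoints"],
}


def missing_models(models_dir: str) -> List[str]:
    """Return 'folder/file' entries that were not found under models_dir."""
    root = Path(models_dir)
    missing = []
    for file_name, folders in REQUIRED.items():
        found = any((root / folder / file_name).is_file() for folder in folders)
        if not found:
            missing.append(f"{folders[0]}/{file_name}")
    return missing


if __name__ == "__main__":
    # Hypothetical install location; adjust to your own ComfyUI path.
    for entry in missing_models("ComfyUI/models"):
        print("missing:", entry)
```

An empty result means all six files were found; anything printed points at the folder where the file was expected.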
    
    ---
    
     🧩 Required custom nodes
    
    - **ComfyUI-GGUF** — GGUF UNet loading
    - **ComfyUI-MultiGPU** — `UnetLoaderGGUFDisTorch2MultiGPU` (2nd-GPU weight offload)
    - **ComfyUI-KJNodes** — WanChunkFeedForward, ImageResizeKJv2, KikoPurgeVRAM, SimpleCalculatorKJ, INTConstant, GetImageRangeFromBatch, Set/Get nodes
    - **ComfyUI_Swwan** — WanSCAILToVideo, SCAIL2ColoredMask, SAM3_VideoTrack, ImageConcatMulti
    - **ComfyUI-easy-use** — forLoopStart/End, compare, ComfySwitchNode, BatchImagesNode, ColorTransfer
    - **ComfyUI-VideoHelperSuite** — VHS_LoadVideo, VHS_VideoCombine, VHS_VideoInfo
    - **ComfyUI-Resolution-Master** — ResolutionMaster
    - **rgthree-comfy** — Any Switch (Model Switcher), Display Int
    
    All of these are installable through **ComfyUI-Manager** ("Install Missing Custom Nodes").
    
    ---
    
     ⚙️ Requirements & performance
    
    - A recent ComfyUI build with **Wan 2.1 / SCAIL-2** support.
    - **~16 GB VRAM** recommended for the main GPU.
    - For MultiGPU offload: a **second GPU with ≥ 11 GB free** VRAM.
    - **Render time scales with clip length** — each window is a full diffusion pass. A ~500-frame clip runs roughly 7 windows. Use `select_every_nth` or fewer steps to trade quality/length for speed.
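
The window count quoted above follows from the window sizes given earlier (81 frames for the first window, 76 new frames for each continuation). As a worked estimate, assuming the last window is simply cut to the clip length, the number of windows W for N input frames is:

```latex
W(N) = 1 + \left\lceil \frac{\max(0,\; N - 81)}{76} \right\rceil

W(500) = 1 + \left\lceil \tfrac{419}{76} \right\rceil = 1 + 6 = 7

W(250) = 1 + \left\lceil \tfrac{169}{76} \right\rceil = 1 + 3 = 4
```

The second case corresponds to the same clip loaded with `select_every_nth = 2`, which is why that setting cuts render time to roughly half.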
    
    ---
    
     🗂️ Workflow layout
    
    Nodes are organized into color-coded groups for clarity:
    
    **INPUTS** (image & video) · **MODELS** (diffusion / VAE / CLIP / sampler) · **PROMPTS** · **PREPROCESS** (resolution / pose resize / CLIP vision) · **MASK & TRACKING (SAM3)** · **CHUNK 1** (first window) · **LOOP MATH** (window / count) · **LOOP BODY** (chunk-2 generation & accumulation) · **OUTPUT** (final video) · **COMPARISON OUTPUT** (side-by-side)
    
    ---
    
     📺 Tutorial
    
    Watch how to use this workflow:
    https://www.youtube.com/@AiMotionStudio
    
    ---
    
     📝 Notes & tips
    
    - Output only appears when the full loop completes — longer clips take longer before you see anything. That's expected.
    - Keep exactly one model loader active (single- vs. dual-GPU).
    - If you hit a system-RAM error on very long/high-res inputs, keep `VHS_LoadVideo` `custom_width = 480` (already set) and/or raise `select_every_nth`.
    - Credits: built on Wan 2.1 SCAIL-2, LightX2V distill LoRA, SAM3.1, and the open-source ComfyUI custom nodes listed above.
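
The system-RAM tip can be made concrete with a back-of-the-envelope estimate. The sketch below assumes the loaded frames are held as float32 RGB tensors (ComfyUI's usual image layout) and ignores every other allocation; the helper name `frames_ram_gib` and the example resolutions are illustrative only.

```python
def frames_ram_gib(frames: int, width: int, height: int,
                   channels: int = 3, bytes_per_value: int = 4) -> float:
    """Approximate RAM in GiB for a batch of uncompressed frames.

    Assumption: one float32 value (4 bytes) per channel per pixel.
    """
    total_bytes = frames * width * height * channels * bytes_per_value
    return total_bytes / 2**30


if __name__ == "__main__":
    # A 500-frame portrait clip at full HD versus resized to a 480-pixel width.
    print(f"1080x1920: {frames_ram_gib(500, 1080, 1920):.1f} GiB")
    print(f"480x854:   {frames_ram_gib(500, 480, 854):.1f} GiB")
```

Under these assumptions the full-HD batch needs roughly five times the memory of the 480-wide one, which is why keeping `custom_width = 480` or raising `select_every_nth` helps on long clips.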
    

    Version 1.0

    Comments (4)

    drak0n
    CivitAI
    Jun 17, 2026

    A question: Does the final output keep the background from the video or the one from the image (as would normally be the case)? I've noticed this tendency to keep the background from the video rather than the one from the image. I understand that some people want the background from the video, but it's also good to know how to select the background from the image.

    AIMotionStudio
    Author
    Jun 17, 2026

    I will have to test this. I think the background you want would need to be described in the prompt.

    popestmaster
    CivitAI
    Jun 19, 2026

    cool, works with 16 vram

    AIMotionStudio
    Author
    Jun 19, 2026

    yes it does. I use two multiGPU settings for faster generation!

    Workflows
    Wan Video 2.2 I2V-A14B

    Details

    - Downloads: 302
    - Platform: CivitAI
    - Platform Status: Available
    - Created: 6/17/2026
    - Updated: 6/28/2026
    - Deleted: -

    Files

    scail2GGUFMOTIONTRANSFER_v10.zip
