SCAIL-2 GGUF MOTION TRANSFER Reference Image to Video + MultiGPU
Turn a single character image into a fully animated video that copies the motion of any driving clip — at the **full length** of your input video, not just a fixed 5-second window. Built on the **Wan 2.1 SCAIL-2** model in quantized **GGUF** format so it runs on consumer GPUs, with optional **dual-GPU weight offloading** to keep speeds up.
---
✨ What this workflow does
Feed it **one reference image** (your character) and **one driving video** (the motion). SAM3.1 automatically tracks and masks the subject, the pose extracted from the driving video drives the animation, and a **chunked sampling loop** generates the entire clip, stitching the sliding windows together with color matching to avoid harsh seams between chunks.
- **Full-length output** — a sliding-window loop (81-frame initial window + 76-frame continuation windows) covers your whole driving video. No 5-second cap.
- **Motion transfer** — the subject in your reference image performs the exact motion of the driving clip.
- **Automatic subject masking** — SAM3.1 tracking isolates the character; no manual rotoscoping.
- **GGUF quantized model** — Q4_K_M weights fit comfortably in consumer VRAM.
- **Optional 2nd-GPU offload** — push ~10 GB of model weights to a second GPU instead of slow CPU offload.
- **Built-in side-by-side comparison output** — see reference vs. result in one render.
- **Organized & documented** — color-coded node groups and an on-canvas README note with every download link.
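
The sliding-window coverage described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the workflow's actual loop nodes: it assumes each continuation window contributes 76 new frames and that the last window is simply cut short at the end of the clip. The names `plan_windows`, `FIRST_WINDOW`, and `NEXT_WINDOW` are hypothetical.

```python
from typing import List, Tuple

FIRST_WINDOW = 81  # frames in the initial window (from the feature list above)
NEXT_WINDOW = 76   # assumption: new frames added by each continuation window


def plan_windows(total_frames: int) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, that cover the clip."""
    if total_frames <= 0:
        return []
    # The first window always starts at frame 0.
    windows = [(0, min(FIRST_WINDOW, total_frames))]
    start = FIRST_WINDOW
    # Keep adding continuation windows until the whole clip is covered.
    while start < total_frames:
        end = min(start + NEXT_WINDOW, total_frames)
        windows.append((start, end))
        start = end
    return windows


if __name__ == "__main__":
    for i, (start, end) in enumerate(plan_windows(500), 1):
        print(f"window {i}: frames {start}-{end - 1}")
```

Under these assumptions a 500-frame driving clip is split into 7 windows, the last one covering frames 461-499.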
---
🎬 How to use it
1. **Reference image** → load your character in the `LoadImage` node (INPUTS group).
2. **Driving video** → load your motion clip in `VHS_LoadVideo` (INPUTS group). Leave `custom_width = 480` — it keeps system RAM low and matches the working resolution.
3. **Prompts** → describe the scene in the positive prompt and what to avoid in the negative (PROMPTS group).
4. **Press Run.** The output appears **only after the loop finishes** — there are no mid-run previews (this is normal, not a freeze). The OUTPUT group holds the final stitched video; the COMPARISON group shows the side-by-side.
**Speed tip:** set `select_every_nth = 2` on `VHS_LoadVideo` to roughly halve render time at half the temporal resolution. You can also lower the sampler steps.
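
To see what `select_every_nth` does to a clip, here is a minimal sketch of the arithmetic, assuming the loader keeps every nth frame starting from the first one. The function name `subsample` and the example numbers (300 frames at 30 fps) are illustrative and not taken from the workflow.

```python
from typing import List, Tuple


def subsample(frame_count: int, source_fps: float, select_every_nth: int) -> Tuple[List[int], float]:
    """Return (kept frame indices, effective fps) when keeping every nth frame."""
    if select_every_nth < 1:
        raise ValueError("select_every_nth must be >= 1")
    kept = list(range(0, frame_count, select_every_nth))
    return kept, source_fps / select_every_nth


kept, fps = subsample(frame_count=300, source_fps=30.0, select_every_nth=2)
print(len(kept), fps)  # 150 15.0
```

Half as many frames reach the sampling loop, which is where the time saving comes from; the motion is then sampled at half the original rate.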
---
🖥️ Single GPU vs. Dual GPU (model switcher built in)
The workflow includes two model loaders feeding an **Any Switch (rgthree)** "Model Switcher":
- **GGUF Loader – MULTI GPU (default)** — offloads ~10 GB of weights to your **second GPU** (`cuda:1`), keeping compute on `cuda:0`. Dramatically faster than CPU offload.
- **GGUF Loader – SINGLE GPU** — standard single-GPU GGUF loading.
**Switching is manual** (ComfyUI can't auto-detect GPU count). Use **Ctrl+B** to bypass the one you don't want — keep **exactly one** active:
- **Two GPUs:** leave as shipped → MultiGPU loader active, single-GPU loader bypassed.
- **One GPU:** bypass the MultiGPU loader and un-bypass the single-GPU loader. (If you leave the MultiGPU loader active with only one GPU, it will error trying to reach the missing `cuda:1`.)
> Tip: on the MultiGPU loader you can tune `virtual_vram_gb` (default 10) — lower it if your 2nd GPU OOMs, raise it if it has spare room. `donor_device` can also be set to `cpu` for a single-GPU fallback without bypassing.
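
If you are unsure how many GPUs your machine exposes, a small script run outside ComfyUI can help you decide which loader to keep active. The sketch below assumes an NVIDIA system where `nvidia-smi -L` prints one `GPU <index>: ...` line per device; the helper names are hypothetical, and the workflow itself does not run anything like this.

```python
import re
import subprocess


def count_gpus(listing: str) -> int:
    """Count 'GPU <index>: ...' lines in the text printed by `nvidia-smi -L`."""
    return len(re.findall(r"^GPU \d+:", listing, flags=re.MULTILINE))


def recommended_loader(gpu_count: int) -> str:
    """Map a GPU count to the loader the section above says to keep active."""
    return "MULTI GPU" if gpu_count >= 2 else "SINGLE GPU"


def query_nvidia_smi() -> str:
    """Return `nvidia-smi -L` output, or an empty string if it cannot be run."""
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"], capture_output=True, text=True, check=True
        )
        return result.stdout
    except (OSError, subprocess.CalledProcessError):
        return ""


if __name__ == "__main__":
    n = count_gpus(query_nvidia_smi())
    print(f"{n} GPU(s) detected -> keep the {recommended_loader(n)} loader active")
```

The bypass itself still has to be toggled by hand in the graph with Ctrl+B.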
---
📦 Required models & paths
Place these under your ComfyUI `models/` folder:
```
ComfyUI/
└── models/
    ├── unet/   (or diffusion_models/)
    │   └── SCAIL-2-Q4_K_M.gguf   ← supply your own GGUF
    ├── text_encoders/
    │   └── umt5_xxl_fp8_e4m3fn_scaled.safetensors
    ├── clip_vision/
    │   └── clip_vision_h.safetensors
    ├── vae/
    │   └── wan_2.1_vae.safetensors
    ├── loras/
    │   └── Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors
    └── checkpoints/
        └── sam3.1_multiplex_fp16.safetensors
```
Download links
1. **Text encoder (UMT5 XXL fp8)** → `models/text_encoders`
https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors?download=true
2. **CLIP Vision H** → `models/clip_vision`
https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/clip_vision/clip_vision_h.safetensors?download=true
3. **Wan 2.1 VAE** → `models/vae`
https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/vae/wan_2.1_vae.safetensors?download=true
4. **LightX2V I2V rank64 step-distill LoRA** → `models/loras`
https://huggingface.co/lgylgy/Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64/resolve/main/Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors?download=true
5. **SAM3.1 multiplex checkpoint** → `models/checkpoints`
https://huggingface.co/Comfy-Org/sam3.1/resolve/main/checkpoints/sam3.1_multiplex_fp16.safetensors?download=true
6. **SCAIL-2 GGUF diffusion model** → `models/unet`
https://huggingface.co/realrebelai/SCAIL-2_GGUF/resolve/main/SCAIL-2-Q4_K_M.gguf?download=true
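
Before the first run, it can save time to confirm that every file is where the workflow expects it. The following is a minimal sketch that checks the paths listed above. It assumes the default `ComfyUI/models` location unless a different folder is passed on the command line, and it accepts the GGUF in either `unet/` or `diffusion_models/`. The names `REQUIRED_MODELS`, `ALTERNATE_FOLDERS`, and `missing_models` are hypothetical.

```python
import sys
from pathlib import Path
from typing import List

# Paths relative to ComfyUI/models/, copied from the list above.
REQUIRED_MODELS = [
    "unet/SCAIL-2-Q4_K_M.gguf",
    "text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors",
    "clip_vision/clip_vision_h.safetensors",
    "vae/wan_2.1_vae.safetensors",
    "loras/Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors",
    "checkpoints/sam3.1_multiplex_fp16.safetensors",
]

# The folder tree above notes that the GGUF may also live in diffusion_models/.
ALTERNATE_FOLDERS = {"unet": ["diffusion_models"]}


def missing_models(models_dir: Path) -> List[str]:
    """Return the required relative paths that are not present under models_dir."""
    models_dir = Path(models_dir)
    missing = []
    for rel in REQUIRED_MODELS:
        folder, name = rel.split("/", 1)
        candidates = [folder] + ALTERNATE_FOLDERS.get(folder, [])
        if not any((models_dir / f / name).is_file() for f in candidates):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("ComfyUI/models")
    gaps = missing_models(root)
    if gaps:
        print("Missing model files:")
        for rel in gaps:
            print(f"  {root / rel}")
    else:
        print("All required model files found.")
```

Run it with the path to your own `models` folder as the only argument if ComfyUI is installed somewhere else.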
---
🧩 Required custom nodes
- **ComfyUI-GGUF** — GGUF UNet loading
- **ComfyUI-MultiGPU** — `UnetLoaderGGUFDisTorch2MultiGPU` (2nd-GPU weight offload)
- **ComfyUI-KJNodes** — WanChunkFeedForward, ImageResizeKJv2, KikoPurgeVRAM, SimpleCalculatorKJ, INTConstant, GetImageRangeFromBatch, Set/Get nodes
- **ComfyUI_Swwan** — WanSCAILToVideo, SCAIL2ColoredMask, SAM3_VideoTrack, ImageConcatMulti
- **ComfyUI-easy-use** — forLoopStart/End, compare, ComfySwitchNode, BatchImagesNode, ColorTransfer
- **ComfyUI-VideoHelperSuite** — VHS_LoadVideo, VHS_VideoCombine, VHS_VideoInfo
- **ComfyUI-Resolution-Master** — ResolutionMaster
- **rgthree-comfy** — Any Switch (Model Switcher), Display Int
All of these are installable through **ComfyUI-Manager** ("Install Missing Custom Nodes").
---
⚙️ Requirements & performance
- A recent ComfyUI build with **Wan 2.1 / SCAIL-2** support.
- **~16 GB VRAM** recommended for the main GPU.
- For MultiGPU offload: a **second GPU with ≥ 11 GB free** VRAM.
- **Render time scales with clip length** — each window is a full diffusion pass. A ~500-frame clip runs roughly 7 windows. Use `select_every_nth` or fewer steps to trade quality/length for speed.
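
The trade-off between clip length and render time can be estimated with a little arithmetic. This sketch assumes the 81-frame first window and 76 new frames per continuation window described earlier, and that `select_every_nth` keeps every nth frame starting from the first. It counts diffusion passes only, not seconds, since actual timing depends on your hardware and step count. The function names are hypothetical.

```python
import math

FIRST_WINDOW = 81  # frames in the first window
NEXT_WINDOW = 76   # assumption: new frames per continuation window


def loaded_frames(source_frames: int, select_every_nth: int) -> int:
    """Frames that remain after keeping every nth frame."""
    return math.ceil(source_frames / select_every_nth)


def window_count(frames: int) -> int:
    """Number of full diffusion passes needed to cover `frames` frames."""
    if frames <= 0:
        return 0
    if frames <= FIRST_WINDOW:
        return 1
    return 1 + math.ceil((frames - FIRST_WINDOW) / NEXT_WINDOW)


for nth in (1, 2, 3):
    frames = loaded_frames(500, nth)
    print(f"select_every_nth={nth}: {frames} frames -> {window_count(frames)} windows")
```

For a 500-frame clip this gives 7 windows at `select_every_nth = 1`, 4 at 2, and 3 at 3, so skipping every other frame cuts the number of passes by a little under half.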
---
🗂️ Workflow layout
Nodes are organized into color-coded groups for clarity:
**INPUTS** (image & video) · **MODELS** (diffusion / VAE / CLIP / sampler) · **PROMPTS** · **PREPROCESS** (resolution / pose resize / CLIP vision) · **MASK & TRACKING (SAM3)** · **CHUNK 1** (first window) · **LOOP MATH** (window / count) · **LOOP BODY** (chunk-2 generation & accumulation) · **OUTPUT** (final video) · **COMPARISON OUTPUT** (side-by-side)
---
📺 Tutorial
Watch how to use this workflow:
https://www.youtube.com/@AiMotionStudio
---
📝 Notes & tips
- Output only appears when the full loop completes — longer clips take longer before you see anything. That's expected.
- Keep exactly one model loader active (single- vs. dual-GPU).
- If you hit a system-RAM error on very long/high-res inputs, keep `VHS_LoadVideo` `custom_width = 480` (already set) and/or raise `select_every_nth`.
- Credits: built on Wan 2.1 SCAIL-2, LightX2V distill LoRA, SAM3.1, and the open-source ComfyUI custom nodes listed above.
Version 1.0
Comments (4)
A question: Does the final output keep the background from the video or the one from the image (as would normally be the case)? I've noticed this tendency to keep the background from the video rather than the one from the image. I understand that some people want the background from the video, but it's also good to know how to select the background from the image.
I will have to test this. I think the background would have to be specified in the prompt.
Cool, works with 16 GB VRAM.
Yes, it does. I use two MultiGPU settings for faster generation!
Details
- Downloads: 302
- Platform: CivitAI
- Platform status: Available
- Created: 6/17/2026
- Updated: 6/28/2026
- Deleted: -