SCAIL-2 GGUF
Rebel's GGUF quantizations of SCAIL-2, the end-to-end character-animation / video motion-transfer model (Wan 2.1 14B backbone) from zai-org. These run the SCAIL-2 DiT in ComfyUI at a fraction of the VRAM the full fp16/fp8 weights require.
Quantized by RealRebelAI · GitHub · YouTube
Load with the GGUF Unet Loader (city96's ComfyUI-GGUF → Unet Loader (GGUF)). Place the .gguf in ComfyUI/models/unet/.
Quant tiers
Q2_K 6 GB - Smallest, runs on minimal VRAM, expect quality loss
Q3_K_M 8 GB - Budget tier, better coherence than Q2
Q4_K_M 10 GB - Recommended daily driver
Q5_K_M 12 GB - Sweet spot above Q4
Q6_K 14 GB - Higher fidelity
Q8_0 17 GB - Closest to fp16
The loader memory-maps the model, so a larger file costs disk and streaming time, not resident RAM.
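To make the tier list easier to apply, here is a minimal sketch that picks the largest tier whose listed size fits under a memory budget. The sizes are the approximate figures from the list above. The names `pick_tier` and `headroom_gb` are hypothetical, and the default headroom is an assumption rather than a measured requirement.

```python
# Approximate sizes in GB, copied from the tier list above.
QUANT_TIERS = {
    "Q2_K": 6,
    "Q3_K_M": 8,
    "Q4_K_M": 10,
    "Q5_K_M": 12,
    "Q6_K": 14,
    "Q8_0": 17,
}


def pick_tier(budget_gb, headroom_gb=2.0):
    """Return the largest tier that fits in budget_gb minus headroom_gb, or None.

    headroom_gb is an assumed allowance for everything else that shares the
    budget (activations, VAE, text encoder); tune it for your own setup.
    """
    usable = budget_gb - headroom_gb
    fitting = [(size, name) for name, size in QUANT_TIERS.items() if size <= usable]
    if not fitting:
        return None
    # max() compares by size first, so this is the largest tier that fits.
    return max(fitting)[1]


if __name__ == "__main__":
    for budget in (8, 12, 16, 24):
        print(budget, "GB ->", pick_tier(budget))
```

Treat the result as a starting point only: actual memory use also depends on resolution, frame count, and the other models loaded in the workflow.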
Required files
Download each of these separately and place them in the listed ComfyUI folder.
Model
ComfyUI/models/unet/
https://huggingface.co/realrebelai/SCAIL-2_GGUF/tree/main
Text Encoder
ComfyUI/models/text_encoders/
https://huggingface.co/chatpig/encoder/blob/main/umt5_xxl_fp8_e4m3fn_scaled.safetensors
LoRA (LightX2V step/cfg distill)
ComfyUI/models/loras/
https://huggingface.co/lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v/blob/main/loras/Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors
SAM 3.1 Multiplex
ComfyUI/models/sam/
https://huggingface.co/Comfy-Org/sam3.1/blob/main/checkpoints/sam3.1_multiplex_fp16.safetensors
CLIP Vision
ComfyUI/models/clip_vision/
https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/clip_vision/clip_vision_h.safetensors
VAE
ComfyUI/models/vae/
https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/vae/wan_2.1_vae.safetensors
Optional: SCAIL-2 DPO LoRA (untested)
ComfyUI/models/loras/
https://huggingface.co/Comfy-Org/SCAIL-2/blob/main/loras/wan2.1_SCAIL_2_DPO_lora_bf16.safetensors
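Most of the links above are Hugging Face `blob` pages, which display a file rather than serve it. A minimal sketch for turning them into direct-download URLs and pairing each with its target folder follows. It assumes the usual Hugging Face convention that replacing `/blob/` with `/resolve/` yields the raw file. The helper names `to_download_url` and `plan` are hypothetical, and only two entries are listed for brevity.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit, urlunsplit

# (target folder, blob URL) pairs copied from the list above; extend as needed.
FILES = [
    ("ComfyUI/models/vae/",
     "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/vae/wan_2.1_vae.safetensors"),
    ("ComfyUI/models/clip_vision/",
     "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/clip_vision/clip_vision_h.safetensors"),
]


def to_download_url(blob_url):
    """Swap the first /blob/ path segment for /resolve/ (assumed convention)."""
    parts = urlsplit(blob_url)
    path = parts.path.replace("/blob/", "/resolve/", 1)
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


def plan(files):
    """Return (download URL, destination path) pairs without downloading anything."""
    out = []
    for folder, url in files:
        name = PurePosixPath(urlsplit(url).path).name  # file name from the URL
        out.append((to_download_url(url), str(PurePosixPath(folder) / name)))
    return out


if __name__ == "__main__":
    for url, dest in plan(FILES):
        print(dest, "<-", url)
```

The sketch only builds the plan; feed the resulting pairs to whatever downloader you prefer.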
Folder structure
ComfyUI/models/
├── unet/
│   └── SCAIL-2-Q4_K_M.gguf  ← from this repo
├── text_encoders/
│   └── umt5_xxl_fp8_e4m3fn_scaled.safetensors
├── loras/
│   ├── Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors
│   └── wan2.1_SCAIL_2_DPO_lora_bf16.safetensors (optional)
├── sam/
│   └── sam3.1_multiplex_fp16.safetensors
├── clip_vision/
│   └── clip_vision_h.safetensors
└── vae/
    └── wan_2.1_vae.safetensors
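A quick way to confirm the layout above before opening the workflow is to check that each expected file exists. The following is a minimal sketch using only the standard library. The function name `missing_files` is hypothetical, and the `ComfyUI/models` path in the usage part is an assumed install location.

```python
from pathlib import Path

# Paths relative to ComfyUI/models/, taken from the folder structure above.
# The GGUF name depends on the tier you downloaded, so it is matched by
# pattern; the text encoder is matched by extension so any variant passes.
REQUIRED = [
    "unet/*.gguf",
    "text_encoders/*.safetensors",
    "loras/Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors",
    "sam/sam3.1_multiplex_fp16.safetensors",
    "clip_vision/clip_vision_h.safetensors",
    "vae/wan_2.1_vae.safetensors",
]


def missing_files(models_dir):
    """Return the entries of REQUIRED that match no file under models_dir."""
    root = Path(models_dir)
    return [pattern for pattern in REQUIRED if not any(root.glob(pattern))]


if __name__ == "__main__":
    # Assumed location; point this at your own ComfyUI install.
    for pattern in missing_files("ComfyUI/models"):
        print("missing:", pattern)
```

The optional DPO LoRA is deliberately left out of `REQUIRED`.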
Notes
The WEIGHT NOT MERGED warning on patch_embedding is harmless. ComfyUI builds a 36-channel patch embedding and concatenates the mask channels at runtime; the model fills them internally. The stored 20-channel weight is expected, and generation proceeds normally.
The colored mask is a required input even in single-character Animation Mode, so don't remove it from the workflow.
Set width and height explicitly (both divisible by 16; 832×480 is a good 480p start).
The SCAIL2ColoredMask node may require a recent / nightly ComfyUI build.
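The divisible-by-16 rule in the notes above is easy to get wrong when starting from an arbitrary source resolution. Here is a minimal sketch that snaps a width and height to the nearest multiple of 16. The function names are hypothetical, and rounding to the nearest multiple (rather than always down) is a design choice, not something the model requires.

```python
def snap_to_multiple(value, multiple=16):
    """Round a positive integer to the nearest multiple, never below one multiple."""
    snapped = ((value + multiple // 2) // multiple) * multiple
    return max(multiple, snapped)


def snap_resolution(width, height, multiple=16):
    """Snap both dimensions independently; the aspect ratio may shift slightly."""
    return snap_to_multiple(width, multiple), snap_to_multiple(height, multiple)


if __name__ == "__main__":
    print(snap_resolution(832, 480))  # already valid, returned unchanged
    print(snap_resolution(854, 480))  # width is adjusted to a multiple of 16
```

Because each dimension is snapped on its own, check the result against the source aspect ratio if exact framing matters.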
Credits
Model: zai-org / SCAIL-2
GGUF quantization: RealRebelAI
GGUF tooling: city96 / ComfyUI-GGUF
Description
UPDATE COMFY
Comments (2)
Thanks for the workflow! One note though, you linked the unscaled version of the text encoder, but only the scaled version seems to work.
Is there a way to keep the background, rather than replacing the new character into the video?