SeedVR2: one-step 4X video/image upscaling (and beyond) with BlockSwap and great temporal consistency

Restore and upscale any video to 4X and beyond in a single step with ByteDance's revolutionary SeedVR2.

Watch the complete 32-minute deep dive above explaining every parameter and optimization.

🚀 What this workflow does

This workflow implements SeedVR2's one-step video restoration, a task that previously required 15-50 denoising steps with diffusion-based upscalers. Unlike traditional upscalers, which process frames individually and therefore flicker, SeedVR2 maintains temporal consistency by processing batches of frames together.

Key features:

- One-step processing - a single denoising step instead of the 15-50 steps traditional diffusion upscalers need
- No fixed resolution cap - tested up to 10x upscaling (limited only by VRAM)
- Temporal consistency - no flickering with a high batch_size
- Alpha channel support - upscale image sequences with an alpha channel by chaining two upscale nodes
- BlockSwap enabled - run 7B-parameter models with 16GB VRAM
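
As a rough illustration of the temporal batching idea above, the following Python sketch splits a clip into consecutive frame batches and counts the seams between them. The function names are hypothetical and this is not the upscaler node's actual code; it only shows why a larger batch_size leaves fewer batch boundaries where consistency could break.

```python
def split_into_batches(num_frames: int, batch_size: int) -> list[range]:
    """Split frame indices 0..num_frames-1 into consecutive batches.

    Hypothetical helper for illustration only: each returned range stands
    for one group of frames that would be restored together.
    """
    if num_frames < 0 or batch_size < 1:
        raise ValueError("num_frames must be >= 0 and batch_size >= 1")
    return [
        range(start, min(start + batch_size, num_frames))
        for start in range(0, num_frames, batch_size)
    ]


def count_batch_boundaries(num_frames: int, batch_size: int) -> int:
    """Count the seams between neighbouring batches.

    Frames on either side of a seam are processed in different batches,
    so these are the places where temporal consistency is most at risk.
    """
    return max(len(split_into_batches(num_frames, batch_size)) - 1, 0)
```

For a 100-frame clip, a batch_size of 5 gives 19 seams while a batch_size of 25 gives only 3, at the cost of holding more frames in VRAM at once.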

📚 What you'll learn in the tutorial

Architecture deep dive:

- How Diffusion Adversarial Post-Training achieves single-step inference

- Why GANs + Diffusion = game changer for video restoration

- Understanding the Swin Transformer backbone

Practical implementation:

- Choosing between 3B/7B models and FP8/FP16 precision

- Why batch_size must be high for optimal results

- BlockSwap configuration for limited VRAM (detailed parameter breakdown)

- Memory optimization strategies
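
To make the model-size, precision, and BlockSwap trade-offs above more concrete, here is a back-of-the-envelope Python sketch of weight memory. It is an estimate under loud assumptions, not the node's real accounting: it counts weights only (no VAE, activations, or framework overhead), it treats all weights as living in equally sized transformer blocks, and `total_blocks` and `blocks_swapped` are hypothetical illustration parameters rather than confirmed node settings.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp16": 2, "fp8": 1}


def weight_gib(params_billions: float, precision: str) -> float:
    """Approximate size of the model weights alone, in GiB."""
    return params_billions * 1e9 * BYTES_PER_PARAM[precision] / 2**30


def resident_weight_gib(
    params_billions: float,
    precision: str,
    total_blocks: int,    # hypothetical: number of equally sized blocks
    blocks_swapped: int,  # hypothetical: blocks kept in system RAM instead
) -> float:
    """Weights left on the GPU when some blocks are offloaded to system RAM.

    Assumes every block is the same size, which is a simplification.
    """
    if not 0 <= blocks_swapped <= total_blocks:
        raise ValueError("blocks_swapped must be between 0 and total_blocks")
    full = weight_gib(params_billions, precision)
    return full * (total_blocks - blocks_swapped) / total_blocks


if __name__ == "__main__":
    print(f"7B FP16 weights: {weight_gib(7, 'fp16'):.2f} GiB")
    print(f"3B FP8 weights:  {weight_gib(3, 'fp8'):.2f} GiB")
```

Under these assumptions the 7B FP16 weights alone come to roughly 13 GiB, which leaves little headroom on a 16GB card, while the 3B FP8 weights are under 3 GiB. Swapping half of the blocks would roughly halve the resident weight footprint in this simplified model.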

Advanced workflows:

- Processing image sequences with alpha channels

- Multi-GPU command line setup for production pipelines

- Resolution stepping to control detail enhancement

- Dealing with oversharpening on AI-generated content
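
One plausible reading of "resolution stepping" is reaching the target size through several smaller upscale passes instead of one large jump. The Python sketch below only plans the intermediate heights for such a chain; the function name and the `max_factor` parameter are hypothetical, and how the number of passes affects detail is a question for the tutorial, not something this snippet claims.

```python
import math


def resolution_steps(
    src_height: int, target_height: int, max_factor: float = 2.0
) -> list[int]:
    """Plan intermediate heights for a multi-pass upscale.

    Each pass grows the height by roughly max_factor at most (rounded up
    to a whole pixel), and the final pass lands exactly on target_height.
    """
    if src_height <= 0 or target_height < src_height or max_factor <= 1:
        raise ValueError("need 0 < src_height <= target_height and max_factor > 1")
    steps = []
    current = src_height
    while current < target_height:
        # Rounding up guarantees progress even for very small heights.
        current = min(math.ceil(current * max_factor), target_height)
        steps.append(current)
    return steps
```

For example, going from 480 to 1920 pixels with a 2x cap per pass plans two passes (960, then 1920), while a 1.5x cap plans four (720, 1080, 1620, 1920).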

🛠️ Workflow includes

- Image and video upscaling workflow, including image sequences with an alpha channel

⚡ Performance notes

- 3B FP8: Fastest, good for previews

- 7B FP16: Best quality, requires BlockSwap on consumer cards

- VAE bottleneck: about 95% of processing time is spent on VAE encoding/decoding, and the VAE currently uses a fair amount of VRAM

- Temporal batching: Higher batch_size = better consistency but more VRAM
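
The 95% figure above has a practical consequence that can be worked out with Amdahl's law. The sketch below is a small Python calculation, assuming the stages run one after another and that the 95/5 split holds for your clip; the function name is hypothetical.

```python
def overall_speedup(fraction: float, stage_speedup: float) -> float:
    """Amdahl's law: total speedup when one stage gets faster.

    fraction      - share of total runtime spent in the stage (0..1)
    stage_speedup - how many times faster that stage becomes (> 0)
    """
    if not 0 <= fraction <= 1 or stage_speedup <= 0:
        raise ValueError("fraction must be in [0, 1] and stage_speedup > 0")
    return 1.0 / ((1.0 - fraction) + fraction / stage_speedup)


if __name__ == "__main__":
    # VAE is taken as 95% of runtime, the upscaling step as the remaining 5%.
    print(f"VAE 2x faster:         {overall_speedup(0.95, 2.0):.2f}x overall")
    print(f"Upscaler step instant: {overall_speedup(0.05, 1e9):.2f}x overall")
```

With these numbers, doubling VAE speed makes the whole run about 1.9x faster, whereas making the upscaling step itself arbitrarily fast gains only about 1.05x. That is why VAE encoding/decoding is the place where optimizations pay off most.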

🎯 Best use cases

✅ Perfect for:

- Restoring compressed/heavily degraded footage
- Upscaling legacy content
- AI-generated video enhancement

⚠️ Consider alternatives for:

- Already high-quality footage (may oversharpen)
- Limited VRAM
- Content requiring subtle enhancement

🔧 Requirements

- ComfyUI (latest version)
- 16GB+ VRAM recommended
- ComfyUI-SeedVR2_VideoUpscaler by NumZ: https://github.com/numz/ComfyUI-SeedVR2_VideoUpscaler
- ComfyUI-CoCoTools_IO by Conor-Collins: https://github.com/Conor-Collins/ComfyUI-CoCoTools_IO
- ComfyUI-VideoHelperSuite by Kosinkadink: https://github.com/Kosinkadink/ComfyUI-VideoHelperSuite
- Models auto-download on first use

💙 Support our work

If you found this tutorial helpful and want to support more open-source content like this, any contribution helps us continue creating in-depth guides for the community: https://donate.stripe.com/bJe8wH1KVcAY8yEa0ids40o

Every donation enables us to dedicate more time to research, testing, and sharing knowledge. Thank you for being part of this journey!