CivArchive
    Wan 2.2 5B - Latent Video Upscaler & Enhancer / Transform Low-Res Videos into HD Masterpieces — The Intelligent Way - v1.0



    Introduction: Beyond Traditional Upscaling

    Traditional AI upscalers like RealESRGAN are great for images, but they often struggle with videos. They can introduce artifacts, fail to add meaningful detail, and leave footage looking blurry and unconvincing.

    This workflow, "Wan 2.2 5B - Latent Video Upscaler," offers a paradigm shift. Instead of just guessing pixels, it uses the immense power of the Wan 2.2 5B Text-to-Video model to intelligently reinterpret and reconstruct your video in high definition. It doesn't just scale up; it dreams up the missing details, resulting in a cleaner, more detailed, and more coherent HD video than any conventional upscaler can achieve.

    TL;DR: Stop using image upscalers on video. Use a diffusion model to truly enhance and upscale your footage with intelligent detail.


    Key Features & Highlights

    • 🤖 Intelligent Enhancement: Leverages the Wan 2.2 5B model to add semantically correct details, textures, and coherence, far surpassing the capabilities of traditional upscalers.

    • ⚡ Fast & Efficient: Built on the lightweight 5B parameter model, this workflow performs latent upscaling and denoising significantly faster than generating from scratch.

    • 🎨 Quality Preservation: Applies a light touch (denoise=0.2) to enhance and upscale without altering the original motion or content of the video drastically.

    • 📈 2x Resolution Boost: Doubles the resolution of your input video directly in the latent space before decoding.

    • 🎬 Smooth Final Output: Includes an optional RIFE frame interpolation pass to double the frame rate (from 16fps to 32fps) for buttery-smooth motion in the final render.

    • 🔊 Audio Passthrough: Automatically carries over the original audio track from your source video to the final enhanced output.
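
    The audio passthrough above happens inside the workflow, but the same result can be had manually: a stream-copy mux with ffmpeg re-attaches the source audio without re-encoding. A minimal sketch (file names are placeholders; ffmpeg must be on your PATH):

    ```python
    # Sketch: re-attach the original audio to an enhanced video with ffmpeg.
    # File names are placeholders; requires ffmpeg on PATH.
    import subprocess

    def build_mux_cmd(enhanced_video: str, source_video: str, out_path: str) -> list[str]:
        """Build an ffmpeg command that copies video from the enhanced file
        and audio from the original source, without re-encoding either."""
        return [
            "ffmpeg", "-y",
            "-i", enhanced_video,   # input 0: upscaled video (video stream)
            "-i", source_video,     # input 1: original clip (audio stream)
            "-map", "0:v:0",        # take the video from input 0
            "-map", "1:a:0?",       # take the audio from input 1 (optional if absent)
            "-c", "copy",           # stream copy, no quality loss
            out_path,
        ]

    cmd = build_mux_cmd("upscaled.mp4", "source.mp4", "upscaled_with_audio.mp4")
    # subprocess.run(cmd, check=True)  # uncomment to actually run ffmpeg
    ```

    The `?` on `1:a:0?` makes the audio map optional, so the command still succeeds on silent source clips.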


    Workflow Overview & Strategy

    This workflow is a sophisticated video processing chain:

    1. Input: Load your low-resolution source video using VHS_LoadVideo.

    2. Initial Upscale: The video is immediately 2x upscaled using a Lanczos filter to get to the target size. This provides a better starting point for the model.

    3. Latent Processing: The upscaled frames are encoded into the latent space.

    4. Intelligent Enhancement: The core of the workflow. The Wan 2.2 5B model, guided by quality-positive and detail-negative prompts, gently denoises (denoise=0.2) the latents over just 8 steps with UniPC. This step is where the "magic" happens—the model fills in plausible, high-quality details.

    5. Decoding: The enhanced latents are decoded back into a high-resolution image sequence.

    6. Final Output:

      • Option A: Save the upscaled video directly at 16fps.

      • Option B (Recommended): Pass the sequence through RIFE VFI to interpolate frames to 32fps, creating a final video that is both high-resolution and super smooth.
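
    The six steps above can be sketched in miniature. This is a toy, pure-Python stand-in (pixel replication instead of Lanczos, a smoothing blend instead of the actual 8-step UniPC diffusion pass), meant only to show the shape of the chain: upscale first, then lightly refine with a low denoise value so the result stays close to the input:

    ```python
    # Conceptual sketch of the latent upscale-and-refine chain (pure Python,
    # toy 2D "latents"; the real workflow does this with ComfyUI nodes).

    def upscale_2x(latent):
        """Double width and height by pixel replication (stand-in for Lanczos)."""
        out = []
        for row in latent:
            wide = [v for v in row for _ in (0, 1)]  # duplicate each column
            out.append(wide)
            out.append(list(wide))                   # duplicate each row
        return out

    def refine(latent, denoise=0.2):
        """Toy stand-in for the 8-step UniPC pass: with a low denoise value,
        the result stays close to the input (here: a mild smoothing blend)."""
        h, w = len(latent), len(latent[0])
        out = [row[:] for row in latent]
        for y in range(h):
            for x in range(w):
                nb = [latent[ny][nx]
                      for ny in (y - 1, y, y + 1) for nx in (x - 1, x, x + 1)
                      if 0 <= ny < h and 0 <= nx < w]
                smoothed = sum(nb) / len(nb)
                out[y][x] = (1 - denoise) * latent[y][x] + denoise * smoothed
        return out

    frames = [[0.0, 1.0], [1.0, 0.0]]  # a tiny 2x2 "latent frame"
    hd = refine(upscale_2x(frames))    # steps 2-4 of the chain in miniature
    print(len(hd), len(hd[0]))         # -> 4 4 (2x in each dimension)
    ```

    At denoise=0, `refine` returns its input unchanged; as denoise grows, the output drifts further from the source, which mirrors how the real KSampler setting behaves.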


    Technical Details & Requirements

    🧰 Models Required:

    • Base Model: (GGUF Format)

      • Wan2.2-TI2V-5B-Q8_0.gguf

      • Source: Likely from HuggingFace or other model repositories.

    • LoRA:

      • Wan2_2_5B_FastWanFullAttn_lora_rank_128_bf16.safetensors (Applied at strength 0.5)

    • VAE:

      • Wan2.2_VAE.safetensors

    • Text Encoder: (GGUF, loaded via the CLIP loader)

      • umt5-xxl-encoder-q4_k_m.gguf

    • Interpolation Model:

      • rife47.pth (For RIFE VFI node)

    ⚙️ Recommended Hardware:

    • A GPU with a good amount of VRAM (e.g., 12GB+) is recommended for comfortable operation, especially when processing longer videos.

    🔌 Custom Nodes:
    This workflow uses:

    • comfyui-videohelpersuite (VHS) - For video loading/combining

    • comfyui-frame-interpolation - For RIFE VFI

    • comfyui-gguf / gguf - For model loading

    • comfyui-easy-use - For memory management

    • comfyui-kjnodes - For performance patches (Sage Attention)


    Usage Instructions

    1. Load the JSON: Import the provided .json file into your ComfyUI.

    2. Load the Models: Ensure all required models are in their correct folders. Check the paths in the LoaderGGUF, VAELoader, and LoraLoaderModelOnly nodes.

    3. Select Your Video: In the VHS_LoadVideo node, click the video icon to select your low-resolution input video.

    4. Queue Prompt: Run the workflow!

    5. Retrieve Output: Find your two enhanced videos in the output directory:

      • .../Wan 2.2 5B Upscales/Denoise 0.2_xxxxx.mp4 (16fps)

      • .../Wan 2.2 5B Upscales/Denoise 0.2_32fps_xxxxx.mp4 (32fps - Smoother)


    Tips & Tricks

    • Denoise Strength: The denoise parameter in the KSampler (default 0.2) is key.

      • ~0.1-0.3: Best for upscaling/enhancement. Preserves the original content while improving quality.

      • >0.5: Will start to significantly alter the content and style, moving towards a new generation based on your video.
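
    Why does a low denoise value preserve content? In the usual img2img convention (ComfyUI's KSampler behaves roughly like this), the denoise fraction decides how far down the noise schedule sampling starts: at denoise=0.2 the source latent is only lightly noised before being re-denoised, so most of the original signal survives. A simplified sketch with a linear schedule (real schedulers are nonlinear, and `sigma_max` here is an illustrative value):

    ```python
    # Simplified: how a denoise fraction trims the sampling schedule.
    # Linear sigmas for clarity; real schedulers (Karras, etc.) differ.

    def schedule_for_denoise(steps: int, denoise: float, sigma_max: float = 14.6):
        """Return the noise levels actually traversed. With denoise < 1.0 the
        sampler enters the full schedule partway down, so the source latent is
        only lightly perturbed before being re-denoised."""
        total = steps if denoise >= 1.0 else round(steps / denoise)
        full = [sigma_max * (1 - i / total) for i in range(total + 1)]
        return full[-(steps + 1):]  # only the last `steps` intervals are run

    light = schedule_for_denoise(steps=8, denoise=0.2)  # starts at ~0.2 * sigma_max
    heavy = schedule_for_denoise(steps=8, denoise=0.8)  # starts much higher
    print(light[0], heavy[0])
    ```

    The starting noise level scales with the denoise value, which is exactly why ~0.1-0.3 enhances while >0.5 starts to regenerate.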

    • Source Quality: This workflow excels at breathing new life into low-quality, pixelated, or noisy source videos from older generators.

    • Prompt Engineering: The positive prompt (high detail, high quality...) is generic to encourage enhancement. For stylistic changes, you can modify this prompt (e.g., "cinematic, film grain, photorealism").


    Conclusion: The Future of Video Upscaling

    This workflow demonstrates a powerful new application for diffusion models: not just as generators, but as intelligent enhancement tools. By leveraging the knowledge within the Wan 2.2 model, we can upscale videos with a level of coherence and detail that traditional methods simply cannot match. It’s faster than full generation and smarter than simple scaling.

    Upload your low-res clips and witness the intelligent upscaling revolution.


    Credit: Crafted by the ComfyUI community. Special thanks to the creators of the Wan 2.2 models and the FastWanFullAttn LoRA.

    Comments (20)

    blobby99 · Aug 28, 2025 · 2 reactions

    This method is very important, and many of us have been using it for ages as the only decent local temporal solution. I prefer the better low-noise 14B Wan2.2 model - the 5B model has no speed advantage if you know how to launch Comfy properly so the model uses RAM not VRAM.

    AND, it is not true that one cannot use image upscaling successfully for some videos - SD ultimate upscale workflows allow 1080P videos to become 4K without hitting VRAM restrictions by processing the video as a series of images, that can later be recombined from a folder. The lack of a temporal element strangely doesn't matter.

    parallelepipedon · Sep 5, 2025

    Speed? This WF sucks up every available byte of memory on my box. Sadly, I've only got 64 GB RAM and a 4090. But I was all the way down to Q3_K umt5 and wan2.2 i2v low Q2 GGUF and it still maxed both kinds of RAM then OOMed. Yet for regular i2v I'm running the fp16 t5 and Wan2.2 i2v fp16, plus an accelerator LoRA and drone LoRA... and I tacked an upscale on the end, but just ESRGAN and FILM VFI. No OOMs.

    parallelepipedon · Sep 5, 2025

    Now I'm no stranger to SD ultimate upscale (just new to Comfy). Hell I'll take Forge-style Hires Fix. Just 4x-UltraSharp is enough to give me enough extra fine details. Latent ends up changing the image too much. I guess separating frames and upscaling each individually is a last resort I might have to take. The original images were made in Flux and Wan 2.2 is just taking a wild guess, but doing a surprisingly good job overall. It's just missing the fine details. Overlookable at 1408 x 800... but not at twice that. Maybe you're right... it's time to bite the bullet. You've frame numbers for a "temporal element". ;-> That's all I've ever had doing 3D animation for many years. I'll only render straight to an mp4 for a quickie test.

    I'm new to ComfyUI and so far I haven't found any video scaling models that don't destroy my 16GB of VRAM. Do you have a workflow that works for scaling?

    7765564 · Aug 28, 2025 · 3 reactions

    It would be nice if you added links to the necessary models in the description.

    Robopsycho · Aug 29, 2025 · 1 reaction

    it keeps getting stuck at vae decode stage - res is 1216 w 480h (trying for an ultrawide look) and I'm feeding it 16 fps clips with 102 frames per clip. I'm also using all your model files, text encoder and vae files, so idk -- I also have a 4090, lots of ram and cpu so - kinda stuck as to how to make it work. BTW your motion enhancements worked great for me - very cool. But now I'm trying to upscale them and yeah , no dice.

    zardozai (Author) · Aug 29, 2025

    Did you try using the VAE tile node (encode/decode)?

    For low VRAM, it should work since you are using an RTX 4090, which only has 24 GB of VRAM, if I am correct.
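
    For context, tiled decode trades a little seam-handling work for a much lower peak memory: the frame is decoded one tile at a time instead of all at once. A toy pure-Python sketch of the idea (ComfyUI's VAE Decode (Tiled) node does the real thing, including overlap blending to hide seams, which is omitted here):

    ```python
    # Sketch of why tiled VAE decode lowers peak memory: the frame is decoded
    # tile by tile, so only one tile's activations live in VRAM at a time.
    # Toy version on a 2D list; real tiled decode also blends tile overlaps.

    def decode_tiled(latent, tile=512, decode=lambda t: t):
        h, w = len(latent), len(latent[0])
        out = [[None] * w for _ in range(h)]
        for y0 in range(0, h, tile):
            for x0 in range(0, w, tile):
                tile_rows = [row[x0:x0 + tile] for row in latent[y0:y0 + tile]]
                decoded = decode(tile_rows)          # peak memory ~ one tile
                for dy, row in enumerate(decoded):
                    out[y0 + dy][x0:x0 + len(row)] = row
        return out

    frame = [[x + y for x in range(4)] for y in range(4)]
    assert decode_tiled(frame, tile=2) == frame      # identity decode round-trips
    ```

    Smaller tiles mean a lower memory peak but more tile boundaries to blend; 512 is a common middle ground.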

    Robopsycho · Aug 29, 2025

    @zardozai ok, yeah gpt actually suggested that I do tiling also -- so i should run tiling then -- ok I'll try it -- yes a 4090.

    Robopsycho · Aug 29, 2025

    @zardozai ok, I'm gonna use a tiled vae decode at 512 just to see if it works. I'll let you know how it goes.

    Robopsycho · Aug 29, 2025

    @zardozai I got it to work -- batch vae decode worked well at 512, and I was feeding in 24 fps clips before, that's probably why it was failing. I thought the clips were 16 fps, but they were 24 fps. Now that I have the clip feeds coming in at 16 fps it's working.

    zardozai (Author) · Aug 29, 2025 · 1 reaction

    @Robopsycho To feed in at 24 frames per second, simply adjust the output settings accordingly.

    Robopsycho · Sep 2, 2025

    any way to make the upscale less saturated? I notice that it turned a warm light pretty orange. I tried putting denoise to 1.0 - we'll see what happens. TY

    zardozai (Author) · Sep 2, 2025

    @Robopsycho ComfyUI Node: Color Match

    Give it a try.

    Robopsycho · Sep 3, 2025 · 1 reaction

    @zardozai so it worked - made the light bright -- but not too saturated - Thanks :)

    killkrazed · Sep 3, 2025 · 1 reaction

    Not sure what is going wrong here, but trying to use the workflow unedited I am for some reason getting "torch.linalg.solve: The solver failed because the input matrix is singular."

    I suspect that uni_pc as the sampler is the problem, what other samplers would you say would work best?

    I tried changing to SA_Solver and Beta but then I got "contracted dimensions need to match, but first has size 4 in dim 1 and second has size 0 in dim 0"

    This is when using all the default values of the workflow.

    Tried swapping ComfyUI to launching with Quad attention instead of Sage Attention, and I don't get any errors, but the workflow finishes far faster than expected and the result is just pure noise.

    zardozai (Author) · Sep 8, 2025

    I have removed the LoRA that can cause this issue in the latest version.

    parallelepipedon · Sep 5, 2025

    On a 4090 with 64G system RAM, I couldn't get this to work with even the smallest quants. It maxed out both kinds of RAM completely. I was admittedly not trying to upscale tiny videos. Low Res in my world is 16:9-ish at anywhere from 1280 to 1536 wide. I really WANT this to work though. And preferably with the low-noise side of 14B. Guess I need to look into BlockSwap, etc.

    zardozai (Author) · Sep 5, 2025

    Do not touch anything; re-download the workflow in its original state and run it again.

    blobby99 · Sep 8, 2025 · 5 reactions

    There is ZERO chance of temporal upscaling at decent speed on a 4090 if your input and output have so large a resolution that you exhaust latent storage in your VRAM. ZERO! Temporal processing requires access to the entire latent data set. What you are doing is catastrophic thrashing of memory from VRAM to system RAM, and more likely to an SSD swap file - absolutely catastrophic.

    Too many people here know nothing about computer science, data flow, memory systems, and memory management. Worse, they think because they overspent on a 4090, or 5090, everything must be possible.

    What you can do is use a non-temporal upscale method. Recently, surprisingly, it was discovered that SD ultimate upscale, with a fixed seed, is good for doubling the linear rez. Any amount of VRAM is good for a pretty high-rez input video, providing you use a workflow that treats each frame as an image, and saves the upscaled images one by one to a folder. Then you can create a new upscaled video from that folder.
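
    That folder round-trip is easy to script. A sketch with placeholder paths (ffmpeg must be on PATH; the per-frame upscale step in the middle is whatever image upscaler you prefer):

    ```python
    # Sketch of the frames-in-a-folder approach: split, upscale per image,
    # recombine. The upscaler itself is a placeholder; plug in SD Ultimate
    # Upscale, ESRGAN, or anything that maps one image file to another.
    import subprocess

    def extract_cmd(video: str, frames_dir: str, fps: int = 16) -> list[str]:
        # Dump every frame as a numbered PNG.
        return ["ffmpeg", "-i", video, "-vf", f"fps={fps}",
                f"{frames_dir}/frame_%05d.png"]

    def rebuild_cmd(frames_dir: str, out_video: str, fps: int = 16) -> list[str]:
        # Reassemble the (now upscaled) frames into a video.
        return ["ffmpeg", "-framerate", str(fps),
                "-i", f"{frames_dir}/frame_%05d.png",
                "-c:v", "libx264", "-pix_fmt", "yuv420p", out_video]

    split = extract_cmd("lowres.mp4", "frames")
    join = rebuild_cmd("frames_upscaled", "hires.mp4")
    # subprocess.run(split, check=True)  # then upscale each PNG, then:
    # subprocess.run(join, check=True)
    ```

    With a fixed seed in the per-frame upscaler, as described above, frame-to-frame flicker stays surprisingly low even without a temporal model.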

    Workflows
    Wan Video 2.2 TI2V-5B

    Details

    Downloads
    559
    Platform
    CivitAI
    Platform Status
    Available
    Created
    8/28/2025
    Updated
    5/13/2026
    Deleted
    -

    Files

    wan225BLatentVideoUpscalerEnhancer_v10.zip