Qwen Image VAE

Full FP32 Training of Decoder
Works in ComfyUI

Feel free to suggest on-site support to the Civitai staff. I don't think they have an agreement in place like the one with FLUX.

Overview

This model is a fine-tuned variant of the base Qwen Image VAE, modified to emphasize high-frequency detail preservation and expanded color representation, following an HDR-style reconstruction objective.

The evaluation compares the base and HDR-tuned models using perceptual, structural, distributional, and photometric metrics over identical input data.

Evaluation Summary

Perceptual Fidelity (LPIPS)

Base: 0.0177
HDR: 0.0786

The HDR model's perceptual distance is roughly 4.4 times that of the base model. Under deep-feature similarity metrics, this indicates less strict identity reconstruction and a shift toward detail-enhancing reconstruction behavior.

Structural Energy (Gradient Magnitude)

Ground Truth: 404.02 (both models)
Base Reconstruction: 313.46
HDR Reconstruction: 687.97

The base model demonstrates strong low-pass behavior with reduced high-frequency content. In contrast, the HDR model exhibits high-frequency amplification, exceeding the structural energy of the original inputs.
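The exact definition of the structural energy metric is not given here. As a minimal sketch, assuming it is the mean finite-difference gradient magnitude of a grayscale image, the low-pass versus amplifying behavior can be illustrated on synthetic data. The function names and the blur/sharpen stand-ins below are hypothetical and are not the evaluation code behind the reported numbers.

```python
import numpy as np

def gradient_energy(img: np.ndarray) -> float:
    """Mean gradient magnitude of a 2-D image (assumed metric definition)."""
    img = img.astype(np.float64)
    gy, gx = np.gradient(img)  # central differences along rows, then columns
    return float(np.mean(np.hypot(gx, gy)))

def box_blur(img: np.ndarray) -> np.ndarray:
    """3x3 box blur with edge padding; a stand-in for a low-pass decoder."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    return sum(padded[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

rng = np.random.default_rng(0)
ground_truth = rng.random((64, 64))                      # synthetic "input"
smoothed = box_blur(ground_truth)                        # loses high frequencies
sharpened = ground_truth + (ground_truth - smoothed)     # unsharp mask, boosts them

for name, image in [("ground truth", ground_truth),
                    ("smoothed", smoothed),
                    ("sharpened", sharpened)]:
    print(f"{name}: {gradient_energy(image):.4f}")
```

A smoothing reconstruction lands below the ground-truth value and a sharpening one lands above it, which is the same ordering as the base and HDR numbers reported above.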

Color Distribution Support

Ground Truth: 33150.61 (both models)
Base Reconstruction: 35004.49
HDR Reconstruction: 40133.37

The HDR model produces a substantially expanded color support space, indicating increased chromatic dispersion and reduced quantization collapse.
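How the color support figure is computed is not stated. One plausible reading, given the non-integer values, is the average number of distinct 8-bit RGB colors per image. The sketch below implements that assumed definition; `unique_color_count` and `mean_color_support` are hypothetical names.

```python
import numpy as np

def unique_color_count(img_uint8: np.ndarray) -> int:
    """Count distinct RGB triples in an (H, W, 3) uint8 image."""
    flat = img_uint8.reshape(-1, 3).astype(np.uint32)
    # Pack each RGB triple into one integer so np.unique works on a 1-D array.
    packed = (flat[:, 0] << 16) | (flat[:, 1] << 8) | flat[:, 2]
    return int(np.unique(packed).size)

def mean_color_support(images) -> float:
    """Average distinct-color count over a batch (assumed metric definition)."""
    return float(np.mean([unique_color_count(im) for im in images]))

# Tiny hand-built example: three distinct colors in the first image, one in the second.
img_a = np.array([[[0, 0, 0], [255, 0, 0]],
                  [[255, 0, 0], [0, 0, 1]]], dtype=np.uint8)
img_b = np.zeros((2, 2, 3), dtype=np.uint8)
print(mean_color_support([img_a, img_b]))
```

Under this reading, a reconstruction that spreads pixels over more distinct colors than the input scores higher than the ground truth, as both models do above.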

Photometric Stability

Brightness Bias

Base: 0.000351
HDR: 0.0000098

Contrast Gain

Base: 0.9984
HDR: 0.99999

Both models preserve global photometric consistency. The HDR variant sits closer to the identity mapping, with a brightness bias nearer to 0 and a contrast gain nearer to 1.
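The bias and gain figures read like the intercept and slope of an affine fit between input and reconstruction. As a sketch under that assumption (the source does not state the method), a least-squares fit of `recon ≈ gain * gt + bias` over all pixels can be written as follows. `affine_fit` is a hypothetical helper.

```python
import numpy as np

def affine_fit(gt: np.ndarray, recon: np.ndarray) -> tuple[float, float]:
    """Least-squares gain and bias such that recon ~= gain * gt + bias."""
    x = np.asarray(gt, dtype=np.float64).ravel()
    y = np.asarray(recon, dtype=np.float64).ravel()
    x_centered = x - x.mean()
    gain = float(np.dot(x_centered, y - y.mean()) / np.dot(x_centered, x_centered))
    bias = float(y.mean() - gain * x.mean())
    return gain, bias

# Synthetic check: build a reconstruction with a known gain and bias, then recover them.
rng = np.random.default_rng(0)
gt = rng.random((32, 32, 3))
recon = 0.9984 * gt + 0.000351
gain, bias = affine_fit(gt, recon)
print(f"gain={gain:.6f} bias={bias:.6f}")
```

A gain of exactly 1 and a bias of exactly 0 would mean the reconstruction matches the input's global brightness and contrast.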

Channel Drift

Red Shift:
- Base: +0.0116
- HDR: +0.0104
Green Shift:
- Base: -0.0606
- HDR: -0.1856
Blue Shift:
- Base: +0.0187
- HDR: +0.0219

The HDR model introduces a negative green-channel bias roughly three times larger than the base model's, while red and blue drift remain comparable between the two.
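A simple way to measure this kind of drift, assuming it is defined as the per-channel mean of reconstruction minus input (the source does not give the definition or the value scale), is sketched below. `channel_drift` is a hypothetical name.

```python
import numpy as np

def channel_drift(gt: np.ndarray, recon: np.ndarray) -> np.ndarray:
    """Per-channel mean shift (recon minus gt) for (H, W, 3) arrays, in R, G, B order."""
    diff = recon.astype(np.float64) - gt.astype(np.float64)
    return diff.reshape(-1, 3).mean(axis=0)

# Synthetic check: apply a known per-channel offset and recover it.
rng = np.random.default_rng(0)
gt = rng.random((16, 16, 3))
known_shift = np.array([0.01, -0.06, 0.02])
recon = gt + known_shift          # broadcasts over the channel axis
drift = channel_drift(gt, recon)
print(dict(zip("RGB", drift.round(4))))
```

A negative value means the reconstruction is darker than the input in that channel on average.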

Interpretation

The base Qwen VAE behaves as a contractive perceptual projection operator, prioritizing smooth reconstructions and suppression of high-frequency components.

The HDR-tuned variant transitions into a detail-amplifying reconstruction operator, characterized by:

- Increased high-frequency energy
- Expanded color manifold coverage
- Higher perceptual divergence under LPIPS
- Preserved global photometric invariance

This represents a functional shift from a smoothing autoencoder regime toward a high-frequency-emphasizing (HDR-like) reconstruction regime.
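To put the two regimes on a common scale, the reported gradient and color-support values can be expressed relative to the ground truth. The short script below only rearranges numbers already listed in the evaluation summary; the dictionary layout is an illustrative choice.

```python
# Values copied from the evaluation summary above.
reported = {
    "gradient_energy": {"gt": 404.02, "base": 313.46, "hdr": 687.97},
    "color_support": {"gt": 33150.61, "base": 35004.49, "hdr": 40133.37},
}

# Ratio of each reconstruction to the ground truth: below 1 means the model
# removes that quantity, above 1 means it adds more than the input contained.
ratios = {
    metric: {model: values[model] / values["gt"] for model in ("base", "hdr")}
    for metric, values in reported.items()
}

for metric, by_model in ratios.items():
    for model, ratio in by_model.items():
        print(f"{metric:16s} {model:4s} {ratio:.2f}x ground truth")
```

A ratio below 1 for the base model's gradient energy and above 1 for the HDR model's is what the smoothing versus amplifying description refers to.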

Files

hdrVAEAnimaQWENImage_bf16.safetensors
