CivArchive
    gemma-3-12b-qat-abliterated-sikaworld-fp4-ltx2 - Gemma3-12B-NVFP4-HF
    NSFW

    🌍 Gemma‑3‑12B‑QAT‑Abliterated — Sikaworld FP4 Editions

    Blackwell‑optimized FP4 text encoders for LTX‑2 and 2.3, based on mlabonne’s improved Abliteration technique.

    Important General Note: Using the official LTX-2.3 dev NVFP4 model in combination with any of these FP4 text encoders results in a noticeable quality degradation compared to pairing the same text encoders with FP8 or BF16 versions of the LTX-2.3 model. Different workflow variants and their respective quality/speed trade-offs are demonstrated in the embedded showcase videos.

    I have also modified the official ComfyUI template workflow by adjusting the audio and video parameters for better action dynamics and clearer speech output — the same optimizations I already apply in my standard workflow when using Transformers-only checkpoints from KJ.

    🌐 Overview

    The NVIDIA Blackwell architecture update introduced first‑class support for FP4/NVFP4 inference, enabling extremely fast and memory‑efficient text encoders. At the same time, the LTX‑2 development team officially recommends Gemma‑QAT‑based encoders for video generation due to their stable activation distributions, strong semantic gradients, and robust temporal behavior.

    This repository provides two custom FP4 variants of the uncensored Gemma‑3‑12B‑QAT model created by mlabonne using his improved Abliteration v2 method.

    Both models are fully uncensored, explicitly optimized for LTX‑2 and, of course, LTX‑2.3, and designed to deliver strong motion vectors while maintaining spatial coherence.


    📦 The Two FP4 Editions

    🛡️ FP4 High‑Fidelity Edition (Protected Layers)

    This version uses a surgical mixed‑precision stabilizer to preserve facial symmetry and spatial coherence.

    • Layers 0–1 (Input embeddings) kept in BF16.

    • Layers 44–47 (Final output projections) kept in BF16.

    • All LayerNorms and Biases kept in BF16.

    • All mid-transformer layers quantized to FP4.

    Best for: Maximum stability, minimal facial drift, consistent anatomy, and strong but mathematically controlled motion vectors. Highly recommended for complex I2V/T2V tasks.

    🚀 FP4 Pure Edition (No Protected Layers)

    This version is a relentless, flat FP4/NVFP4 quantization of the Abliterated QAT model.

    • All transformer layers (0–47) quantized to FP4.

    • Only LayerNorms and Biases remain in BF16.

    Best for: Maximum performance, the absolute lowest VRAM footprint, and the fastest inference on Blackwell GPUs. It trades a tiny amount of spatial stability for raw speed and more intense, aggressive motion vectors.
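    The difference between the two editions boils down to a per-parameter dtype decision. The sketch below is hypothetical: the layer indices and quantize/keep rules mirror the bullet lists above, but the function and parameter names are illustrative, not the actual conversion script.

```python
# Illustrative dtype-selection rule for the two FP4 editions.
# PROTECTED_LAYERS matches the bullet lists: input embeddings (0-1)
# and final output projections (44-47) stay in BF16 in the
# High-Fidelity Edition only.
PROTECTED_LAYERS = {0, 1, 44, 45, 46, 47}

def target_dtype(layer_idx: int, param_name: str, protect: bool = True) -> str:
    """Return 'bf16' or 'fp4' for one transformer parameter.

    protect=True  -> High-Fidelity Edition (stabilizer layers stay BF16)
    protect=False -> Pure Edition (every transformer layer goes to FP4)
    """
    # LayerNorms and biases are kept in BF16 in *both* editions.
    if "norm" in param_name or param_name.endswith("bias"):
        return "bf16"
    if protect and layer_idx in PROTECTED_LAYERS:
        return "bf16"
    return "fp4"

print(target_dtype(0, "self_attn.q_proj.weight"))                 # bf16 (protected)
print(target_dtype(20, "mlp.down_proj.weight"))                   # fp4  (mid-transformer)
print(target_dtype(0, "self_attn.q_proj.weight", protect=False))  # fp4  (Pure Edition)
```

    Note how the norm/bias check comes first: even the Pure Edition never quantizes those tensors, which is why both columns of the summary table list "Norms & Biases: BF16".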


    🧰 Usage in ComfyUI

    1. Download your preferred .safetensors file.

    2. Place the file inside your ComfyUI models folder: ComfyUI/models/text_encoders/

    3. Load the model via the standard DualCLIPLoader or LTX‑2 Text Encoder Loader.

    4. Recommended dtype: fp8_e4m3fn (Note: The BF16‑protected layers will automatically be respected and kept in BF16 by ComfyUI's loader).
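    Steps 1–2 can be sketched as a shell snippet. Paths are hypothetical; adjust COMFY_ROOT and the exact downloaded filename to your install.

```shell
# Sketch only -- adjust COMFY_ROOT and the downloaded filename to your setup.
COMFY_ROOT="${COMFY_ROOT:-$HOME/ComfyUI}"
ENC_DIR="$COMFY_ROOT/models/text_encoders"
FILE="gemma312bQatAbliterated_gemma312BNVFP4HF.safetensors"

# Create the text_encoders folder if this is a fresh install.
mkdir -p "$ENC_DIR"

# Move the download into place if it is present.
if [ -f "$HOME/Downloads/$FILE" ]; then
    mv "$HOME/Downloads/$FILE" "$ENC_DIR/"
fi

ls "$ENC_DIR"
```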

    💡 Prompting Tip: Start your prompts with direct action verbs (e.g., "running", "falling", "embracing", "exploding"). FP4 models respond extremely well to dynamic, upfront phrasing.


    🔬 Technical Background

    Why Gemma‑QAT for LTX‑2?

    The LTX‑2 base model architecture reacts very sensitively to the text encoder's conditioning. The LTX‑team recommends QAT (Quantization-Aware Training) encoders because they provide:

    • Stable activation distributions

    • Smooth residual streams

    • Strong temporal gradients

    • Robust spatial alignment

    • Heavily reduced “frozen video” (motion collapse) behavior

    The Abliteration V2 Magic

    These models are derived from mlabonne/gemma-3-12b-it-qat-abliterated. Abliteration is a multi‑step orthogonalization process, not just a simple deletion. It compares residual streams from harmful vs. harmless samples, computes a "refusal direction", and subtracts this direction directly from the hidden states of target modules. The result is a fully uncensored, high‑fidelity instruction model with loud and uninhibited semantic gradients — acting as the perfect cure for static/frozen LTX‑2 generations.
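    The core idea can be shown in a few lines of numpy: compute the mean-difference direction between two sets of hidden states, then project it out. This is a toy sketch with random data; mlabonne's actual Abliteration pipeline operates on real model activations across many layers and modules.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy hidden size

# Toy "residual stream" samples: refusal-triggering inputs get a
# systematic offset along the first axis, harmless ones do not.
refuse = rng.normal(size=(32, d)) + np.array([3.0] + [0.0] * (d - 1))
comply = rng.normal(size=(32, d))

# 1. "Refusal direction" = normalized difference of mean activations.
r = refuse.mean(axis=0) - comply.mean(axis=0)
r /= np.linalg.norm(r)

# 2. Orthogonalize: remove the component along r from hidden states.
def ablate(h: np.ndarray, direction: np.ndarray) -> np.ndarray:
    return h - np.outer(h @ direction, direction)

h = rng.normal(size=(4, d))
h_clean = ablate(h, r)

# After ablation the states have (numerically) zero component along r.
print(np.abs(h_clean @ r).max() < 1e-9)  # True
```

    In the real method this projection is baked into the weights of the target modules, so no runtime hook is needed: the model simply can no longer express the refusal direction.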

    Why FP4 for Blackwell GPUs?

    NVIDIA's latest Blackwell Tensor Cores are explicitly optimized for FP4/NVFP4 mathematical operations. This format offers:

    • Significantly higher throughput than FP8

    • Extremely low VRAM footprint

    • Faster long‑prompt (prefill) inference

    • Decreased pressure on memory bandwidth

    These FP4 editions feature a pure FP4 tensor layout (with appropriate micro-block and global scales) fully compatible with NVFP4 hardware acceleration on RTX 50‑series and data center hardware.
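    A rough numpy sketch of what block-scaled FP4 quantization looks like: 16-element micro-blocks, each with its own scale, and values snapped to the signed E2M1 grid. Real NVFP4 stores FP8 (E4M3) block scales plus a global FP32 scale; both are simplified to plain float32 here, so this illustrates the idea, not the actual on-disk layout.

```python
import numpy as np

# The 8 non-negative E2M1 (FP4) magnitudes; mirroring gives the signed grid.
E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
GRID = np.concatenate([-E2M1[:0:-1], E2M1])  # 15 signed representable values

def quantize_fp4_blocks(x: np.ndarray, block: int = 16):
    xb = x.reshape(-1, block)
    # Per-block scale maps the block's max |value| onto the grid max (6.0).
    scales = np.abs(xb).max(axis=1, keepdims=True) / 6.0
    scales = np.where(scales == 0.0, 1.0, scales)
    # Snap each scaled element to the nearest representable FP4 value.
    codes = np.abs(xb[:, :, None] / scales[:, :, None] - GRID).argmin(axis=-1)
    return codes.astype(np.uint8), scales

def dequantize_fp4_blocks(codes: np.ndarray, scales: np.ndarray) -> np.ndarray:
    return (GRID[codes] * scales).ravel()

x = np.linspace(-5.0, 5.0, 64)
codes, scales = quantize_fp4_blocks(x)
x_hat = dequantize_fp4_blocks(codes, scales)
print(codes.shape, scales.shape)  # (4, 16) (4, 1)
```

    The per-block scale is what keeps 4-bit precision usable: each micro-block of 16 weights is renormalized independently, so one outlier only coarsens its own block instead of the whole tensor.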


    📊 Technical Summary

    | Component | 🛡️ High‑Fidelity Edition | 🚀 Pure Edition |
    | --- | --- | --- |
    | Base Model | mlabonne/gemma‑3‑12b‑it‑qat‑abliterated | mlabonne/gemma‑3‑12b‑it‑qat‑abliterated |
    | Quantization | FP4 + BF16 stabilizer | Pure FP4 |
    | Protected Layers | 0–1, 44–47 | None |
    | Norms & Biases | BF16 | BF16 |
    | Inference Speed | Fast | Fastest |
    | Stability | Highest | Moderate |
    | VRAM Usage | Low | Lowest |

    ---

    🏷️ Credits & Acknowledgments

    • Base Model & Abliteration v2: mlabonne

    • QAT Architecture & Gemma Weights: Google

    • FP4 Optimization, Hybrid Architecture & Stabilization: Sikaworld

    • LTX‑2 & QAT Recommendation: Lightricks / LTX‑Team


    Checkpoint: LTXV 2.3

    Details

    Downloads: 181
    Platform: CivitAI
    Platform Status: Available
    Created: 3/22/2026
    Updated: 4/27/2026
    Deleted: -

    Files

    gemma312bQatAbliterated_gemma312BNVFP4HF.safetensors