V1.0a Experimental!!
Important, please read!!
Gemma-3-12B-Heretic-X (Sikaworld High-Fidelity Edition)
This is the ultra-dynamic, fully uncensored text encoder for LTX-2, based on the experimental Heretic-X fine-tune by LastRef.
While the standard abliterated version removes the "refusal" mechanism, Heretic-X was actively steered with a custom dataset to be proactively descriptive and uninhibited. In LTX-2 video generation, this translates to significantly stronger motion vectors, helping to "unfreeze" static videos and generate more intense dynamics in complex scenes.
This edition applies the Sikaworld High-Fidelity Quantization method to tame the aggressive nature of Heretic-X, ensuring that the increased dynamics do not come at the cost of facial symmetry or anatomical coherence.
Key Features
Aggressive Uncensoring (Heretic-X): Unlike standard abliteration (which just deletes the refusal direction), this model uses modified weights (attn.o_proj, mlp.down_proj) derived from x-rated dataset training. It delivers a "louder" and more confident signal to the video transformer, which is often the cure for "frozen" I2V generations.
High-Fidelity Layer Protection (The Stabilizer): Aggressive fine-tunes can often lead to "melting" faces in video. This version uses a Mixed Precision Strategy: The critical input layers (0-1) and the final output layers (44-47), as well as all LayerNorms and Biases, are kept in BF16. This acts as a safety rail, keeping facial features symmetric while allowing the body and background to move dynamically.
True Standalone (.safetensors): Includes the embedded spiece_model tensor. It works as a single-file plug-and-play solution in ComfyUI (LTX-2) without requiring external tokenizer.model files or complex folder structures.
Surgical Extraction: Stripped of the Vision-Tower weights (which LTX-2 does not use) to save VRAM and loading time, while retaining the full 48-layer text model of the 24GB BF16 source.
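The layer-protection rule described above can be sketched as a small name-based dtype selector. This is only an illustration, not the actual conversion script: the tensor naming pattern (`model.layers.<i>. ...`, Hugging Face style), the function name `target_dtype`, and the dtype labels are assumptions.

```python
import re

# Layer indices kept in BF16 according to the description above:
# the first two (0-1) and the last four (44-47) of the 48-layer stack.
PROTECTED_LAYERS = {0, 1, 44, 45, 46, 47}

# Assumed naming scheme: "...layers.<index>.<submodule>...".
_LAYER_RE = re.compile(r"(?:^|\.)layers\.(\d+)\.")


def target_dtype(tensor_name: str) -> str:
    """Return "BF16" for protected tensors, otherwise "F8_E4M3"."""
    # LayerNorm weights and all biases stay in BF16.
    if "norm" in tensor_name.lower() or tensor_name.endswith(".bias"):
        return "BF16"
    match = _LAYER_RE.search(tensor_name)
    if match and int(match.group(1)) in PROTECTED_LAYERS:
        return "BF16"
    # Everything else is quantized. The description does not say how
    # embeddings are handled; this sketch treats them like any other weight.
    return "F8_E4M3"
```

With these assumed names, `target_dtype("model.layers.46.mlp.down_proj.weight")` gives `"BF16"`, while the same projection in layer 20 gives `"F8_E4M3"`.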
Usage in ComfyUI
Place the .safetensors file in your ComfyUI/models/text_encoders/ folder.
In the DualCLIPLoader node of your LTX-2 workflow, select this model.
Recommended Dtype: Set weight_dtype to fp8_e4m3fn (the critical layers remain BF16 automatically).
Prompting Tip: This model reacts very well to "action verbs" at the very beginning of the prompt. It needs a lower CFG scale than standard models to produce motion.
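If you prefer to script the first step, a minimal sketch could look like this. The helper name `install_text_encoder` and both path arguments are hypothetical; only the `models/text_encoders` target folder comes from the steps above.

```python
import shutil
from pathlib import Path


def install_text_encoder(model_file: str, comfyui_root: str) -> Path:
    """Copy a .safetensors file into <comfyui_root>/models/text_encoders/."""
    src = Path(model_file)
    if src.suffix != ".safetensors":
        raise ValueError(f"expected a .safetensors file, got {src.name}")
    dest_dir = Path(comfyui_root) / "models" / "text_encoders"
    # Create the folder if this ComfyUI install does not have it yet.
    dest_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(src, dest_dir / src.name))
```

After copying, refresh or restart ComfyUI so the file shows up in the loader's model list.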
Technical Background
Why Heretic-X for Video?
LTX-2 (especially the Dev version) often suffers from "motion collapse" (frozen video) when the text embedding is too neutral. Heretic-X provides a higher variance in its embeddings.
Why this Quantization?
Standard FP8 conversions of Heretic models often result in "weird" artifacts because the aggressive weights clip during quantization. By protecting the last 4 layers (44-47) in BF16, we ensure that the final instructions sent to the Video Transformer retain their high-precision spatial alignment, preventing the "uncanny valley" effect often seen in dynamic clips.
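To make the clipping claim concrete: the largest finite magnitude in the fp8_e4m3fn format is 448, so anything beyond that is lost if the cast saturates. The toy sketch below models only that range limit (no rounding) and shows how a per-tensor scale avoids it. The function names and the sample values are made up for illustration, and it assumes a converter that clamps rather than one that does something else on overflow.

```python
E4M3_MAX = 448.0  # largest finite magnitude in fp8_e4m3fn


def saturate(values, scale=1.0):
    """Simulate only the range limit of a saturating FP8 cast (no rounding)."""
    out = []
    for v in values:
        scaled = v / scale
        clipped = max(-E4M3_MAX, min(E4M3_MAX, scaled))
        out.append(clipped * scale)  # map back to the original scale
    return out


def fit_scale(values):
    """Per-tensor scale that maps the largest magnitude onto E4M3_MAX."""
    peak = max(abs(v) for v in values)
    return peak / E4M3_MAX if peak > 0 else 1.0
```

For the toy input `[0.5, -3.0, 900.0]`, `saturate(...)` without a scale flattens the outlier to 448.0, while passing `fit_scale(...)` as the scale keeps it at roughly 900. Keeping a layer in BF16, as this edition does for layers 44-47, sidesteps the range problem for that layer entirely.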
Credits
Base Model: Google Gemma 3
Heretic Fine-tune: LastRef
Optimization & Architecture Fixes: Sikaworld
v1.0
Gemma-3-12B-it-Abliterated (Sikaworld High-Fidelity Edition)
This is a specialized, fully uncensored (abliterated) text encoder for the LTX-2 audiovisual model.
While standard FP8 conversions often lead to "frozen" videos, facial drifting, or anatomical asymmetry in Image-to-Video (I2V) workflows, this version was surgically optimized to preserve the intelligence and stability of the original model.
Key Features
Uncensored Freedom: Based on the abliteration technique by Maxime Labonne. This model follows complex or "sensitive" prompts without refusals, ensuring a strong vector signal for high-motion video generation.
High-Fidelity Layer Protection: Unlike radical FP8 quants, this version uses a Mixed Precision Strategy. Critical input layers (0-1) and final output layers (44-47), as well as all LayerNorms and Biases, are kept in BF16. This specifically fixes the "face shifting" and "asymmetry" issues common in LTX-2.
True Standalone (.safetensors): Includes the embedded spiece_model tensor. It works as a single-file plug-and-play solution in ComfyUI without requiring external tokenizer.model files.
FP32 Sourced: Converted directly from the original 47GB FP32 shards to ensure maximum rounding precision during the FP8/BF16 hybrid conversion.
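If you want to check the mixed-precision layout and the embedded tokenizer yourself, the .safetensors header can be read with the Python standard library alone: the file starts with an 8-byte little-endian length, followed by a JSON header that lists every tensor with its dtype. The sketch below is a rough inspection helper; the function names are made up, and matching the tokenizer by the substring `spiece_model` is an assumption about the exact tensor key.

```python
import json
import struct
from collections import Counter


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file as a dict."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # u64, little-endian
        header = json.loads(f.read(header_len).decode("utf-8"))
    header.pop("__metadata__", None)  # optional free-form metadata entry
    return header


def summarize(header):
    """Count tensors per dtype and report whether a tokenizer tensor exists."""
    counts = Counter(info["dtype"] for info in header.values())
    # Assumption: the embedded tokenizer's key contains "spiece_model".
    has_tokenizer = any("spiece_model" in name for name in header)
    return dict(counts), has_tokenizer
```

On a file built as described above, you would expect the counts to show both BF16 and FP8 entries and `has_tokenizer` to be true.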
Usage in ComfyUI
Place the .safetensors file in your ComfyUI/models/text_encoders/ folder.
In your LTX-2 workflow, use the DualCLIPLoader or the specific LTXV Text Encoder Loader.
Tip: For best motion results, leave the negative prompt empty and focus your positive prompt on actions and dynamics.
Technical Background
Standard 8-bit quantization often "muffles" the subtle signals needed for temporal consistency in video models. By protecting the "navigation" layers (the beginning and end of the 48-layer stack) in BF16, this encoder provides a much "louder" and more stable movement command to the LTX-2 Transformer.
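To put a rough number on the "muffling": FP8 E4M3 stores 3 mantissa bits, BF16 stores 7. The sketch below rounds a value to a given number of mantissa bits and compares the relative error. It models precision only (exponent range, subnormals, and overflow are ignored), the function names are made up, and it says nothing about how much of this error actually reaches the video output.

```python
import math


def round_to_mantissa_bits(x: float, mantissa_bits: int) -> float:
    """Round x to the nearest value with the given number of stored mantissa bits."""
    if x == 0.0:
        return 0.0
    _, exp = math.frexp(abs(x))  # abs(x) = m * 2**exp with 0.5 <= m < 1
    step = 2.0 ** (exp - 1 - mantissa_bits)  # spacing between neighbours
    return math.copysign(round(abs(x) / step) * step, x)


def relative_error(x: float, mantissa_bits: int) -> float:
    """Relative rounding error of x at the given mantissa width."""
    return abs(round_to_mantissa_bits(x, mantissa_bits) - x) / abs(x)
```

For example, 0.3 becomes 0.3125 with 3 mantissa bits (about 4% off) and 0.30078125 with 7 bits (under 0.3% off). In general the worst-case relative error is about 1/16 for a 3-bit mantissa and 1/256 for a 7-bit one.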
Credits
Abliteration: mlabonne
Optimization & Quantization: Sikaworld
Comments (5)
Any workflow?
You can use the ComfyUI LTX-2 templates and replace the text encoder.
Can this one replace gemma-3-12b-it-qat-q4_0-unquantized? I mean, in the default workflow, it won't give the tokenizer error?
This TE will not work with the Gemma loader (shards loader), but the LTX workflows have been updated anyway, and the node this TE fits in is the DualCLIPLoader!
Does this work with the 2.3 model?