CivArchive
    Flux/Zit 2x speedup for GTX 1080ti/ Pascal (ForgeUI) - v1.0
    NSFW

    This is a custom drop-in CUDA kernel designed to bring older Pascal GPUs back to life when running heavy FP8 models in ForgeUI.

    Since the Pascal architecture lacks Tensor Cores, PyTorch defaults to a painfully slow fallback path when handling FP8 weights. This mod intercepts that path, converts the FP8 weights to INT8 on the fly, and runs the dot products with native __dp4a instructions (a 4-way packed INT8 dot product with 32-bit accumulation, available on sm_61).
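
    The conversion described above can be illustrated in plain Python. This is only a sketch of the general technique, not the kernel shipped in the download: it assumes symmetric absmax quantization (one scale per vector) and quantizes both weights and activations, neither of which the description confirms. The helper names pack4, dp4a, quantize_int8 and int8_dot are made up for illustration; only the arithmetic of dp4a mirrors the CUDA __dp4a intrinsic for signed inputs.

```python
import struct

def pack4(values):
    """Pack four signed 8-bit integers into one 32-bit word (little-endian)."""
    return struct.unpack("<I", struct.pack("<4b", *values))[0]

def dp4a(a_word, b_word, acc):
    """Emulate __dp4a for signed inputs: acc + sum of four int8 products.

    The hardware accumulates in a 32-bit integer; this emulation uses
    Python's unbounded ints and therefore never wraps around.
    """
    a = struct.unpack("<4b", struct.pack("<I", a_word))
    b = struct.unpack("<4b", struct.pack("<I", b_word))
    return acc + sum(x * y for x, y in zip(a, b))

def quantize_int8(row):
    """Assumed scheme: symmetric absmax quantization to the range [-127, 127]."""
    scale = max(abs(v) for v in row) / 127.0 or 1.0  # avoid a zero scale
    return [round(v / scale) for v in row], scale

def int8_dot(weights, activations):
    """Approximate a float dot product of two equal-length vectors in INT8."""
    qw, sw = quantize_int8(weights)
    qx, sx = quantize_int8(activations)
    pad = (-len(qw)) % 4              # pad with zeros to a multiple of four
    qw += [0] * pad
    qx += [0] * pad
    acc = 0
    for i in range(0, len(qw), 4):    # four multiply-adds per dp4a call
        acc = dp4a(pack4(qw[i:i + 4]), pack4(qx[i:i + 4]), acc)
    return acc * sw * sx              # rescale the integer sum back to float

print(dp4a(pack4([1, 2, 3, 4]), pack4([5, 6, 7, 8]), 10))  # 80
```

    The speedup comes from each instruction doing four multiply-adds on packed bytes; the price is a small rounding error from quantization, which is why int8_dot returns an approximation of the float result rather than the exact value.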

    The Result: Roughly 2x faster generation with no visible quality loss in my tests.

    PyTorch profiler screenshots: FP8 stock vs. INT8 (images omitted).

    Tested Setup

    • Hardware: Titan X Pascal (sm_61)

    • Models tested: Z Image Turbo (ZiT) and Flux 2 Klein 9B (both in FP8)

    • UI: ForgeUI (Neo branch). It might work on the original main branch if that branch supports these models, but only Neo has been tested.

    IMPORTANT WARNING

    Do NOT use this mod if you are on a Turing, Ampere, Ada, or newer GPU (RTX 20xx / 30xx / 40xx and later). These cards have Tensor Cores (introduced with Volta and brought to GeForce cards with Turing), meaning your stock hardware path is already significantly faster than this implementation. This kernel is strictly a rescue patch for the Pascal (sm_61) architecture!
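
    To check which side of this warning a card falls on, the compute capability can be compared against sm_61. The sketch below is not part of the mod; patch_advice is a hypothetical helper, and the tuple it takes is the (major, minor) pair that torch.cuda.get_device_capability() returns in PyTorch.

```python
def patch_advice(capability):
    """Classify a (major, minor) compute capability for this patch."""
    major, minor = capability
    if (major, minor) == (6, 1):
        return "apply"        # Pascal sm_61: the architecture this patch targets
    if major >= 7:
        return "skip"         # Volta, Turing and newer have Tensor Cores
    return "not covered"      # sm_60 and older lack __dp4a; sm_62 is untested

# With PyTorch installed, the capability could be obtained like this:
#   import torch
#   capability = torch.cuda.get_device_capability(0)
print(patch_advice((6, 1)))
print(patch_advice((7, 5)))
```

    A GTX 1080 Ti or Titan X Pascal reports (6, 1); an RTX 20xx card reports (7, 5).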

    Installation Instructions

    No compilation required. Just follow these steps:

    1. Go to your ForgeUI backend folder.

    2. Find operations.py and make a backup of it (just in case you want to revert later).

    3. Replace the original operations.py with my modified version.

    4. Inside the backend folder, create a new folder named ext.

    5. Inside ext, create another folder named zimage_ext (The path should look like this: backend/ext/zimage_ext/).

    6. Drop the provided library file (.pyd, a Windows Python extension module) into the zimage_ext folder.
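
    The steps above can also be scripted. The following is a minimal sketch rather than an official installer: install_patch, its arguments, and the operations.py.bak backup name are all assumptions for illustration, and the paths have to be adapted to wherever ForgeUI and the extracted download actually live.

```python
import shutil
from pathlib import Path

def install_patch(backend_dir, patched_operations, pyd_file):
    """Mirror the manual steps: back up, replace, create folders, copy library."""
    backend = Path(backend_dir)
    original = backend / "operations.py"
    backup = backend / "operations.py.bak"  # hypothetical backup name

    # Step 2: keep a copy of the stock file, never overwriting an older backup.
    if not backup.exists():
        shutil.copy2(original, backup)

    # Step 3: replace operations.py with the modified version.
    shutil.copy2(patched_operations, original)

    # Steps 4-5: create backend/ext/zimage_ext/ if it does not exist yet.
    ext_dir = backend / "ext" / "zimage_ext"
    ext_dir.mkdir(parents=True, exist_ok=True)

    # Step 6: drop the compiled library into the new folder.
    return Path(shutil.copy2(pyd_file, ext_dir))
```

    To revert, copy the backup back over operations.py and delete the ext/zimage_ext folder.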

    That's it! Restart ForgeUI. Any FP8 model will now be converted to INT8 automatically and computed in INT8, giving your 10-series card a massive speed boost.

    Description

    Initial release

    Other
    Flux.2 Klein 9B

    Details

    Downloads: 24
    Platform: CivitAI
    Platform Status: Available
    Created: 5/6/2026
    Updated: 5/13/2026
    Deleted: -

    Files

    fluxZit2xSpeedupForGTX_v10.zip
