
    I compiled a little collection of Flux.1 models. There are fp8 models with fp8 T5 and fp8 models with fp16 T5, for both Dev and Schnell, as single files for use with the regular checkpoint loader. There are also fp16 models available now. All models have CLIP, T5, and VAE baked in. THESE ARE ALL STOCK FLUX.1
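
    If you want to double-check what is actually baked into one of these single files, the safetensors library can list the tensor key prefixes without loading any weights. A minimal sketch; the expected prefixes (model, text_encoders, vae) are an assumption about the usual combined-checkpoint layout, not something guaranteed by this collection:

    ```python
    from safetensors import safe_open

    # Pick any file from the collection; the name below is just an example.
    path = "flux.1_dev_8x8_e4m3fn-marduk191.safetensors"

    with safe_open(path, framework="pt") as f:
        # Top-level key prefixes tell you which components are bundled.
        prefixes = sorted({key.split(".")[0] for key in f.keys()})

    # Expect something like ['model', 'text_encoders', 'vae'] if the UNet,
    # text encoders, and VAE are all baked in (assumed layout).
    print(prefixes)
    ```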

    For Flux Kontext, see here: https://civarchive.com/articles/16348/flux1-kontext-dev-quantized-models-available

    These all use bf16 upcasting; use the appropriate flags if you are tuning on GTX cards for some reason.
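
    For context, "bf16 upcasting" just means the fp8-stored weights get cast to a higher-precision compute dtype at inference time; pre-Ampere GTX cards have no bf16 support, which is why they need an fp16 override. A rough PyTorch sketch of the idea, assuming a torch build with float8 dtypes (2.1 or newer); this is an illustration, not how any particular loader is implemented:

    ```python
    import torch

    # Weights are stored in fp8 (e4m3fn) to save VRAM...
    w_fp8 = torch.randn(1024, 1024).to(torch.float8_e4m3fn)
    x = torch.randn(1, 1024, dtype=torch.bfloat16)

    # ...and upcast to bf16 for the actual matmul (the default here).
    w_bf16 = w_fp8.to(torch.bfloat16)
    y = x @ w_bf16.T

    # GTX (pre-Ampere) cards lack bf16 support, so the fallback is to upcast
    # to fp16 instead, via whatever dtype-override flag your frontend exposes.
    w_fp16 = w_fp8.to(torch.float16)
    ```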

    Unified single-file versions of Flux.1 for ComfyUI. All files have a baked-in VAE and CLIP-L included:

    flux.1_dev_8x8_e4m3fn-marduk191.safetensors is Flux.1 Dev quantized to 8 bit with an 8 bit T5 XXL encoder included.

    flux.1_dev_fp8_fp16t5-marduk191.safetensors is Flux.1 Dev quantized to 8 bit with a 16 bit T5 XXL encoder included.

    flux.1_schnell_8x8_e4m3fn-marduk191.safetensors is Flux.1 Schnell quantized to 8 bit with an 8 bit T5 XXL encoder included.

    flux.1_schnell_fp8_fp16t5-marduk191.safetensors is Flux.1 Schnell quantized to 8 bit with a 16 bit T5 XXL encoder included.

    flux.1_dev_16x16-marduk191.safetensors is Flux.1 Dev in 16 bit with a 16 bit T5 XXL encoder included.

    flux.1_schnell_16x16-marduk191.safetensors is Flux.1 Schnell in 16 bit with a 16 bit T5 XXL encoder included.

    flux.1_dev_8x8_scaled-marduk191.safetensors is Flux.1 Dev quantized to 8 bit scaled stochastic weights with normalized outlying alphas. It includes an 8 bit scaled stochastic (tag limited to avoid loss) T5 XXL encoder.

    Workflow examples are available here: SOON

    Repository is here: https://huggingface.co/marduk191/Flux.1_collection/tree/main

    Discord: https://discord.gg/s3kj9VqpKc

    Tips welcome: https://ko-fi.com/marduk191

    Description

    Added flux1-dev_8x8_scaled. Both the UNet and T5 are converted to fp8 scaled weights and should be a sweet spot. This replaces fp16 static weights if I didn't mess up the alphas lol. I am pretty sure it worked out.
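
    The stochastic rounding and alpha/outlier normalization details are the author's own recipe, but the general idea behind "fp8 scaled weights" is per-tensor scaling: divide the weight by a scale so its range fits e4m3fn, store the fp8 tensor plus the scale, and multiply the scale back in at load time. A rough illustration in PyTorch (assuming float8 dtypes are available), not the actual conversion script:

    ```python
    import torch

    FP8_MAX = torch.finfo(torch.float8_e4m3fn).max  # ~448 for e4m3fn

    def quantize_scaled_fp8(w: torch.Tensor):
        # Per-tensor scale chosen so the largest weight maps to the fp8 max.
        scale = (w.abs().max() / FP8_MAX).clamp(min=1e-12)
        w_fp8 = (w / scale).to(torch.float8_e4m3fn)
        return w_fp8, scale

    def dequantize(w_fp8: torch.Tensor, scale: torch.Tensor, dtype=torch.bfloat16):
        # At load time the stored scale is multiplied back in.
        return w_fp8.to(dtype) * scale

    w = torch.randn(4096, 4096) * 0.02          # stand-in for a linear weight
    w_fp8, scale = quantize_scaled_fp8(w)
    err = (dequantize(w_fp8, scale).float() - w).abs().mean()
    print(f"scale={scale.item():.3e}  mean abs error={err.item():.3e}")
    ```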

    FAQ

    Comments (6)

    showbert7 · Jun 4, 2025 · 5 reactions

    Welcome back Brother

    marduk191 (Author) · Jun 4, 2025 · 1 reaction

    Thanks. We're kind of just testing things that will apply to Chroma with this one, but it's a decent add for 10-12GB people. The scaled algorithm seems to hold extremely good quality. Little steps toward bigger things lol

    showbert7 · Jun 4, 2025

    @marduk191 Yeah, Chroma is pretty cool and generates images quickly. But I think the SVDQ-int4-flux.1-dev model could also be a good option. It uses 4-bit compression, so it consumes less memory and runs faster, while still maintaining high image quality. It's a good choice for systems with limited resources.

    marduk191 (Author) · Jun 5, 2025 · 1 reaction

    @showbert7 4 bit is very low quality, and very slow unless you have a 50xx series card where they added hardware support. 8 bit scaled with alpha normalization for outliers is the best I could come up with for compatibility. The old scaled method certainly lost quality; this keeps as much as I know how without inflating it too large. You just have to use the KJNodes loader, since Comfy doesn't support scaled acceleration with his UNet or checkpoint nodes. I'll do a beginner stock flow for Chroma and add the Flux version for optimal loader settings.

    showbert7 · Jun 5, 2025

    @marduk191 Totally fair points, appreciate the detailed breakdown 🙏

    Yeah, 4-bit definitely has its trade-offs. I was mostly thinking in terms of VRAM limits on older cards, where it sometimes helps squeeze things in. But you're right — without proper hardware support like in 50xx cards, performance can really tank. Your 8-bit scaled + alpha norm idea sounds like a solid middle ground honestly. And yeah, the old scaled method did lose sharpness, so I'm curious to try your updated setup. 🙂

    marduk191 (Author) · Jun 6, 2025 · 1 reaction

    @showbert7 This uses the newer out-of-bounds normalization; if it's still bad then it's still bad lol. It's what we have at the moment. 4 bit I'll prolly never support. These models run at around 5 sec/it on a 3060 12GB; any slower and you are not gaining anything and should use a model more suited for those cards. This is why I set the default upcasting to bf16 instead of fp16 on the originals: they will refuse to run on GTX cards unless you manually patch that in your local install. Even 8 bit normals are really pushing it because of quality degradation. Scaling increases that quality, but it is what it is at that point unless someone comes up with a great trick. Def faster than unaccelerated GGUF on speed lol. As far as the release goes, the visuals are pretty decent; I haven't had time to do a large sample size on comparisons, I'm sure some redditor will do it to have something to yell about lol. K, on to cuBLAS patching since he supports that now, we'll see if that helps speed lol

    Checkpoint
    Flux.1 D

    Details

    Downloads: 537
    Platform: CivitAI
    Platform Status: Available
    Created: 6/4/2025
    Updated: 5/17/2026
    Deleted: -

    Files

    marduk191sFlux1_flux1Dev8x8Scaled.safetensors

    Available On (1 platform)
