Changes lighting: darker with negative values, brighter with positive values. Tested between -2 and 2.
Description
First version
FAQ
Comments (11)
Nice! Will need to try this today. Question is, will it work with the z-detail slider, and brighten those up a bit?
Yes
You absolutely nailed the training on this one. I see no hint of lower contrast/higher saturation on opposite ends. I'm very impressed. How did you pull it off?
Thank you! Model layer selection.
@alcaitiff Do the same layer selections result in avoiding that issue on every lora? Or does that have to be figured out for every slider lora individually? If it's the same for every slider lora, which layers work for you?
@Jellai No, each lora will need different layers.
Main DiT Blocks (layers.N.*)
These are the core repeated transformer blocks (likely ~30–40 layers, numbered from 0 upward; e.g., 0–12 visible in a partial index, continuing further).
Each block uses adaptive layer normalization (adaLN) conditioned on timestep/noise level for diffusion scheduling.
adaLN_modulation.0.weight/bias: Linear layer that produces scaling/shifting modulation factors from the timestep embedding, applied to normalize features (similar to AdaLN-Zero in DiT models).
attention. (to_q, to_k, to_v, to_out, norm_q, norm_k)*: Self-attention mechanism on the single-stream sequence (text + caption + image tokens). Computes queries, keys, values; normalizes Q/K separately; projects output.
attention_norm1/attention_norm2.weight: Layer norms before/within attention (pre-norm style).
feed_forward. (w1, w2, w3)*: Gated feed-forward network (likely SwiGLU-style with three weights for gating and projection).
ffn_norm1/ffn_norm2.weight: Layer norms before/within the FFN.
Each layers.N block processes the unified input sequence with conditioned attention and feed-forward, progressively denoising/refining the latent representation.
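The adaLN modulation step described above can be sketched in plain Python. This is a minimal illustration only: the function names, the two-factor (shift, scale) output, and the shapes are assumptions for clarity. Real DiT blocks typically emit more factors (e.g., separate shift/scale/gate pairs for attention and FFN, as in AdaLN-Zero) and operate on tensors.

```python
def layer_norm(x, eps=1e-6):
    """Plain layer norm over one feature vector, no learned affine
    (the affine part comes from the timestep-conditioned modulation)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / (var + eps) ** 0.5 for v in x]

def adaln_modulate(x, t_emb, w, b):
    """Illustrative adaLN: a linear layer (w, b) maps the timestep
    embedding to per-channel shift/scale factors, then modulates the
    normalized features as x_norm * (1 + scale) + shift."""
    dim = len(x)
    # linear projection: 2*dim output rows -> (shift, scale)
    factors = [sum(wi * ti for wi, ti in zip(row, t_emb)) + bi
               for row, bi in zip(w, b)]
    shift, scale = factors[:dim], factors[dim:]
    xn = layer_norm(x)
    return [v * (1 + s) + sh for v, s, sh in zip(xn, scale, shift)]
```

With zero weights and biases the modulation is the identity on the normalized features, which is the "zero-init" starting point that makes each block begin as a plain pre-norm residual block.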
Caption Embedding (cap_embedder.* and cap_pad_token)
cap_embedder: Likely a small projector or embedder for visual semantic tokens (high-level captions or "cap" tokens providing additional visual guidance).
cap_pad_token: Special padding token embedding for aligning variable-length caption sequences.
These add structured visual semantics to the single-stream input, enhancing prompt understanding without separate streams.
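As a toy illustration of what a pad token buys you here (the function name and integer-token representation are hypothetical, not the model's code): variable-length caption sequences get right-padded to a common length so they can be stacked into one batch tensor.

```python
def pad_captions(seqs, pad_token):
    """Right-pad variable-length caption token sequences to the batch's
    maximum length using a dedicated pad token."""
    max_len = max(len(s) for s in seqs)
    return [s + [pad_token] * (max_len - len(s)) for s in seqs]

# Two captions of different lengths become one rectangular batch.
batch = pad_captions([[101, 57, 9], [101, 8]], pad_token=0)
```

In the real model the pad entry is a learned embedding (cap_pad_token) rather than a plain zero, so the network can tell padding apart from content.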
Context Refiner (context_refiner.0 and context_refiner.1)
A small pre- or post-processing module (2 blocks shown) with similar structure to main layers: attention (to_q/k/v/out, norm_q/k), norms, and feed-forward (w1/w2/w3).
Likely refines the concatenated context (text + caption tokens) before feeding into the main DiT, improving cross-modal alignment or prompt reasoning in the single-stream setup.
Input Projection (all_x_embedder.*)
Projects the initial noisy image latents (and possibly concatenated inputs) into the transformer's hidden dimension.
Final Output Layer (all_final_layer.*)
Post-DiT projection: adaLN_modulation for final conditioning, followed by linear to map back to VAE latent channels for decoding into the image.
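Given key names like the ones above, per-LoRA layer selection boils down to a name filter over the model's module tree. A hypothetical sketch (the helper, the patterns, and the example of restricting to attention Q/V in a middle band of blocks are illustrative, not the author's actual recipe):

```python
import re

def select_lora_targets(module_names, patterns):
    """Return the module names matching any of the given regex patterns."""
    return [n for n in module_names if any(re.search(p, n) for p in patterns)]

# Example module names following the key scheme described above.
names = [
    "layers.3.attention.to_q", "layers.3.feed_forward.w1",
    "layers.10.attention.to_v", "layers.10.adaLN_modulation.0",
    "cap_embedder.1", "context_refiner.0.attention.to_q",
    "all_final_layer.linear",
]

# Train LoRA only on attention Q/V in blocks 3-19, leaving the embedders,
# context refiner, and final layer untouched.
targets = select_lora_targets(
    names, [r"^layers\.(?:[3-9]|1[0-9])\.attention\.to_[qv]$"]
)
```

Which band and which projections work is exactly the per-LoRA trial and error mentioned above; there is no single pattern that transfers between sliders.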
@alcaitiff Looks like I have a lot to research. I started messing with layers recently in ComfyUI, but this is more complex than that. Thanks a lot for sharing your experience.
@Jellai pfft, noob. you can't even defibriscale a compartmentalized oscillatomitron without knowing these basics bruh
Lighting control !!!! nice
I loved how well this LoRA works; it noticeably improves not only the lighting but also the sharpness and image quality. It's a great piece of work. Now I'd like one that can zoom in and out, like a zoom slider. Thanks.
Found an issue: with a positive weight, the image becomes blurry and noisy, while a negative weight actually improves the image's clarity and detail?