CivArchive
    Global_anatomy_Zimage Base BF16 - v1.0
    NSFW

    Hello there! I'm finally attacking a Z-Image Base full finetune. It is complex but interesting.

    This model adds nudity, anatomy, and variation to the base model: it helps with proportions, posing, and angles, adds concepts like genitalia, and also pushes in-between body shapes and expressions, ranging from chubby, slender, soft, muscular, and realistic proportions all the way to extreme proportions.

    The model can now make nude men, nude feminine men, nude trans women, and nude women.

    Call them like that and experiment; there are no real trigger words. The captioning was done with Kimi K2.5 for the natural-language part, and I added a tag line with Joytag to have some repeating concepts that help the text encoder grab recurrent elements. So prompt with a mix of natural language and tags.

    It should be able to handle a wide variety of styles.

    Shift 5, CFG 4, Euler Beta, and 40-50 steps to see good results.

    It works even better with the 4-step distill LoRA, with CFG 1 and 4 or 6 steps.

    Still imperfect, obviously. V2 is in the back of my mind, with a larger dataset, some better selections, a deeper and harder funnel technique, and some tests around BF16 precision.

    Still adjusting. Always learning.

    (Human-written part)

    Z-Image Base Full Finetune – Technical Notes (Experimental Observations)

    This model was trained as a full finetune on Z-Image Base (BF16) using Musubi Tuner.

    Training was performed on an H100 80GB, at 1024 resolution, with:

    Batch size: 9

    Gradient accumulation: 1

    Full BF16

    Flash attention enabled

    Gradient checkpointing

    Bucketed dataset (many varied buckets)

    Dataset of 2,100 images, with very varied styles, poses, angles, and physiological variation.

    The following notes are based on practical experimentation.

    They are not official documentation, and should be considered empirical observations that worked in this setup.

    1. Z-Image Base behaves differently from SD models

    Z-Image Base uses a DiT transformer architecture and flow-based timestep sampling.

    Compared to SD1.5 / SDXL:

    Structural changes take longer to appear.

    The model shows strong internal coherence.

    It resists abrupt shifts.

    Visible improvement often depends heavily on sampling settings.

    It feels layered — structural changes must propagate across multiple refinement stages.

    Z-Image Base is harder to move, but very stable once shaped.

    2. About shift (Flow Shift) During Training

    Using:

    --timestep_sampling shift

    --discrete_flow_shift X

    modifies how timesteps are distributed during training.

    From experimentation:

    Higher shift (≈ 2.5+)

    Emphasizes global structure.

    Useful for early structural imprinting.

    Mid shift (~2.0–2.2)

    Appears to consolidate structure.

    Balances geometry and detail.

    Lower shift (1.5–1.7)

    Seems to refine fine details.

    Useful for finishing phases.

    This suggests a staged approach:

    Start higher for structure, progressively lower for refinement.

    This is an experimental strategy — not an official rule.
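
    The effect of the shift value can be sketched numerically. The warp below is the standard flow-matching timestep shift formula used by SD3-style schedulers; it is assumed here to match what Musubi Tuner applies for --discrete_flow_shift, so treat the exact numbers as illustrative:

    ```python
    import numpy as np

    def apply_flow_shift(t, shift):
        """Warp uniform timesteps t in [0, 1] toward the high-noise end.

        Standard flow-matching shift: t' = shift * t / (1 + (shift - 1) * t).
        A higher shift concentrates sampled timesteps near t = 1 (pure noise),
        which is why it emphasizes global structure during training.
        """
        t = np.asarray(t, dtype=np.float64)
        return shift * t / (1.0 + (shift - 1.0) * t)

    t = np.linspace(0.0, 1.0, 5)       # uniform timesteps before shifting

    print(apply_flow_shift(t, 1.0))    # shift 1: distribution unchanged
    print(apply_flow_shift(t, 2.0))    # mid shift: mass moves toward 1.0
    print(apply_flow_shift(t, 5.0))    # high shift: strongly noise-weighted
    ```

    The endpoints 0 and 1 are fixed under the warp; only the interior of the schedule moves, which matches the intuition of "structure first, details later" described above.
    
    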

    3. Optimizer Choice (Adafactor vs AdamW)

    In this setup, Adafactor performed better than AdamW for full finetuning.

    Observed behavior:

    Lower VRAM usage

    Larger stable batch size (9 at 1024 on H100)

    More stable long-phase convergence

    Example configuration used:

    --optimizer_type adafactor

    --optimizer_args "relative_step=False" "scale_parameter=False" "warmup_init=False"

    --lr_scheduler constant_with_warmup

    Again, this is empirical — other setups may vary.
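
    Assuming Musubi Tuner forwards these flags to the Hugging Face transformers Adafactor implementation (the usual backend in kohya-style trainers), the flags map to a constructor call roughly like this; the learning rate is a placeholder, since relative_step=False requires an explicit LR:

    ```python
    import torch
    from transformers.optimization import Adafactor

    # Toy module standing in for the DiT weights.
    model = torch.nn.Linear(16, 16)

    # Assumed equivalent of the --optimizer_args above. With
    # relative_step=False, Adafactor needs an explicit lr.
    optimizer = Adafactor(
        model.parameters(),
        lr=1e-5,                 # hypothetical value; set per training phase
        relative_step=False,
        scale_parameter=False,
        warmup_init=False,
    )

    # One dummy step to show the optimizer is usable as configured.
    loss = model(torch.randn(4, 16)).pow(2).mean()
    loss.backward()
    optimizer.step()
    ```

    Disabling relative_step and scale_parameter makes Adafactor behave like a fixed-LR optimizer, which is what lets the staged LR funnel below stay predictable.
    
    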

    4. Learning Rate Funnel Strategy

    A progressive reduction strategy was used:

    Early phase: higher LR for structural change

    Mid phase: moderate LR for stabilization

    Final phases: very low LR for micro-refinement

    The core idea:

    Move the structure first.

    Refine later without breaking global coherence.

    Z-Image Base appears to benefit from staged training rather than a flat learning rate schedule.
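
    The funnel idea can be sketched as a step function over training progress. The phase fractions and LR values below are illustrative placeholders, not the exact values used for this finetune:

    ```python
    def funnel_lr(step, total_steps,
                  phases=((0.30, 2e-5), (0.40, 1e-5), (0.30, 4e-6))):
        """Staged learning-rate 'funnel': each phase is (fraction, lr).

        Early steps get a higher LR to move structure, later steps a
        much lower LR for micro-refinement, per the strategy above.
        """
        progress = step / total_steps
        cumulative = 0.0
        for fraction, lr in phases:
            cumulative += fraction
            if progress < cumulative:
                return lr
        return phases[-1][1]

    print(funnel_lr(0, 1000))      # 2e-05  (structural phase)
    print(funnel_lr(500, 1000))    # 1e-05  (stabilization phase)
    print(funnel_lr(950, 1000))    # 4e-06  (refinement phase)
    ```

    In practice each phase would be a separate Musubi Tuner run resumed from the previous checkpoint with a new LR, rather than a single scheduler.
    
    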

    5. Dataset and Bucketing

    The dataset was:

    1024 resolution aligned with Z-Image Base

    Fully bucketed

    With many varied aspect buckets

    Multi-distribution (varied morphologies)

    Using many buckets helped:

    Preserve structural consistency

    Avoid overfitting to a narrow framing

    Maintain prompt flexibility
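
    Aspect bucketing of this kind can be sketched as follows: enumerate resolutions whose pixel area stays near 1024x1024, then assign each image to the bucket with the closest aspect ratio. This is a generic sketch of the technique, not Musubi Tuner's exact bucketing code:

    ```python
    def make_buckets(base=1024, step=64, max_ratio=2.0):
        """Enumerate (w, h) buckets with area near base*base and
        aspect ratio bounded by max_ratio."""
        buckets = set()
        w = step
        while w <= base * max_ratio:
            h = round(base * base / w / step) * step
            if h > 0 and max(w / h, h / w) <= max_ratio:
                buckets.add((w, h))
            w += step
        return sorted(buckets)

    def assign_bucket(width, height, buckets):
        """Pick the bucket whose aspect ratio best matches the image,
        so the crop/resize distorts framing as little as possible."""
        ratio = width / height
        return min(buckets, key=lambda b: abs(b[0] / b[1] - ratio))

    buckets = make_buckets()
    print(assign_bucket(1920, 1080, buckets))  # landscape image -> wide bucket
    print(assign_bucket(832, 1216, buckets))   # portrait image -> tall bucket
    ```

    Many buckets with varied ratios is what lets the trainer see full bodies, close-ups, and wide framings without forcing everything into a square crop.
    
    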

    6. Sampling Matters More Than Expected

    Z-Image Base can look soft or “vaporous” under weak guidance.

    Under stronger guidance (e.g. 4.0) and sufficient steps (e.g. 50):

    Structural improvements become significantly clearer.

    Anatomical refinement is more visible.

    Prompt conditioning becomes stronger.

    When evaluating finetunes:

    Use consistent seeds

    Test guidance 3–5

    Try multiple flow_shift sampling values (e.g. 3 and 5)

    Compare phases side by side. Loss is not that telling: just watch that it does not spike. You will not see deep drops, so your best bet is sampling between training phases.

    The model’s internal changes may not appear under weak sampling settings.
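
    The evaluation recipe above can be sketched as a fixed grid sweep. The generate() function here is a hypothetical stand-in for whatever inference pipeline you use; only the sweep structure (fixed seeds, guidance 3-5, two flow_shift values) comes from the notes above:

    ```python
    import itertools

    # Hypothetical stand-in for your inference call (ComfyUI workflow,
    # diffusers pipeline, etc.) -- not a real API.
    def generate(prompt, seed, cfg, flow_shift, steps=50):
        return f"seed{seed}_cfg{cfg}_shift{flow_shift}_s{steps}.png"

    SEEDS = [12345, 67890]      # keep seeds fixed across checkpoints
    CFGS = [3.0, 4.0, 5.0]      # guidance range suggested above
    SHIFTS = [3, 5]             # flow_shift sampling values to compare

    def eval_grid(prompt):
        """One image per (seed, cfg, shift) cell, so checkpoints from
        different training phases can be compared on identical settings."""
        return [generate(prompt, s, c, f)
                for s, c, f in itertools.product(SEEDS, CFGS, SHIFTS)]

    grid = eval_grid("a full-body studio photo")
    print(len(grid))  # 2 seeds x 3 cfgs x 2 shifts = 12 images per checkpoint
    ```

    Running the same grid on each phase checkpoint makes side-by-side comparison mechanical, which matters since loss alone is not a reliable signal here.
    
    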

    7. Important: These Are Hypotheses

    Everything above is based on hands-on experimentation with:

    Musubi Tuner

    Z-Image Base (BF16)

    H100 80GB

    1024 resolution

    Large batch (9)

    Progressive shift funnel

    Z-Image Base is complex.

    Different datasets, hardware, or goals may respond differently.

    These notes should be treated as:

    Practical observations

    Not universal truth

    A starting point for experimentation

    This finetune aimed to preserve:

    Z-Image Base’s native photorealistic grain

    Structural coherence

    Prompt responsiveness

    Stability under guidance

    If you experiment further with Z-Image Base,

    structured training and careful sampling evaluation seem essential.

    (This note was drafted by AI, to avoid confusion: I am not a native speaker, but I reread it, approve it, and take responsibility for this write-up of my experience.)

    Description

    The first one, not the last.

    FAQ

    Checkpoint
    ZImageBase

    Details

    Downloads
    237
    Platform
    CivitAI
    Platform Status
    Available
    Created
    2/18/2026
    Updated
    4/27/2026
    Deleted
    -

    Files

    globalAnatomyZimage_v10.safetensors

    Mirrors