Goddess Project
--Formerly Uncensored Females--
Standalone Checkpoint - Goddess works in FORGE only
DO NOT LOAD a separate VAE, TE, or CLIP unless using GGUF
This version is a mixed-precision Flux Dev model, with limited UNET changes to allow for feminine anatomy.
Run this model in automatic FP16 LoRA mode, NOT NF4.
The UNET is full-precision BF16, with mixed precision (NF4) on the TE blocks.
This model fits on a 24GB card and, as such, can be run in GPU-only mode.
High-speed BF16 with slightly lower prompt accuracy compared to the 33GB full model.
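For anyone outside Forge who wants to reproduce the same precision split (BF16 transformer with an NF4 text encoder), here is a hedged diffusers sketch; the repo ID and subfolder names below are the standard Flux Dev ones, used as stand-ins for this checkpoint's files:

```python
import torch
from transformers import T5EncoderModel, BitsAndBytesConfig
from diffusers import FluxPipeline

# NF4 quantization for the T5 text encoder (the "TE blocks" above).
nf4 = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Assumption: the standard Flux Dev repository layout; substitute this
# checkpoint's own files where applicable.
t5 = T5EncoderModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="text_encoder_2",
    quantization_config=nf4,
    torch_dtype=torch.bfloat16,
)

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    text_encoder_2=t5,
    torch_dtype=torch.bfloat16,  # full-precision BF16 transformer (UNET)
)
pipe.enable_model_cpu_offload()  # park idle components in system RAM
```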
Links (For GGUF ONLY)
Updated CLIP - Standard CLIP-L - FP8 CLIP-L -- ** Version Comparison **
Per the Apache 2.0 license, FLAN is attributed to Google.
This model was trained on many individuals with known ages and 2257 forms, and it has also been merged to try to ensure that no known individuals can be reproduced. However, FLUX seems to learn faces even from less than 10% of the data rather than merging them into a new face.
Comments (9)
May I ask, what software do you use to make Flux models? I don't want to be limited to just making LoRAs!
How is it even possible to use a 15GB model on a 3050 8GB? Some weird lies in the description.
A few months ago those would have been "lies", as you say - you need to look up CPU offloading and block management.
I can run Flux.1-Dev FP32 (22GB) on my RTX 3070 8GB without any problems. It's just slow (3-4 minutes to generate one image).
Most of what you are told about AI comes from enthusiasts with NO real understanding of coding, maths, or computer architecture. With an LLM, your bottleneck is memory bandwidth, and you want the language model in VRAM. But with image generation from a BIG diffusion model, you won't even get ONE iteration per second, so the cost of constantly moving the model from system RAM to VRAM in chunks, per iteration, is not a bottleneck. Your SYSTEM RAM runs at around 50GB/s; a 16-lane PCI Express 5.0 link can match that, and even PCIe 4.0 x16 at 32GB/s is still fast enough to transfer the entire diffusion model every second.
Do NOT think you already understand how all this works - educate yourself!
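To make the block-streaming idea concrete, here is a minimal PyTorch sketch of per-block CPU offloading; the layer sizes and count are invented stand-ins for a real diffusion transformer:

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# Stand-in for a big diffusion model: a stack of large blocks kept in
# system RAM. Real offloaders use pinned memory and async copies.
blocks = nn.ModuleList(nn.Linear(4096, 4096) for _ in range(8)).to("cpu")
x = torch.randn(1, 4096, device=device)

with torch.no_grad():
    for block in blocks:
        block.to(device)   # stream this block's weights over PCIe
        x = block(x)       # run it while it is resident in VRAM
        block.to("cpu")    # evict it to make room for the next block

# At PCIe 4.0 x16 (~32GB/s), streaming a 22GB model takes well under a
# second per pass, which is small next to a multi-second sampling step.
print(x.shape)
```

Frameworks do the same thing more efficiently; diffusers, for example, ships enable_model_cpu_offload and enable_sequential_cpu_offload, which is how an 8GB card can run a 15GB+ checkpoint.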
@blobby99 Most of the code is written by AI and is very unoptimized, but with billions of lines of code it's no more practical to optimize it all than it is to hand-caption that many images.
The CLIP and now the T5 don't need to stay loaded in memory; it usually only takes seconds to load them in and out. Having the entire diffusion model in VRAM might save you seconds or minutes, depending on how many times you change the prompt.
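As a toy illustration of that point, here is a hedged sketch of caching prompt embeddings so the text encoder only has to be resident while the prompt actually changes; the tiny embedding layer is a stand-in for the real CLIP-L/T5:

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# Stand-in text encoder; in a real pipeline this is CLIP-L or T5.
encoder = nn.Embedding(49408, 768).to("cpu")
cache = {}

def embed(prompt_ids: tuple) -> torch.Tensor:
    """Encode once per unique prompt; keep the encoder off the GPU otherwise."""
    if prompt_ids not in cache:
        encoder.to(device)   # seconds to page in over PCIe
        with torch.no_grad():
            ids = torch.tensor(prompt_ids, device=device)
            cache[prompt_ids] = encoder(ids).cpu()
        encoder.to("cpu")    # free the VRAM for the diffusion blocks
    return cache[prompt_ids]

emb = embed((101, 202, 303))   # hypothetical token ids
print(emb.shape)               # reused for every step of the same prompt
```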
Just use the GGUF and you'll be surprised.
But really, good question.