This is a simple, portable, standalone and multiplatform Python GUI application for generating high-quality natural-language captions for images using the JoyCaption Beta One model (LLaVA-based, GGUF format). It is well suited to preparing training datasets for SD 1.5, SDXL 1.0, MagicWAN Image v2, QWEN, HunyuanImage-2.1, HiDream, KREA, Chroma, Z-Image Turbo, Z-Image Base, Flux.2 Klein and Flux.1. It runs fully locally: no internet connection is required after the initial setup.

Because the main application is written in PyQt6, you can run it on macOS, Linux and Windows. I wrote it so that even without a powerful GPU you can still use it on the CPU, as long as you have a decent processor and enough RAM (or a good amount of unified memory on a Mac) plus enough storage for the models.

I don't use a Windows machine, so I couldn't provide a batch script for Windows. If you upload this README along with one of the run scripts (the Mac one or the Linux one) to ChatGPT or Grok and ask for an equivalent Windows batch script, it should be able to write one for you.

The Python code and shell scripts are open source, so if you have the coding skills you can modify the app to take full advantage of a powerful AMD/Nvidia GPU or high-end Mac graphics cores. Feel free to do so and share your good work!
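As a possible alternative to a Windows batch file, here is a minimal sketch of a cross-platform Python launcher that follows the three steps listed under Quick Start (create .venv if missing, install requirements, launch the GUI). It is not one of the shipped scripts: the file name launch.py and the helpers venv_python and build_commands are hypothetical, and it assumes requirements.txt and JoyCaption_Portable.py sit in the same folder as the launcher, as in the folder structure below.

```python
from __future__ import annotations

import os
import subprocess
import sys
import venv
from pathlib import Path


def venv_python(venv_dir: Path) -> Path:
    """Interpreter inside a virtual environment.

    Windows venvs keep it in Scripts\\python.exe; macOS/Linux use bin/python.
    """
    if os.name == "nt":
        return venv_dir / "Scripts" / "python.exe"
    return venv_dir / "bin" / "python"


def build_commands(app_dir: Path, venv_dir: Path) -> list[list[str]]:
    """Install requirements into the venv, then start the GUI with the venv's Python."""
    py = str(venv_python(venv_dir))
    return [
        # pip skips packages that are already installed, so re-running is cheap.
        [py, "-m", "pip", "install", "-r", str(app_dir / "requirements.txt")],
        [py, str(app_dir / "JoyCaption_Portable.py")],
    ]


def main() -> None:
    # Assumption: the launcher lives in the JoyCaption folder itself.
    app_dir = Path(sys.argv[0]).resolve().parent
    needed = ("requirements.txt", "JoyCaption_Portable.py")
    missing = [name for name in needed if not (app_dir / name).is_file()]
    if missing:
        print("Run this from the JoyCaption folder; missing: " + ", ".join(missing))
        return
    venv_dir = app_dir / ".venv"
    if not venv_python(venv_dir).exists():
        # Equivalent to `python -m venv .venv`, with pip bootstrapped.
        venv.create(venv_dir, with_pip=True)
    for cmd in build_commands(app_dir, venv_dir):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    main()
```

Run it with `python launch.py` from the app folder. Using the venv's own interpreter directly avoids the shell-specific `source .venv/bin/activate` step, which is what makes the same file usable on all three platforms.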
Features
--------
• This application uses a compact GGUF version of JoyCaption, with two model options:
- Q4_K_M (~4.6 GB) – fast, good quality (default)
- Q8_0 (~8 GB) – highest quality, slower
• Required Vision Projector (mmproj ~0.82–0.88 GB)
• Caption styles:
- Flux Natural (Detailed) ← most popular
- Flux Natural (Brief)
- SDXL / SD Tags (comma-separated prompt style)
• Content filtering modes:
- PG Mode (no sexual content)
- Vulgar/Blunt/NSFW Mode
• Optional trigger word support (e.g. ohwx, masterpiece), placed before or after the caption
• Image scaling options (optional):
- Do not scale (captions saved as .txt files in the original folder)
- Scale to 512px or 1024px (short side) → saved in output/512_scaled/ or output/1024_scaled/
• Single image or batch folder processing
• Clean dark terminal-style log window
• Progress bar + stop button
• macOS & Linux launch scripts (double-click friendly)
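The short-side scaling option keeps the aspect ratio and only fixes the shorter edge. A minimal sketch of the size arithmetic and the output folder naming, assuming the folder names shown in the folder structure; short_side_size and scaled_output_dir are hypothetical helpers, and this sketch also upscales images smaller than the target, which the app may or may not do. The resize itself could then be done with Pillow's Image.resize.

```python
from __future__ import annotations

from pathlib import Path


def short_side_size(width: int, height: int, target: int) -> tuple[int, int]:
    """New (width, height) so the shorter side equals `target`, aspect ratio kept."""
    if width <= 0 or height <= 0 or target <= 0:
        raise ValueError("dimensions and target must be positive")
    scale = target / min(width, height)
    # Rounding keeps the long side as close as possible to the original ratio.
    return max(1, round(width * scale)), max(1, round(height * scale))


def scaled_output_dir(app_dir: Path, target: int) -> Path:
    """Destination folder for scaled copies: output/512_scaled/ or output/1024_scaled/."""
    return app_dir / "output" / f"{target}_scaled"


# Example: a 3000x2000 landscape photo scaled to 1024 on the short side.
print(short_side_size(3000, 2000, 1024))  # (1536, 1024)
```

Note that the result is not square: a 3:2 photo stays 3:2, with only its short side at 1024px.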
Folder Structure
----------------
JoyCaption_Portable_v1.0/
├── run_linux.sh ← double-click to launch (Linux)
├── run_mac.sh ← double-click to launch (macOS)
├── JoyCaption_Portable.py ← main application
├── requirements.txt
├── .venv/ ← auto-created virtual environment
├── models/ ← place GGUF files here
│ ├── Llama-Joycaption-Beta-One-Hf-Llava-Q4_K.gguf
│ ├── Llama-Joycaption-Beta-One-Hf-Llava-Q8_0.gguf
│ └── llama-joycaption-beta-one-llava-mmproj-model-f16.gguf (required!)
└── output/ ← created automatically when scaling
├── 512_scaled/
└── 1024_scaled/
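To confirm that models/ matches the layout above before launching, a check along these lines may help. It is a sketch only: check_models and the MODELS mapping are hypothetical helpers, not part of JoyCaption_Portable.py, and the file names are copied from the tree above.

```python
from __future__ import annotations

from pathlib import Path

# File names taken from the folder structure; adjust if your downloads differ.
MMPROJ = "llama-joycaption-beta-one-llava-mmproj-model-f16.gguf"
MODELS = {
    "Q4_K": "Llama-Joycaption-Beta-One-Hf-Llava-Q4_K.gguf",
    "Q8_0": "Llama-Joycaption-Beta-One-Hf-Llava-Q8_0.gguf",
}


def check_models(models_dir: Path) -> list[str]:
    """Return problems found in models/ (an empty list means ready to caption)."""
    problems = []
    # The vision projector is required regardless of which model you pick.
    if not (models_dir / MMPROJ).is_file():
        problems.append(f"missing required vision projector: {MMPROJ}")
    # At least one of the two quantized models must be present.
    if not any((models_dir / name).is_file() for name in MODELS.values()):
        problems.append("no JoyCaption model found (need Q4_K or Q8_0)")
    return problems


if __name__ == "__main__":
    for problem in check_models(Path("models")):
        print(problem)
```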
Quick Start
-----------
1. Download the models (once only, either from within the app or manually)
From: https://huggingface.co/concedo/llama-joycaption-beta-one-hf-llava-mmproj-gguf/tree/main
Recommended files:
File                                                     Size      Recommended
------------------------------------------------------------------------------
Llama-Joycaption-Beta-One-Hf-Llava-Q4_K.gguf             ~4.6 GB   Yes (fast)
Llama-Joycaption-Beta-One-Hf-Llava-Q8_0.gguf             ~8 GB     Best quality
llama-joycaption-beta-one-llava-mmproj-model-f16.gguf    ~0.88 GB  REQUIRED
Place all files into the models/ folder.
Note: you can also download the models from within the app on first run.
2. Install dependencies (first time only)
Linux / macOS:
bash run_linux.sh # or run_mac.sh
The script will:
- create .venv if missing
- install requirements
- launch the GUI
3. Run the app
• Double-click run_mac.sh (macOS) or run_linux.sh (Linux)
• Or manually: source .venv/bin/activate && python JoyCaption_Portable.py
4. Usage
- Select model (Q4 is fastest)
- Choose style (Flux Natural Detailed usually best)
- Optional: PG/Vulgar mode, trigger word
- Click Single Image or Batch Folder
- Captions saved as .txt files next to images (or in output/ if scaling)
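The caption placement rule above (a .txt file with the same name as the image, either next to it or in the scaled output folder) can be sketched as follows, together with collecting images for batch mode. The IMAGE_EXTS set is an assumption, since the app's exact list of supported formats is not stated here; list_images and caption_path are hypothetical names.

```python
from __future__ import annotations

from pathlib import Path

# Assumed set of image types; the app's actual filter may differ.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp", ".bmp"}


def list_images(folder: Path) -> list[Path]:
    """Images directly inside `folder` (no recursion), sorted by name."""
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS
    )


def caption_path(image: Path, scaled_dir: Path | None = None) -> Path:
    """Where the .txt caption goes: next to the image, or in the scaled folder."""
    target_dir = scaled_dir if scaled_dir is not None else image.parent
    return target_dir / (image.stem + ".txt")
```

Keeping the caption's base name identical to the image's is what lets most training tools pair each image with its caption automatically.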
Model Download Links (direct links)
-------------------------------------
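Hugging Face serves individual files at https://huggingface.co/<repo>/resolve/<revision>/<filename>. A minimal sketch, assuming the repository and file names given in Quick Start are exact, that builds those direct links and can optionally download a file into models/; direct_url and download are hypothetical helpers, not part of the app.

```python
from pathlib import Path
from urllib.request import urlretrieve

REPO_ID = "concedo/llama-joycaption-beta-one-hf-llava-mmproj-gguf"
# File names as listed in Quick Start; assumed to match the repository exactly.
FILES = [
    "Llama-Joycaption-Beta-One-Hf-Llava-Q4_K.gguf",
    "Llama-Joycaption-Beta-One-Hf-Llava-Q8_0.gguf",
    "llama-joycaption-beta-one-llava-mmproj-model-f16.gguf",
]


def direct_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Hugging Face 'resolve' URL that serves the raw file."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


def download(filename: str, models_dir: Path) -> Path:
    """Download one file into models/, skipping it if it is already there."""
    models_dir.mkdir(parents=True, exist_ok=True)
    dest = models_dir / filename
    if not dest.exists():
        part = dest.with_name(dest.name + ".part")
        # Several GB for the models, so this can take a while.
        urlretrieve(direct_url(REPO_ID, filename), str(part))
        part.replace(dest)  # rename only after the download completes
    return dest


if __name__ == "__main__":
    # Print the links only; call download() yourself to fetch files.
    for name in FILES:
        print(direct_url(REPO_ID, name))
```

For example, `download(FILES[0], Path("models"))` followed by `download(FILES[2], Path("models"))` would fetch the Q4 model and the required projector. Writing to a `.part` file first means an interrupted download is not mistaken for a finished one on the next run.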
Tips
----
• Q4_K_M is 2–3× faster with only a minor drop in quality; use it for most work
• Flux Natural (Detailed) gives the most natural-looking training captions
• Avoid watermarks/logos in images (the prompt already tells the model not to mention them)
• Trigger words like ohwx, masterpiece, best quality help with conditioning
• Scaled folders make it easy to build 512px or 1024px (short side) datasets, but you can also keep using the original folder
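Placing a trigger word before or after the caption is plain string handling. A minimal sketch, assuming a comma plus space as the separator and dropping a trailing full stop when appending; apply_trigger is a hypothetical helper, and the app's exact formatting may differ.

```python
def apply_trigger(caption: str, trigger: str = "", position: str = "before") -> str:
    """Attach a trigger word before or after a caption, comma-separated."""
    caption = caption.strip()
    trigger = trigger.strip()
    if not trigger:
        return caption  # no trigger word configured
    if not caption:
        return trigger
    if position == "before":
        return f"{trigger}, {caption}"
    if position == "after":
        # Drop the final full stop so the result doesn't read "mat., ohwx".
        return f"{caption.rstrip('.')}, {trigger}"
    raise ValueError("position must be 'before' or 'after'")


print(apply_trigger("A cat sitting on a mat.", "ohwx"))  # ohwx, A cat sitting on a mat.
```

Putting the trigger first is the more common choice for LoRA training, since it guarantees the token is never truncated when long captions hit the text encoder's length limit.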
Credits
-------
• JoyCaption Beta One model by fancyfeast / community
• GGUF conversions by concedo (KoboldCpp developer)
Enjoy captioning!