A fully offline, portable desktop application for generating high-quality image captions using the uncensored Qwen3-VL 8B vision-language model. Built with a professional PyQt6 dark-themed GUI, GGUF-quantized model inference via llama-cpp-python, and full CUDA GPU acceleration.
Designed for AI artists, dataset curators, and trainers of Chroma, Pony, Qwen, ZIT, ZIB, Stable Diffusion, and Flux models who need accurate, customizable captions for their image datasets.
Up to date and highly accurate, with fewer hallucinations and a better visual encoder. Faster than JoyCaption Beta One and any other portable JoyCaption version.
Built by a real technical engineer with 16 years of experience.
See the link below for complete details and to download the application.
https://github.com/GitDonkeyHubbed/qwen3vl-captioner
🚀 What's New in V1.2.0
This release brings a major overhaul to how captions are generated, focusing on accuracy, anatomy, and detail over "storytelling" fluff.
🏥 Clinical Precision Mode
We've completely rewritten the prompts for all models (Flux, Stable Diffusion, Pony, etc.). Instead of "cinematic" or "moody" descriptions, the engine now focuses on:
Physical Reality: Exact shapes, textures, and spatial relations.
Accurate Anatomy: Detailed descriptions of bodies and poses without euphemisms.
Objective Detail: Parses through the image content, listing exactly what is there.
🔞 Uncensored / Adult Detail Option
A new "Uncensored / Adult Detail" checkbox is available in the settings. When enabled, it injects explicit instructions to describe all content (including nudity and adult themes) with full anatomical accuracy, bypassing standard safety refusals. Essential for high-quality dataset training.
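As a rough illustration of how a settings checkbox could inject extra instructions into the caption prompt, here is a minimal sketch. The instruction strings and the function name `build_system_prompt` are hypothetical; the application's actual prompts and internals are not shown here.

```python
# Hypothetical instruction text; the real prompts used by the app are not published here.
BASE_INSTRUCTIONS = (
    "Describe the image objectively. State exact shapes, textures, and "
    "spatial relations. Avoid cinematic or mood-based language."
)
ADULT_DETAIL_INSTRUCTIONS = (
    "Describe all visible content, including nudity and adult themes, "
    "using anatomically accurate terms and no euphemisms."
)


def build_system_prompt(adult_detail: bool) -> str:
    """Assemble the system prompt, appending the extra block only when the box is checked."""
    parts = [BASE_INSTRUCTIONS]
    if adult_detail:
        parts.append(ADULT_DETAIL_INSTRUCTIONS)
    return "\n".join(parts)


if __name__ == "__main__":
    # Checkbox off: only the base instructions are sent to the model.
    print(build_system_prompt(adult_detail=False))
```

Keeping the extra instructions in a separate block means the default prompt is unchanged when the checkbox is off.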
📦 Portable Release
This version is fully portable. Models are now detected in the application folder, making it easier to share and install.
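One plausible way to detect models inside the application folder is to scan it for GGUF files. The sketch below assumes a hypothetical `models` subfolder and a hypothetical helper name `find_gguf_models`; the app's real folder layout may differ.

```python
from pathlib import Path


def find_gguf_models(app_folder: Path, subfolder: str = "models") -> list[Path]:
    """Return all .gguf files found under the app folder, sorted by path.

    Assumption: models live in `<app_folder>/<subfolder>`; if that folder
    does not exist, the whole application folder is searched instead.
    """
    search_root = app_folder / subfolder
    if not search_root.is_dir():
        search_root = app_folder
    return sorted(p for p in search_root.rglob("*.gguf") if p.is_file())


if __name__ == "__main__":
    # Scan relative to the current working directory as a stand-in for the app folder.
    for model_path in find_gguf_models(Path.cwd()):
        print(model_path)
```

Because the search is relative to the application folder, the whole folder can be copied to another drive or machine without reconfiguring paths.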
✨ Key Features
Clinical Precision: Uses anatomically accurate, objective language instead of a "creative writing" style. Designed for training, not storytelling.
Universal "Edit" Mode: Full control via the Edit button to handle any prompt format (JSON, XML, Booru) without needing complex hardcoded "modes".
Lean Architecture: Focused on speed and simplicity. No bloat, just tools that work.
Multi-Model Presets: Pre-configured formats for Flux 1 & 2, Stable Diffusion, Pony (SDXL), Z-Image, and more.
Drag & Drop: Drop images or entire folders directly into the app.
Batch Processing: Caption thousands of images automatically.
Smart Model Handling: Native GGUF support with auto-downloading.
Hardware Monitoring: Real-time GPU VRAM usage display.
Safety Controls: Toggle between "PG" and fully "Uncensored" XXX modes.
Auto-save & Cancel: Captions are auto-saved, and an operation can be cancelled at any time.
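The batch processing, auto-save, and cancel features listed above can be sketched as a single loop. This is a minimal illustration, not the application's code: the function `caption_folder`, the `.txt` sidecar naming, and the extension list are assumptions, and `caption_image` stands in for whatever call runs the model.

```python
import threading
from pathlib import Path
from typing import Callable

# Assumed set of image types; the app's actual supported formats may differ.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".bmp"}


def caption_folder(
    folder: Path,
    caption_image: Callable[[Path], str],
    cancel: threading.Event,
) -> int:
    """Caption every image under `folder`, saving each result right away.

    Returns the number of caption files written. Checking `cancel` before
    each image lets a GUI "Cancel" button stop the run between images
    without losing captions that were already saved.
    """
    written = 0
    for image in sorted(folder.rglob("*")):
        if cancel.is_set():
            break
        if not image.is_file() or image.suffix.lower() not in IMAGE_EXTENSIONS:
            continue
        caption = caption_image(image)
        # Auto-save: write a sidecar text file next to the image immediately.
        image.with_suffix(".txt").write_text(caption, encoding="utf-8")
        written += 1
    return written
```

In a GUI, this loop would typically run on a worker thread, with the Cancel button calling `cancel.set()` from the main thread.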