Image-to-Prompt & Smart Detection — Workflow Guide
38 nodes · 7 groups · 16 unique node types — 92.1% Eclipse nodes
35 Eclipse nodes at the top level
Built with ComfyUI_Eclipse custom nodes
What Is This?
This workflow is a modular playground for Image-to-Prompt (VLM) generation and Smart Detection, built almost entirely (35 of 38 nodes) with the ComfyUI_Eclipse custom node suite.
It provides an interactive interface for:
Loading single images or batching directories of files (with video frame extraction support).
Interactively selecting frames/images on the fly in the frontend using the Image Selector.
Running object detection, segmentation, and bounding-box labels using the Smart Detection node.
Querying multiple vision-language models (VLMs) and text-only LLMs concurrently using the unified Smart LM Loader (Smart LML).
Generating prompt variations, timeline structures (for video generators like Wan2.1 and LTX), and song lyrics.
Direct interactive chat with custom system prompts and dynamic routing.
The workspace is designed for modularity and wireless routing: groups are self-contained and communicate using Set/Get named channels, meaning you can toggle groups on and off via muting without breaking any visual wiring.
🌒 Unified Smart Loaders & Backends
A key strength of the ComfyUI_Eclipse suite is consolidating fragmented model wrappers and APIs into two primary smart loaders. Instead of installing and configuring a dozen different node packs for Hugging Face, GGUF, Ollama, vLLM, and Docker servers, everything is unified under a standard input interface.
1. Smart Language Model Loader (Smart LML)
Smart LM Loader [Eclipse] serves as the single entrypoint for LLMs, VLMs, and ONNX taggers. It natively handles prompt templates, system instructions, and advanced sampling (temperature, top_p, min_p, Mirostat, repetition penalties).
It supports 8 distinct backends selectable directly from the node:
Transformers (Native local Hugging Face execution): Standard local VLM runs (Florence-2, Qwen2-VL, Pixtral) and text-only models. Supports FP16, BF16, and INT8/INT4 quantization.
GGUF (Local quantized weights via llama-cpp-python): Running large models locally on lower-VRAM consumer GPUs with high quant efficiency (e.g. Q8_0, Q4_K_M).
Ollama (API interface / Local Docker container): Offloading computation to a background Ollama daemon or remote Ollama server.
llama.cpp (Local Docker-based server engine): Standardized containerized CPU/GPU inference.
vLLM (Docker) (High-throughput Docker container): Offloading inference to a vLLM server container for maximum speed and structured batch requests.
vLLM (Native) (Native local Python library): Fast local inference on Linux systems with native vLLM installed.
SGLang (Docker) (High-performance container engine): Rapid structured text generation and blazing-fast decoding.
WD14 Tagger (Local ONNX-based taggers): Running multi-label tagging models (such as the SwinV2 or ConvNeXt variants) to auto-tag anime/general images locally.
2. Smart Detection
Smart Detection [Eclipse] is a unified object detection, text grounding, and segmentation node. It acts as an adapter that unifies two major model families under a single pipeline:
Vision-Language Models (VLMs): Uses Florence-2 or Qwen-VL to run text-grounding queries (e.g., "detect face, hair, blue dress") and converts textual coordinate outputs into binary masks, cropped bounding boxes, and ComfyUI Impact Pack-compatible SEGS.
YOLO Models: Loads YOLO v8, v9, v10, and v11 models for high-speed object detection and precise face and object segmentation.
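The step of turning detected box coordinates into a binary mask can be sketched in a few lines. This is an illustration only, not the node's actual code: the function name `boxes_to_mask` and the `(x1, y1, x2, y2)` pixel-coordinate convention are assumptions made for the example.

```python
def boxes_to_mask(width, height, boxes):
    """Rasterize (x1, y1, x2, y2) pixel boxes into a binary mask (rows of 0/1).

    Hypothetical helper: the coordinate convention is assumed, not taken
    from Smart Detection itself.
    """
    mask = [[0] * width for _ in range(height)]
    for x1, y1, x2, y2 in boxes:
        # Clamp to the image bounds so out-of-range model output cannot raise.
        left, right = max(0, int(x1)), min(width, int(x2))
        top, bottom = max(0, int(y1)), min(height, int(y2))
        for y in range(top, bottom):
            for x in range(left, right):
                mask[y][x] = 1
    return mask


# A 6x4 image with one box covering columns 1..3 and rows 1..2.
mask = boxes_to_mask(6, 4, [(1, 1, 4, 3)])
```

Overlapping boxes simply union into the same mask, which matches the idea of one mask per query such as "face, hair".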
3. Smart Model Loader (Main)
Used in standard rendering workflows (like iGEN ONE) but registered under the same suite. It loads standard Checkpoints, UNets, CLIP text encoders, VAEs, and LoRAs. It contains a template dictionary matching filenames to expected SHA256 hashes and CivitAI AIRs. If a file is missing, it displays an interactive Download button to automatically fetch it directly into your ComfyUI models directory.
Adding Custom Models to the Registry
To add your own custom models to the Smart LM Loader dropdowns, you should edit the local user registry file:
File Path: [registry/user_models.json](file:///mnt/data/AI/custom_nodes/comfyui_eclipse/registry/user_models.json)
Insert your model details under the appropriate backend key (e.g. "transformers", "gguf", "ollama"):
{
  "transformers": {
    "My-Custom-VLM-Name": {
      "repo_id": "username/repo-name",
      "family": "VLM",
      "has_vision": true
    }
  },
  "gguf": {
    "My-Custom-GGUF-Name": {
      "repo_id": "username/repo-name-GGUF",
      "family": "LLM_TEXT",
      "has_vision": false,
      "quantizations": ["Q4_K_M", "Q8_0"],
      "file_pattern": "model-name-{quant}.gguf"
    }
  }
}
IMPORTANT: Model names starting with an underscore (e.g. _example_Phi-4) are treated as examples and are ignored. Ensure your custom entry keys do not have a leading underscore.
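Before restarting ComfyUI, it can help to sanity-check a registry file. The sketch below mirrors the two rules stated above (entries live under a backend key, and names with a leading underscore are skipped). The helper names `active_models` and `gguf_filenames` are hypothetical, and the assumption that `{quant}` in `file_pattern` is replaced by each listed quantization is inferred from the example entry, not confirmed against the loader's code.

```python
import json

# Sample registry text in the same shape as the example entry above.
REGISTRY_TEXT = """
{
  "gguf": {
    "_example_Phi-4": {"repo_id": "example/ignored", "family": "LLM_TEXT"},
    "My-Custom-GGUF-Name": {
      "repo_id": "username/repo-name-GGUF",
      "family": "LLM_TEXT",
      "has_vision": false,
      "quantizations": ["Q4_K_M", "Q8_0"],
      "file_pattern": "model-name-{quant}.gguf"
    }
  }
}
"""


def active_models(registry):
    """Return {(backend, name): spec}, skipping underscore-prefixed examples."""
    result = {}
    for backend, models in registry.items():
        for name, spec in models.items():
            if name.startswith("_"):
                continue  # treated as an example entry and ignored
            result[(backend, name)] = spec
    return result


def gguf_filenames(spec):
    """Expand file_pattern once per quantization (assumed behaviour)."""
    pattern = spec.get("file_pattern")
    if not pattern:
        return []
    return [pattern.format(quant=q) for q in spec.get("quantizations", [])]


registry = json.loads(REGISTRY_TEXT)
models = active_models(registry)
filenames = gguf_filenames(models[("gguf", "My-Custom-GGUF-Name")])
```

Because `json.loads` raises on malformed input, running this against your own file also catches stray commas or unbalanced braces early.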
How It Works — The Basics
Wireless Routing (Set/Get)
Rather than cluttering the canvas with messy visual noodles, the workflow uses Set/Get nodes:
SetNode: Publishes the loaded image as a named stream REF_IMAGE.
GetNode: Subscribers inside the Detection, Image to Prompt, and Timeline Prompts groups retrieve this reference image wirelessly.
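As a mental model, Set/Get behaves like a small publish/lookup table keyed by channel name. The class below is a conceptual sketch only; `ChannelBoard` is a hypothetical name and this is not how the Set/Get nodes are implemented.

```python
class ChannelBoard:
    """Conceptual stand-in for named Set/Get channels (hypothetical)."""

    def __init__(self):
        self._values = {}

    def set(self, name, value):
        # A SetNode publishes a value under a channel name.
        self._values[name] = value

    def get(self, name):
        # A GetNode looks the value up by the same name.
        if name not in self._values:
            raise KeyError(f"no Set node publishes channel {name!r}")
        return self._values[name]


board = ChannelBoard()
reference = {"width": 1024, "height": 768}  # placeholder for an image tensor
board.set("REF_IMAGE", reference)

# Each group reads the same object without any visible wire.
detection_input = board.get("REF_IMAGE")
caption_input = board.get("REF_IMAGE")
```

The lookup fails loudly when nothing publishes the channel. That corresponds to a Get node whose matching Set node does not exist, for example after it has been renamed or deleted.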
Mode Bridges (Single Upload vs. Directory Batching)
The workflow includes a smart toggling mechanism for image sourcing:
Load Image (Metadata Pipe): For drag-and-dropping individual files from your browser/computer.
Load Batch From Folder: For batching directories of images or decoding frames directly from video files.
Mode Bridge Set & Get: Wireless switches that communicate which loading path is active. When you toggle the active path, the Any Multi-Switch routes the correct image stream to Set_REF_IMAGE automatically.
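The routing idea can be sketched as a first-available switch. This assumes the switch forwards the first input that is not `None` and that the inactive loading path yields nothing. Both are assumptions for illustration, and the function names are hypothetical.

```python
def any_multi_switch(*inputs):
    """Forward the first input that carries a value (assumed behaviour)."""
    for value in inputs:
        if value is not None:
            return value
    return None


def route_reference_image(mode, single_image, batch_images):
    """Pick the stream that should reach Set_REF_IMAGE for the given mode."""
    # Only the active path contributes a value; the other is masked out.
    single = single_image if mode == "single" else None
    batch = batch_images if mode == "folder" else None
    return any_multi_switch(single, batch)


picked_single = route_reference_image("single", "upload.png", ["a.png", "b.png"])
picked_batch = route_reference_image("folder", "upload.png", ["a.png", "b.png"])
```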
Group-by-Group Reference
1. Load Image
The entry point of the workflow. Exposes two loading methods:
Single Image: [Load Image (Metadata Pipe) [Eclipse]](file:///mnt/data/AI/custom_nodes/comfyui_eclipse/py/RvImage_LoadImage_Pipe.py).
Batch / Video: [Load Batch From Folder [Eclipse]](file:///mnt/data/AI/custom_nodes/comfyui_eclipse/py/RvImage_LoadBatchFromFolder.py). Supports listing folders or decoding MP4/MKV video files frame-by-frame.
Interactive Filtering: [Image Selector [Eclipse]](file:///mnt/data/AI/custom_nodes/comfyui_eclipse/py/RvImage_Selector.py). When batch loading a folder or decoding video, this node pauses the ComfyUI run and renders an interactive grid of thumbnails. You click the frames you want (supports shift-click range and text filtering), click "Confirm", and it resumes to output only that selected subset.
Routing: Uses two Mode Bridge Get and Mode Bridge Set nodes to govern whether single or batch mode is selected. The active image goes to Set_REF_IMAGE.
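The selection behaviour described for the Image Selector (text filtering, shift-click ranges, then confirm) can be pictured with plain index arithmetic. The helpers below are hypothetical and model only the selection logic, not the frontend widget.

```python
def filter_by_text(names, query):
    """Indices of items whose name contains the query, case-insensitively."""
    needle = query.lower()
    return [i for i, name in enumerate(names) if needle in name.lower()]


def shift_click_range(anchor, clicked):
    """All indices between the last plain click and the shift-click, inclusive."""
    low, high = sorted((anchor, clicked))
    return list(range(low, high + 1))


def confirm(items, selected):
    """Return the chosen subset in original order, ignoring duplicate clicks."""
    return [items[i] for i in sorted(set(selected))]


frames = ["shot_000.png", "shot_001.png", "take_002.png", "shot_003.png"]
matching = filter_by_text(frames, "SHOT")
ranged = shift_click_range(3, 1)
subset = confirm(frames, ranged + [1])
```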
2. Detection
Performs visual segmentation and object locating:
Subscribes to REF_IMAGE via a GetNode.
Smart Detection: Uses a model (defaulting to Florence-2) to detect coordinates or segment structures.
Outputs: Renders the isolated region to Preview Image (DOM), shows the binary segmentation mask in Preview Mask, and dumps textual results (like bounding box text or OCR outputs) to Show Text.
3. Image to Prompt
Generates detailed textual descriptions from your image:
Uses two parallel Smart LM Loader nodes loaded with vision-language models (e.g. Qwen2-VL-7B-Instruct or Florence-2).
NOTE: This parallel setup with two separate nodes is for showcase/demonstration purposes only. In a typical production workflow, multiple sequential operations like captioning and tag generation can be executed inside a single loader node using its built-in multi-task system.
Executes tasks like creating detailed descriptive prompts or extracting comma-separated tagging keys.
Outputs prompts to two separate Show Text boxes.
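If you post-process the comma-separated tag output outside ComfyUI, a small cleanup pass is usually enough. The sketch below assumes the tags arrive as one comma-separated string, and `parse_tags` is a hypothetical helper rather than part of the node suite.

```python
def parse_tags(raw):
    """Split a comma-separated tag string into unique, trimmed, lowercase tags."""
    seen = set()
    tags = []
    for part in raw.split(","):
        tag = part.strip().lower()
        if tag and tag not in seen:  # drop empties and repeats, keep order
            seen.add(tag)
            tags.append(tag)
    return tags


tags = parse_tags("1girl, Blue Dress, long hair, blue dress, ,outdoors")
```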
4. Timeline Prompts (Wan / LTX)
A dedicated VLM generator for text-to-video prompt engineering:
Uses Smart LM Loader (VLM mode) to analyze REF_IMAGE.
Generates structured timeline descriptions (e.g. "0s: subject starts sitting; 2s: subject turns head") tailored for video models like Wan2.1 and LTX-Video.
Dumps the structured timelines to a Show Text box.
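A timeline string in the style of the example above can be split into timed events with a short parser. The `<seconds>s: <description>` format separated by semicolons is inferred from that single example. Real model output may vary, so treat this as a sketch under that assumption; `parse_timeline` is a hypothetical helper.

```python
import re

# Matches entries such as "2s: subject turns head" or "1.5s: camera pans".
ENTRY = re.compile(r"^\s*(\d+(?:\.\d+)?)s\s*:\s*(.+?)\s*$")


def parse_timeline(text):
    """Return a list of (seconds, description) tuples sorted by time."""
    events = []
    for chunk in text.split(";"):
        match = ENTRY.match(chunk)
        if match:  # silently skip chunks that do not follow the format
            events.append((float(match.group(1)), match.group(2)))
    return sorted(events)


events = parse_timeline("0s: subject starts sitting; 2s: subject turns head")
```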
5. Prompt Variations
An LLM generator to brainstorm prompts:
Uses Smart LM Loader (text-only mode) to take a simple seed prompt (e.g. "a futuristic cyberpunk street") and expand it into 3 creative prompt variations.
Dumps the results to a Show Text box.
6. Song Lyrics
A creative text generation playground:
Uses Smart LM Loader (text-only mode) configured to compose structured song lyrics or poetry based on user-supplied themes.
Dumps the lyrics to a Show Text box.
7. Direct Chat
A general playground to communicate directly with your loaded LLM or VLM:
Exposes three pre-configured system prompts using [String Multiline [Eclipse]](file:///mnt/data/AI/custom_nodes/comfyui_eclipse/py/RvText_Multiline.py) nodes.
Any Multi-Switch: Routes one of the three system prompts to the loader based on your selection (V1, V2, or V3).
Smart LM Loader: Executes the conversation.
Fast Mode Toggle: Lets you customize model options (device, compile, temperature) via simple frontend buttons.
Show Text: Displays the direct chat responses.
Quick Start Guide
TIP: Ensure your Hugging Face, GGUF, or Ollama server credentials are configured in config.json if using Docker-based backends. Local GGUF and Transformers backends run out-of-the-box using models placed in models/llms/ or models/text_encoders/.
How to Get Captions from an Uploaded Image
Drag and drop your image into the Load Image node in the Load Image group.
In the Image to Prompt group, select your preferred VLM from the dropdown on the Smart LM Loader (e.g. Florence-2-large).
Click Queue Prompt.
The generated prompt and tags will appear in the Show Text nodes inside the group.
How to Detect and Mask Objects (e.g., Face/Hair)
Ensure your image is loaded in Load Image.
Locate the Detection group and enter your search query in the detection_prompt widget of the Smart Detection node (e.g. "face, hair").
Click Queue Prompt.
The node will locate the facial/hair boundaries, isolate them, and display the result in the Preview Image (DOM) and the binary mask in the Preview Mask node.
How to Run Batch Image Tagging
In the Load Image group, set the bridge settings to Folder Mode.
Enter the path to your folder in the directory widget of Load Batch From Folder [Eclipse].
Set the Image Selector to bypass if you want to automatically process all images in sequence, or leave it active to manually select files when execution pauses.
Run the queue. The workflow will process each file sequentially through the active groups.
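The sequential pass over a folder amounts to listing the image files in a stable order and handing each one to the tagger. The sketch below illustrates that loop outside ComfyUI. The suffix list and the `tagger` callback are assumptions for the example, not the behaviour of Load Batch From Folder.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to whatever your folder contains.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def list_images(directory):
    """Image files directly inside the directory, sorted for a stable order."""
    return sorted(
        path
        for path in Path(directory).iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )


def tag_folder(directory, tagger):
    """Run a tagger callback on each image in turn; returns {filename: result}."""
    return {path.name: tagger(path) for path in list_images(directory)}
```

Calling `tag_folder("/path/to/images", my_tagger)` with any callable that accepts a `Path` returns one result per file, in the same order the files are visited.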
Custom Node Packages Used
ComfyUI_Eclipse: Handles unified smart loaders (Smart LM Loader, Smart Detection), directory list-loading (Load Batch From Folder), visual filtering (Image Selector), wireless routing (Set/Get), and text previews (Show Text, Preview Image (DOM), Preview Mask).
Explore, test, and build beautiful language-guided pipelines. 🌒
