# ComfyUI-Prompt-Calibratror

A **fully local** prompt calibration loop for ComfyUI. A vision-language model (Qwen3-VL) judges how close a *generated* image is to a *reference* image and returns a structured score + per-axis difference analysis, which is used to **calibrate the prompt-generation method** ([ComfyUI-Prompt-Builder](../ComfyUI-Prompt-Builder)) until the generated image matches the reference.

> Full design rationale, controller options, and VLM-as-judge variance mitigations
> are in **[docs/METHODOLOGY.md](docs/METHODOLOGY.md)**. The controller is an **external
> CLI agent** that drives ComfyUI via its HTTP API; see **[docs/AGENT_LOOP.md](docs/AGENT_LOOP.md)**.

## Nodes & tools

| Component | What it is |
|---|---|
| `Qwen3-VL Image Judge (Calibrator)` | scores generated vs reference, writes analysis to disk for the agent |
| `SxCP External Prompt (Receptor)` | stable injection point; the agent sets `prompt/negative/seed` here per queue |
| `agent_bridge.py` | one CLI call = one iteration (inject → `POST /prompt` → wait → print analysis JSON) |

## The "vllm node": `Qwen3-VL Image Judge (Calibrator)`

The core node (`nodes/qwen_judge.py`). It reuses the standard transformers Qwen3-VL inference plumbing (same approach as [ComfyUI-QwenVL-MultiImage](https://github.com/hardik-uppal/ComfyUI-QwenVL-MultiImage), the recommended reuse base) but **forces strict JSON output** so an automated loop can act on it.

**Inputs**

| name | type | default | notes |
|---|---|---|---|
| `reference_image` | IMAGE | n/a | the target |
| `mode` | compare / describe | compare | `describe` = first pass over the reference only → caption + target spec (seeds the prompt). `compare` = score ref vs generated |
| `profile` | general / oral / penetration / handjob / solo | general | **analysis profile**: act-specialized axis set; the act-critical axes are distance/proximity-aware (e.g. `mouth_genital_distance`) so magnitude isn't hidden behind a coarse label |
| `generated_image` | IMAGE (optional) | n/a | the candidate to score (required for `compare`, ignored for `describe`) |
| `model_select` | dropdown | 4B local bf16 | curated **transformers** judges with VRAM hints: 4B local (bf16 ~9GB / fp8 ~5GB), 8B (bf16 ~17GB), 30B-A3B (nf4 ~18GB, slow). Auto-downloads on first use |
| `model_path` | STRING | "" (empty) | **manual override** of the dropdown: local dir, HF repo id, or alias (`30b-a3b`/`8b`/`4b`). Empty = use `model_select` |
| `precision` | bf16 / fp16 / fp8 / nf4 | bf16 | applies to a manual `model_path`; presets carry their own precision |
| `axes` | STRING | "" (empty) | **override** the profile's axis set with a custom comma/newline list; empty = use `profile` |
| `max_new_tokens` | INT | 1024 | |
| `temperature` | FLOAT | 0.0 | 0 = greedy/repeatable |
| `swap_eval` | BOOL | true | run twice with images swapped, average → cuts position bias |
| `keep_loaded` | BOOL | true | cache weights across loop iterations |
| `auto_download` | BOOL | true | if `model_path` is a repo id/alias and not local, fetch it from HF into `models/prompt_generator/` |

**Auto-download:** set `model_path` to `30b-a3b` (alias) or any `org/name` repo id and leave `auto_download` on. The node snapshot-downloads it on first run (into ComfyUI's `models/prompt_generator/`) and reuses the local copy afterward. Local paths and the default skip download entirely.

**Outputs**

| name | type | use |
|---|---|---|
| `overall_score` | FLOAT 0..1 | compare: mean verdict (computed here, not by the model). describe: `1.0` placeholder |
| `axis_scores_json` | STRING (JSON) | compare: per-axis `{verdict, ref, gen}` (verdict = match/partial/mismatch). describe: `{axis: value}` |
| `analysis` | STRING | compare: header (`overall, N mismatches`) + axes worst-first (`VERDICT ref:[…] gen:[…]`). describe: the `caption` |
| `raw` | STRING | raw model output (both passes if `swap_eval`) |
| `report_path` | STRING | path to the written `calib_.json` (carries `mismatch_count`) |

## Install

```bash
cd /media/p5/Comfyui/custom_nodes
ln -s /media/p5/ComfyUI-Prompt-Calibratror .   # or git clone
/media/p5/Comfyui/venv/bin/pip install -r /media/p5/ComfyUI-Prompt-Calibratror/requirements.txt
```

The node defaults to the **huihui-ai Qwen3-VL-4B-Instruct abliterated** weights already converted at `/media/p5/qwen3vl_4b_abliterated_comfy_convert/` so it runs out of the box (the abliterated/uncensored variant won't refuse to analyze adult imagery, which would otherwise break the loop).

**Pick the judge from `model_select`** (transformers / safetensors, auto-downloaded): the **8B** abliterated at `bf16` (~17 GB) is the recommended intermediate: fast and clearly better than the 4B. The **30B-A3B** (nf4 ~18 GB) is higher quality but slow (the `nf4`/bitsandbytes path is the bottleneck, not the model). `model_path` overrides the dropdown for anything else.

**GGUF-only models are not in the dropdown.** Newer uncensored builds like [`HauhauCS/Qwen3.5-9B-Uncensored`](https://huggingface.co/HauhauCS/Qwen3.5-9B-Uncensored-HauhauCS-Aggressive) and [`HauhauCS/Qwen3.6-35B-A3B-Uncensored`](https://huggingface.co/HauhauCS/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive) ship as **GGUF + mmproj** only. This node is transformers-only, so run those in a dedicated GGUF node ([1038lab/ComfyUI-QwenVL](https://github.com/1038lab/ComfyUI-QwenVL) or KLL535 Simple-Qwen3-VL-gguf) and feed their text output into the loop. See [docs/METHODOLOGY.md](docs/METHODOLOGY.md#model-sizing-on-32-gb-rtx-5090--abliterated-latest-qwen-vl).
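
To make the sizing trade-off above concrete, here is a minimal sketch of choosing a judge preset for a given VRAM budget. It is not part of the node pack: `JudgePreset`, `PRESETS`, and `pick_preset` are hypothetical names, the VRAM figures are the approximate hints quoted in this README, and the quality ordering (including ranking 4B bf16 above 4B fp8) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class JudgePreset:
    name: str          # illustrative label, not the exact dropdown string
    precision: str
    vram_gb: float     # approximate hint quoted in this README
    quality_rank: int  # assumed ordering: higher = better judge
    slow: bool = False


# Figures copied from the VRAM hints above; the ranking is an assumption.
PRESETS = [
    JudgePreset("4B", "fp8", 5, 1),
    JudgePreset("4B", "bf16", 9, 2),
    JudgePreset("8B", "bf16", 17, 3),
    JudgePreset("30B-A3B", "nf4", 18, 4, slow=True),
]


def pick_preset(free_vram_gb: float, allow_slow: bool = False) -> Optional[JudgePreset]:
    """Return the highest-ranked preset that fits, or None if nothing fits."""
    candidates = [
        p for p in PRESETS
        if p.vram_gb <= free_vram_gb and (allow_slow or not p.slow)
    ]
    return max(candidates, key=lambda p: p.quality_rank, default=None)


if __name__ == "__main__":
    print(pick_preset(24))                   # 8B bf16
    print(pick_preset(24, allow_slow=True))  # 30B-A3B nf4
```

The budget passed in should probably be the VRAM left over once the T2I checkpoint is loaded, since both models share the GPU during the compare loop.
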
## Loop sketch

```
Prompt-Builder (SxCP) ──prompt──▶ T2I (SDXL/Flux/Krea2) ──image──▶ Qwen3-VL Image Judge
        ▲                                                                │
        └──────── knob overrides ◀── Controller ◀── overall_score + diff ┘
```

Use the Prompt-Builder **For-Loop Start/End + Accumulator** nodes to drive iterations and route `overall_score` into the stop condition. Controller options (greedy hill-climb → black-box optimizer → LLM-in-the-loop) are in the methodology doc.

## End-to-end loop

1. Run ComfyUI with `--listen`, install this node pack, put your reference at
   `ComfyUI/input/reference.png`.
2. **First pass (describe):** the judge looks at the reference alone and emits **one canonical
   scene description** (coherent paragraph + per-axis target spec) to seed the prompt *and*
   anchor the loop:
   ```bash
   python agent_bridge.py --mode describe --workflow workflow/workflow_describe_api.json \
     --run-tag seed --analysis-dir /media/p5/Comfyui/output/calibrator
   ```
3. **Compare loop:** load `workflow/workflow_api.json` (SDXL `waiIllustriousSDXL_v160` example;
   swap the checkpoint for Flux/Krea as needed) and iterate, following
   `docs/CALIBRATION_POLICY.md`. Pass `--ref-desc-file` so compare anchors on the canonical
   reference (the `ref` side stays fixed; only the generated image is re-read each turn):
   ```bash
   python agent_bridge.py --workflow workflow/workflow_api.json \
     --prompt "" \
     --ref-desc-file /media/p5/Comfyui/output/calibrator/calib_seed.json \
     --run-tag iter001 --analysis-dir /media/p5/Comfyui/output/calibrator
   ```
   stdout = the analysis JSON (`{verdict, ref, gen}` per axis) → agent steers toward `ref` → next iteration.
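
As an illustration of how an agent might consume that per-axis JSON, the sketch below computes a mean-verdict score and lists the axes to fix, worst first. It assumes the JSON is an object mapping axis name to `{verdict, ref, gen}`; the numeric verdict weights (1.0 / 0.5 / 0.0), the `summarize` helper, and the sample axis names are all hypothetical, since this README only states that `overall_score` is the mean verdict.

```python
import json

# Assumed weights: the README says overall_score is the "mean verdict"
# but does not give the numeric mapping.
VERDICT_VALUE = {"match": 1.0, "partial": 0.5, "mismatch": 0.0}


def summarize(axis_scores_json: str):
    """Return (overall_score, axes_to_fix) from a per-axis verdict JSON string.

    axes_to_fix lists (axis, ref, gen) for every non-matching axis,
    sorted worst-first so the agent can steer the prompt toward `ref`.
    """
    axes = json.loads(axis_scores_json)
    if not axes:
        raise ValueError("no axes in analysis JSON")
    values = {name: VERDICT_VALUE[entry["verdict"]] for name, entry in axes.items()}
    overall = sum(values.values()) / len(values)
    worst_first = sorted(axes, key=lambda name: values[name])  # stable sort
    to_fix = [
        (name, axes[name]["ref"], axes[name]["gen"])
        for name in worst_first
        if values[name] < 1.0
    ]
    return overall, to_fix


if __name__ == "__main__":
    # Hypothetical axis names and descriptions, for illustration only.
    sample = json.dumps({
        "pose": {"verdict": "match", "ref": "standing", "gen": "standing"},
        "lighting": {"verdict": "mismatch", "ref": "warm backlight", "gen": "flat daylight"},
        "camera_angle": {"verdict": "partial", "ref": "low angle", "gen": "eye level"},
    })
    overall, to_fix = summarize(sample)
    print(overall)
    for axis, ref, gen in to_fix:
        print(f"{axis}: want [{ref}] but got [{gen}]")
```

A stop condition could then be as simple as an empty `to_fix` list or a score above a chosen threshold; which one to use is a policy decision covered by `docs/CALIBRATION_POLICY.md`.
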
## Status

- [x] Methodology + node selection (`docs/METHODOLOGY.md`)
- [x] Qwen3-VL Image Judge node: `describe` (first pass) + `compare` (scoring), swap-eval, file report
- [x] Agent-driven architecture (`docs/AGENT_LOOP.md`): Receptor node + `agent_bridge.py` (`--mode`)
- [x] Example workflows: `workflow_describe_api.json` (first pass) + `workflow_api.json` (compare loop)
- [x] Agent calibration policy (`docs/CALIBRATION_POLICY.md`)
- [ ] Optional: structured-config receptor (carry Prompt-Builder knobs instead of a flat string)