Initial commit: VLM-as-judge prompt calibration loop

Qwen3-VL image-similarity judge node, external-prompt receptor node, agent_bridge CLI, example SDXL workflow, and methodology, agent-loop, and calibration-policy docs.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-26 22:15:56 +02:00
commit 95198a15b5
13 changed files with 1294 additions and 0 deletions
@@ -0,0 +1,110 @@
+# ComfyUI-Prompt-Calibratror
+
+A **fully local** prompt calibration loop for ComfyUI. A vision-language model
+(Qwen3-VL) judges how close a *generated* image is to a *reference* image and
+returns a structured score and a per-axis difference analysis, which are used to
+**calibrate the prompt-generation method** ([ComfyUI-Prompt-Builder](../ComfyUI-Prompt-Builder))
+until the generated image matches the reference.
+
+> Full design rationale, controller options, and VLM-as-judge variance mitigations
+> are in **[docs/METHODOLOGY.md](docs/METHODOLOGY.md)**. The controller is an **external
+> CLI agent** that drives ComfyUI via its HTTP API — see **[docs/AGENT_LOOP.md](docs/AGENT_LOOP.md)**.
+
+## Nodes & tools
+
+| Component | What it is |
+|---|---|
+| `Qwen3-VL Image Judge (Calibrator)` | scores generated vs reference, writes analysis to disk for the agent |
+| `SxCP External Prompt (Receptor)` | stable injection point; the agent sets `prompt`, `negative`, and `seed` here before each queued run |
+| `agent_bridge.py` | one CLI call = one iteration (inject → `POST /prompt` → wait → print analysis JSON) |
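The Receptor's job is deliberately small: it exposes `prompt`, `negative`, and `seed` as widget inputs so the agent can overwrite them in the API-format workflow before each queued run. Below is a minimal sketch of such a node using ComfyUI's standard custom-node conventions (`INPUT_TYPES`, `RETURN_TYPES`, `FUNCTION`, `NODE_CLASS_MAPPINGS`). The class, category, and display names are hypothetical, not the ones this pack registers.

```python
# Hypothetical pass-through receptor node (illustrative names, not this repo's).
# The agent edits these widget values in the API-format workflow; the node
# simply forwards them to downstream text encoders / samplers.


class ExternalPromptReceptorSketch:
    @classmethod
    def INPUT_TYPES(cls):
        # ComfyUI reads this to build the node's widgets.
        return {
            "required": {
                "prompt": ("STRING", {"multiline": True, "default": ""}),
                "negative": ("STRING", {"multiline": True, "default": ""}),
                "seed": ("INT", {"default": 0, "min": 0, "max": 0xFFFFFFFFFFFFFFFF}),
            }
        }

    RETURN_TYPES = ("STRING", "STRING", "INT")
    RETURN_NAMES = ("prompt", "negative", "seed")
    FUNCTION = "emit"  # name of the method ComfyUI calls
    CATEGORY = "calibrator/sketch"

    def emit(self, prompt, negative, seed):
        # Pure pass-through: all control lives in the external agent.
        return (prompt, negative, seed)


NODE_CLASS_MAPPINGS = {"ExternalPromptReceptorSketch": ExternalPromptReceptorSketch}
NODE_DISPLAY_NAME_MAPPINGS = {"ExternalPromptReceptorSketch": "External Prompt (Receptor, sketch)"}
```

Keeping the node logic-free means the workflow graph never changes between iterations; only three input values do.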
+
+## The VLM node: `Qwen3-VL Image Judge (Calibrator)`
+
+This is the core node (`nodes/qwen_judge.py`). It reuses the standard Hugging Face `transformers` Qwen3-VL
+inference plumbing (same approach as
+[ComfyUI-QwenVL-MultiImage](https://github.com/hardik-uppal/ComfyUI-QwenVL-MultiImage)
+— the recommended reuse base) but **forces strict JSON output** so an automated loop
+can act on it.
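Forcing strict JSON usually has two halves: prompting the model to answer with a single JSON object, and validating what comes back before the loop trusts it. Here is a minimal sketch of the validation half. It assumes a hypothetical response shape `{"axes": {<axis>: {"score": float, "diff": str}}}`; the per-axis `{score, diff}` pairs mirror the `axis_scores_json` output, but the `axes` wrapper key and the function name are assumptions.

```python
import json
import re

# Matches an optional ```json ... ``` fence that models often wrap answers in.
_FENCE = re.compile(r"```(?:json)?\s*(.*?)```", re.DOTALL)


def parse_judge_output(raw: str, axes: list[str]) -> dict:
    """Extract and validate the judge's JSON answer (hypothetical schema)."""
    match = _FENCE.search(raw)
    text = match.group(1) if match else raw
    # Fall back to the outermost {...} span if the model added prose around it.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object in model output")
    data = json.loads(text[start : end + 1])

    result = {}
    for axis in axes:
        entry = data.get("axes", {}).get(axis)
        if entry is None:
            raise ValueError(f"missing axis: {axis}")
        score = float(entry["score"])
        # Clamp to 0..1 so one malformed number cannot derail the controller.
        result[axis] = {"score": min(max(score, 0.0), 1.0), "diff": str(entry.get("diff", ""))}
    return result
```

Clamping out-of-range scores and failing loudly on a missing axis keeps one bad generation from silently steering the loop. A caller could simply re-run the pass on `ValueError`.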
+
+**Inputs**
+
+| name | type | default | notes |
+|---|---|---|---|
+| `reference_image` | IMAGE | — | the target |
+| `generated_image` | IMAGE | — | the candidate to score |
+| `model_path` | STRING | `/media/p5/qwen3vl_4b_abliterated_comfy_convert/hf_bf16` | local dir, **HF repo id** (`org/name`), or alias (`30b-a3b` / `8b` / `4b`) |
+| `precision` | bf16 / fp16 / fp8 / nf4 | bf16 | `nf4` = 4-bit (lets the 30B judge run on 32 GB); use `fp8` with the `hf_fp8` copy of the weights |
+| `axes` | STRING | cast, clothing, pose, scene, composition, expression, color_light | comma-separated axes to score (match them to your Prompt-Builder knobs) |
+| `max_new_tokens` | INT | 512 | cap on generated tokens |
+| `temperature` | FLOAT | 0.0 | 0 = greedy/repeatable |
+| `swap_eval` | BOOL | true | run twice with the images swapped and average the scores; reduces position bias |
+| `keep_loaded` | BOOL | true | cache weights across loop iterations |
+| `auto_download` | BOOL | true | if `model_path` is a repo id/alias and not local, fetch it from HF into `models/prompt_generator/` |
+
+**Auto-download:** set `model_path` to `30b-a3b` (alias) or any `org/name` repo id and leave
+`auto_download` on — the node snapshot-downloads it on first run (into ComfyUI's
+`models/prompt_generator/<name>`) and reuses the local copy afterward. Local paths, including the
+default, skip the download entirely.
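The resolution order described above (local dir, then alias or repo id, then a one-time download) can be sketched as follows. The alias table, the choice of `<name>` as the repo's name part, the "non-empty directory means already downloaded" check, and the function names are all assumptions for illustration. `huggingface_hub.snapshot_download` is the real download call, and it is passed in as a parameter so the resolver can be exercised offline.

```python
import os
from pathlib import Path

# Hypothetical alias table: only the 30B repo id appears in this README, so
# the 8b/4b aliases are left out rather than guessed.
ALIASES = {"30b-a3b": "huihui-ai/Huihui-Qwen3-VL-30B-A3B-Instruct-abliterated"}


def _hf_download(repo_id: str, local_dir: Path) -> None:
    # Lazy import: huggingface_hub is only needed when a download happens.
    from huggingface_hub import snapshot_download

    snapshot_download(repo_id=repo_id, local_dir=str(local_dir))


def resolve_model_path(model_path: str, models_root: str, auto_download: bool = True,
                       download=_hf_download) -> Path:
    """Map a local dir, alias, or org/name repo id to a local model directory."""
    if os.path.isdir(model_path):
        return Path(model_path)  # local dir (e.g. the default): never download
    repo_id = ALIASES.get(model_path, model_path)
    if os.path.isabs(repo_id) or repo_id.count("/") != 1:
        raise ValueError(f"not a local dir, alias, or org/name repo id: {model_path!r}")
    # Assumed layout: <models_root>/prompt_generator/<repo name>
    target = Path(models_root) / "prompt_generator" / repo_id.split("/")[1]
    if target.is_dir() and any(target.iterdir()):
        return target  # downloaded on an earlier run: reuse it
    if not auto_download:
        raise FileNotFoundError(f"{target} not found and auto_download is off")
    target.mkdir(parents=True, exist_ok=True)
    download(repo_id, target)
    return target
```

An interrupted download leaves an empty or partial directory. The emptiness check only catches the first case, so a real implementation would want a completion marker or rely on `snapshot_download`'s own resume behavior.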
+
+**Outputs**
+
+| name | type | use |
+|---|---|---|
+| `overall_score` | FLOAT 0..1 | loop stop-condition / objective |
+| `axis_scores_json` | STRING (JSON) | per-axis `{score, diff}` for the controller |
+| `diff_analysis` | STRING | human/controller-readable summary + fix suggestions |
+| `raw` | STRING | raw model output (both passes if `swap_eval`) |
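With `swap_eval` on, the two passes have to be merged into one `overall_score` / `axis_scores_json` pair. Below is a minimal sketch. It assumes each pass has already been parsed into `{axis: {"score", "diff"}}` and that both passes cover the same axes. It also assumes `overall_score` is the unweighted mean of the axis scores, although the node may weight axes or ask the model for an overall score directly. The function name is hypothetical.

```python
import json
from statistics import mean


def combine_swap_passes(pass_a: dict, pass_b: dict) -> tuple[float, str]:
    """Merge the normal and image-swapped judge passes.

    Each pass maps axis -> {"score": float, "diff": str}; both must share axes.
    Returns (overall_score, axis_scores_json). Assumption: overall = plain mean.
    """
    combined = {}
    for axis in pass_a:
        a, b = pass_a[axis], pass_b[axis]
        combined[axis] = {
            # Averaging both orderings dampens a consistent bias toward
            # whichever image the model sees first.
            "score": round((a["score"] + b["score"]) / 2, 4),
            # Keep both explanations; empty strings are dropped.
            "diff": " | ".join(d for d in (a["diff"], b["diff"]) if d),
        }
    overall = round(mean(v["score"] for v in combined.values()), 4)
    return overall, json.dumps(combined)
```

Swapping addresses ordering bias only. Run-to-run randomness is handled separately by `temperature=0.0`, which makes decoding greedy.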
+
+## Install
+
+```bash
+cd /media/p5/Comfyui/custom_nodes
+ln -s /media/p5/ComfyUI-Prompt-Calibratror .     # or git clone
+/media/p5/Comfyui/venv/bin/pip install -r /media/p5/ComfyUI-Prompt-Calibratror/requirements.txt
+```
+
+The node defaults to the **huihui-ai Qwen3-VL-4B-Instruct abliterated** weights already
+converted at `/media/p5/qwen3vl_4b_abliterated_comfy_convert/` so it runs out of the box
+(the abliterated/uncensored variant won't refuse to analyze adult imagery; a refusal
+would otherwise break the loop).
+
+**Recommended upgrade (latest Qwen VL + uncensored, fits 32 GB):**
+[`huihui-ai/Huihui-Qwen3-VL-30B-A3B-Instruct-abliterated`](https://huggingface.co/huihui-ai/Huihui-Qwen3-VL-30B-A3B-Instruct-abliterated)
+is a MoE model (~3B parameters active per token, so it runs fast); run it at `precision=nf4` (~18 GB). The node auto-detects the MoE
+class. An easier middle ground is the **8B** abliterated at `bf16` (~17 GB, no quantization).
+Qwen3.5-VL abliterated isn't out yet (Qwen3.5 abliterated builds are text-only so far);
+Gemma-3-27B-it abliterated (4-bit) is a viable non-Qwen alternative. See
+[docs/METHODOLOGY.md](docs/METHODOLOGY.md#model-sizing-on-32-gb-rtx-5090--abliterated-latest-qwen-vl).
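The sizing figures above are consistent with a weights-only estimate of parameters × bits / 8. Note that "3B active" in a MoE model reduces compute per token, not memory. All ~30B parameters still have to be resident, which is why the 30B judge needs 4-bit quantization to fit comfortably on 32 GB. A back-of-the-envelope sketch, using nominal parameter counts taken from the model names (true counts differ slightly):

```python
def weight_memory_gb(params_billion: float, bits_per_param: float) -> float:
    """Weights-only memory in decimal GB: params * bits / 8 bytes.

    Ignores activations, KV cache, quantization scales and CUDA context.
    """
    return params_billion * 1e9 * bits_per_param / 8 / 1e9


# Nominal sizes from the model names, not exact parameter counts.
for name, params, bits in [("30B-A3B @ nf4", 30, 4), ("8B @ bf16", 8, 16), ("4B @ bf16", 4, 16)]:
    print(f"{name:>14}: ~{weight_memory_gb(params, bits):.0f} GB weights")
```

This yields roughly 15 GB, 16 GB, and 8 GB of weights. Activations, the KV cache (which grows with the number of image tokens), quantization scales, and the CUDA context come on top, which plausibly accounts for the gap to the ~18 GB and ~17 GB figures.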
+
+## Loop sketch
+
+```
+Prompt-Builder (SxCP) ──prompt──▶ T2I (SDXL/Flux/Krea2) ──image──▶ Qwen3-VL Image Judge
+        ▲                                                                │
+        └──────── knob overrides ◀── Controller ◀── overall_score + diff ┘
+```
+
+Use the Prompt-Builder **For-Loop Start/End + Accumulator** nodes to drive iterations and
+route `overall_score` into the stop condition. Controller options (greedy hill-climb →
+black-box optimizer → LLM-in-the-loop) are in the methodology doc.
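The simplest of those controllers, a greedy hill-climb, fits in a few lines. This sketch assumes a hypothetical one-to-one mapping between judge axes and Prompt-Builder knobs. It also assumes a `judge` callable that stands in for one full generate-and-score round trip and returns per-axis scores. The controller tries alternatives for the worst-scoring axis first and keeps the first change that raises the mean score. `greedy_calibrate` is an illustrative name, not part of this repo.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

Knobs = Dict[str, str]


def greedy_calibrate(
    start: Knobs,
    options: Dict[str, List[str]],
    judge: Callable[[Knobs], Dict[str, float]],
    target: float = 0.9,
    max_rounds: int = 10,
) -> Tuple[Knobs, float]:
    """First-improvement hill-climb over discrete knobs, worst axis first.

    Hypothetical setup: each judge axis has a same-named knob in `options`,
    and judge(knobs) returns per-axis scores in 0..1 for one generate+score run.
    """
    best = dict(start)
    best_axes = judge(best)
    for _ in range(max_rounds):
        overall = mean(best_axes.values())
        if overall >= target:
            break  # stop condition: close enough to the reference
        step = None
        # Visit axes from lowest to highest score so the biggest gap is tried first.
        for axis in sorted(best_axes, key=best_axes.get):
            for value in options.get(axis, []):
                if value == best.get(axis):
                    continue
                candidate = {**best, axis: value}
                cand_axes = judge(candidate)
                if mean(cand_axes.values()) > overall:
                    step = (candidate, cand_axes)
                    break
            if step is not None:
                break
        if step is None:
            break  # no single-knob change helps: local optimum
        best, best_axes = step
    return best, mean(best_axes.values())
```

Each `judge` call costs a full T2I generation plus one or two VLM passes, so trying the worst axis first spends the budget where the judge reports the largest gap. Because the controller only accepts single-knob improvements, it can stall on a local optimum, which is where the black-box optimizer and LLM-in-the-loop options come in.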
+
+## End-to-end loop
+
+1. Run ComfyUI with `--listen`, install this node pack, put your reference at `ComfyUI/input/reference.png`.
+2. Load `workflow/workflow_api.json` (SDXL `waiIllustriousSDXL_v160` example — swap the checkpoint for Flux/Krea as needed).
+3. Drive it from your agent following `docs/CALIBRATION_POLICY.md`:
+   ```bash
+   python agent_bridge.py --workflow workflow/workflow_api.json \
+     --prompt "1 woman, red lingerie, bedroom, full body, warm light" \
+     --run-tag iter001 --analysis-dir /media/p5/Comfyui/output/calibrator
+   ```
+   stdout is the analysis JSON; the agent reads it, calibrates the prompt, and runs the next iteration.
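One `agent_bridge.py` call maps onto ComfyUI's HTTP API roughly as follows. This is a stdlib-only sketch, not the bridge's actual code. It assumes you know the receptor's node id in the API-format workflow and that its inputs are named `prompt`, `negative`, and `seed` (per the Nodes table). It queues with ComfyUI's `POST /prompt` and polls `GET /history/<prompt_id>`, which returns an empty object until the run has finished. Reading the judge's analysis file is left out because its naming is not specified here.

```python
import json
import time
import urllib.request
import uuid


def inject(workflow: dict, node_id: str, prompt: str, negative: str = "", seed: int = 0) -> dict:
    """Return a copy of an API-format workflow with the receptor's inputs overwritten."""
    wf = json.loads(json.dumps(workflow))  # deep copy; the template stays untouched
    wf[node_id]["inputs"].update({"prompt": prompt, "negative": negative, "seed": seed})
    return wf


def queue(server: str, workflow: dict) -> str:
    """POST the workflow to /prompt and return ComfyUI's prompt_id."""
    body = json.dumps({"prompt": workflow, "client_id": str(uuid.uuid4())}).encode()
    req = urllib.request.Request(f"{server}/prompt", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["prompt_id"]


def wait(server: str, prompt_id: str, timeout: float = 600, poll: float = 1.0) -> dict:
    """Poll /history/<prompt_id> until the finished run appears, then return its entry."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with urllib.request.urlopen(f"{server}/history/{prompt_id}") as resp:
            history = json.load(resp)
        if prompt_id in history:
            return history[prompt_id]
        time.sleep(poll)
    raise TimeoutError(f"prompt {prompt_id} did not finish within {timeout}s")
```

For example, with ComfyUI on its default port and a hypothetical receptor id `"12"`: `pid = queue("http://127.0.0.1:8188", inject(wf, "12", "a red car on a street", seed=42))`, then `wait("http://127.0.0.1:8188", pid)`. The returned history entry includes the run's `status`, which is worth checking before reading the analysis.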
+
+## Status
+
+- [x] Methodology + node selection (`docs/METHODOLOGY.md`)
+- [x] Qwen3-VL Image Judge node (structured JSON scoring, swap-eval, model caching, file report)
+- [x] Agent-driven architecture (`docs/AGENT_LOOP.md`) — Receptor node + `agent_bridge.py`
+- [x] Example end-to-end workflow (`workflow/workflow_api.json`)
+- [x] Agent calibration policy (`docs/CALIBRATION_POLICY.md`)
+- [ ] Optional: structured-config receptor (carry Prompt-Builder knobs instead of a flat string)