# ComfyUI-Prompt-Calibratror

A **fully local** prompt calibration loop for ComfyUI. A vision-language model
(Qwen3-VL) judges how close a *generated* image is to a *reference* image and
returns a structured score + per-axis difference analysis, which is used to
**calibrate the prompt-generation method** ([ComfyUI-Prompt-Builder](../ComfyUI-Prompt-Builder))
until the generated image matches the reference.

> Full design rationale, controller options, and VLM-as-judge variance mitigations
> are in **[docs/METHODOLOGY.md](docs/METHODOLOGY.md)**. The controller is an **external
> CLI agent** that drives ComfyUI via its HTTP API; see **[docs/AGENT_LOOP.md](docs/AGENT_LOOP.md)**.

## Nodes & tools

| Component | What it is |
|---|---|
| `Qwen3-VL Image Judge (Calibrator)` | scores generated vs reference, writes analysis to disk for the agent |
| `SxCP External Prompt (Receptor)` | stable injection point; the agent sets `prompt/negative/seed` here per queue |
| `agent_bridge.py` | one CLI call = one iteration (inject → `POST /prompt` → wait → print analysis JSON) |

## The "vllm node": `Qwen3-VL Image Judge (Calibrator)`

The core node lives in `nodes/qwen_judge.py`. It reuses the standard transformers Qwen3-VL
inference plumbing (same approach as
[ComfyUI-QwenVL-MultiImage](https://github.com/hardik-uppal/ComfyUI-QwenVL-MultiImage),
the recommended reuse base) but **forces strict JSON output** so an automated loop
can act on it.

**Inputs**

| name | type | default | notes |
|---|---|---|---|
| `reference_image` | IMAGE | (none) | the target |
| `mode` | compare / describe | compare | `describe` = first pass over the reference only → caption + target spec (seeds the prompt). `compare` = score ref vs generated |
| `generated_image` | IMAGE (optional) | (none) | the candidate to score (required for `compare`, ignored for `describe`) |
| `model_path` | STRING | `/media/p5/qwen3vl_4b_abliterated_comfy_convert/hf_bf16` | local dir, **HF repo id** (`org/name`), or alias (`30b-a3b` / `8b` / `4b`) |
| `precision` | bf16 / fp16 / fp8 / nf4 | bf16 | `nf4` = 4-bit (run the 30B judge on 32 GB); use `fp8` with the `hf_fp8` copy |
| `axes` | STRING | ~20 axes (identity, body, wardrobe, action, affect, camera, render) | scored/described axes; granular for explicit content. Edit to taste |
| `max_new_tokens` | INT | 1024 | |
| `temperature` | FLOAT | 0.0 | 0 = greedy/repeatable |
| `swap_eval` | BOOL | true | run twice with images swapped, average → cuts position bias |
| `keep_loaded` | BOOL | true | cache weights across loop iterations |
| `auto_download` | BOOL | true | if `model_path` is a repo id/alias and not local, fetch it from HF into `models/prompt_generator/` |

**Auto-download:** set `model_path` to `30b-a3b` (alias) or any `org/name` repo id and leave
`auto_download` on; the node snapshot-downloads it on first run (into ComfyUI's
`models/prompt_generator/<name>`) and reuses the local copy afterward. Local paths and the
default skip download entirely.

**Outputs**

| name | type | use |
|---|---|---|
| `overall_score` | FLOAT 0..1 | compare: mean verdict (computed here, not by the model). describe: `1.0` placeholder |
| `axis_scores_json` | STRING (JSON) | compare: per-axis `{verdict, ref, gen}` (verdict = match/partial/mismatch). describe: `{axis: value}` |
| `analysis` | STRING | compare: header (`overall, N mismatches`) + axes worst-first (`VERDICT ref:[…] gen:[…]`). describe: the `caption` |
| `raw` | STRING | raw model output (both passes if `swap_eval`) |
| `report_path` | STRING | path to the written `calib_<tag>.json` (carries `mismatch_count`) |

## Install

```bash
cd /media/p5/Comfyui/custom_nodes
ln -s /media/p5/ComfyUI-Prompt-Calibratror .   # or git clone
/media/p5/Comfyui/venv/bin/pip install -r /media/p5/ComfyUI-Prompt-Calibratror/requirements.txt
```

The node defaults to the **huihui-ai Qwen3-VL-4B-Instruct abliterated** weights already
converted at `/media/p5/qwen3vl_4b_abliterated_comfy_convert/` so it runs out of the box
(the abliterated/uncensored variant won't refuse to analyze adult imagery; a refusal would
otherwise break the loop).

**Recommended upgrade (latest Qwen VL + uncensored, fits 32 GB):**
[`huihui-ai/Huihui-Qwen3-VL-30B-A3B-Instruct-abliterated`](https://huggingface.co/huihui-ai/Huihui-Qwen3-VL-30B-A3B-Instruct-abliterated),
a MoE model (3B active, fast); run it at `precision=nf4` (~18 GB). The node auto-detects the MoE
class. An easier middle ground is the **8B** abliterated at `bf16` (~17 GB, no quantization).
Qwen3.5-VL abliterated isn't out yet (Qwen3.5 abliterated builds are text-only so far);
Gemma-3-27B-it abliterated (4-bit) is a viable non-Qwen alternative. See
[docs/METHODOLOGY.md](docs/METHODOLOGY.md#model-sizing-on-32-gb-rtx-5090--abliterated-latest-qwen-vl).

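
The way `model_path` is resolved (local dir, repo id, or alias, with an optional first-run download) can be sketched in a few lines. This is a minimal illustration, not the node's code: the alias table, the `resolve_model_path` helper and its `download` hook are hypothetical, and the only real library call is `snapshot_download` from `huggingface_hub`.

```python
from pathlib import Path
from typing import Callable, Optional

# Assumption: alias -> repo id. Only the 30B repo id is named in this README;
# the node's real alias table (including `8b` / `4b`) may differ.
ALIASES = {
    "30b-a3b": "huihui-ai/Huihui-Qwen3-VL-30B-A3B-Instruct-abliterated",
}


def resolve_model_path(
    model_path: str,
    models_dir: str,
    auto_download: bool = True,
    download: Optional[Callable[[str, str], object]] = None,
) -> Path:
    """Return a local directory holding the weights for `model_path`."""
    local = Path(model_path)
    if local.is_dir():
        return local  # existing local paths never trigger a download

    repo_id = ALIASES.get(model_path, model_path)
    if repo_id.count("/") != 1:
        raise ValueError(f"not a local dir, alias, or org/name repo id: {model_path!r}")

    # Assumption: the cache folder is named after the repo's name part.
    target = Path(models_dir) / repo_id.split("/")[1]
    if target.is_dir() and any(target.iterdir()):
        return target  # reuse the copy fetched on an earlier run

    if not auto_download:
        raise FileNotFoundError(f"{repo_id} is not cached and auto_download is off")

    if download is None:
        # Imported lazily so the helper can be exercised without network access.
        from huggingface_hub import snapshot_download

        def download(rid: str, dest: str) -> object:
            return snapshot_download(repo_id=rid, local_dir=dest)

    download(repo_id, str(target))
    return target
```

Calling `resolve_model_path("30b-a3b", "models/prompt_generator")` would fetch the weights once and return the cached directory on later calls, under the assumptions above.
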
## Loop sketch

```
Prompt-Builder (SxCP) ──prompt──▶ T2I (SDXL/Flux/Krea2) ──image──▶ Qwen3-VL Image Judge
        ▲                                                                │
        └──────── knob overrides ◀── Controller ◀── overall_score + diff ┘
```

Use the Prompt-Builder **For-Loop Start/End + Accumulator** nodes to drive iterations and
route `overall_score` into the stop condition. Controller options (greedy hill-climb →
black-box optimizer → LLM-in-the-loop) are in the methodology doc.

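
To make the stop condition concrete, here is a hedged sketch of how a mean-verdict score could be derived from per-axis verdicts and checked against a threshold. The verdict weights, the way two swapped passes are averaged, and the `threshold` / `max_iters` values are all assumptions for illustration; the node's actual mapping is not documented here.

```python
# Assumption: numeric weight per verdict (not taken from the node's source).
VERDICT_WEIGHT = {"match": 1.0, "partial": 0.5, "mismatch": 0.0}


def overall_score(axis_scores: dict) -> float:
    """Mean verdict over all axes, in the range 0..1."""
    if not axis_scores:
        return 0.0
    total = sum(VERDICT_WEIGHT[axis["verdict"]] for axis in axis_scores.values())
    return total / len(axis_scores)


def swap_averaged(pass_a: dict, pass_b: dict) -> float:
    """Average the scores of two passes run with the image order swapped."""
    return (overall_score(pass_a) + overall_score(pass_b)) / 2


def should_stop(score: float, iteration: int,
                threshold: float = 0.9, max_iters: int = 20) -> bool:
    """Stop once the score is good enough or the iteration budget is spent."""
    return score >= threshold or iteration >= max_iters


first = {
    "identity": {"verdict": "match", "ref": "woman, 30s", "gen": "woman, 30s"},
    "wardrobe": {"verdict": "partial", "ref": "red coat", "gen": "red jacket"},
    "camera": {"verdict": "mismatch", "ref": "low angle", "gen": "eye level"},
}
swapped = dict(first, wardrobe={"verdict": "match", "ref": "red coat", "gen": "red coat"})

print(overall_score(first))           # 0.5
print(should_stop(swap_averaged(first, swapped), iteration=3))  # False
```
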
## End-to-end loop

1. Run ComfyUI with `--listen`, install this node pack, and put your reference at `ComfyUI/input/reference.png`.
2. **First pass (describe):** the judge looks at the reference alone and emits **one canonical
   scene description** (coherent paragraph + per-axis target spec) to seed the prompt *and*
   anchor the loop:
   ```bash
   python agent_bridge.py --mode describe --workflow workflow/workflow_describe_api.json \
     --run-tag seed --analysis-dir /media/p5/Comfyui/output/calibrator
   ```
3. **Compare loop:** load `workflow/workflow_api.json` (SDXL `waiIllustriousSDXL_v160` example;
   swap the checkpoint for Flux/Krea as needed) and iterate, following `docs/CALIBRATION_POLICY.md`.
   Pass `--ref-desc-file` so compare anchors on the canonical reference (the `ref` side stays
   fixed; only the generated image is re-read each turn):
   ```bash
   python agent_bridge.py --workflow workflow/workflow_api.json \
     --prompt "<description from step 2, then calibrated>" \
     --ref-desc-file /media/p5/Comfyui/output/calibrator/calib_seed.json \
     --run-tag iter001 --analysis-dir /media/p5/Comfyui/output/calibrator
   ```
   stdout is the analysis JSON (`{verdict, ref, gen}` per axis); the agent steers the prompt toward `ref`, then runs the next iteration.

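
From the agent's side, one compare iteration could look roughly like the sketch below: build the `agent_bridge.py` command from step 3, run it, and list the axes that still need work. The helper names are hypothetical, and the parser assumes stdout is a flat `{axis: {verdict, ref, gen}}` object; the real output may wrap the axes in a larger report, so adjust the lookup accordingly.

```python
import json
import subprocess


def compare_argv(prompt: str, run_tag: str, ref_desc_file: str,
                 workflow: str = "workflow/workflow_api.json",
                 analysis_dir: str = "/media/p5/Comfyui/output/calibrator") -> list:
    """Build the compare-mode command line shown in step 3."""
    return [
        "python", "agent_bridge.py",
        "--workflow", workflow,
        "--prompt", prompt,
        "--ref-desc-file", ref_desc_file,
        "--run-tag", run_tag,
        "--analysis-dir", analysis_dir,
    ]


def axes_to_fix(analysis_json: str) -> list:
    """Return (axis, ref, gen) for every non-matching axis, mismatches first."""
    axes = json.loads(analysis_json)
    rank = {"mismatch": 0, "partial": 1}
    pending = [
        (rank[axis["verdict"]], name, axis["ref"], axis["gen"])
        for name, axis in axes.items()
        if axis["verdict"] in rank
    ]
    return [(name, ref, gen) for _, name, ref, gen in sorted(pending)]


def run_iteration(prompt: str, run_tag: str, ref_desc_file: str) -> list:
    """Run one iteration against a live ComfyUI and return the axes to steer."""
    done = subprocess.run(
        compare_argv(prompt, run_tag, ref_desc_file),
        capture_output=True, text=True, check=True,
    )
    return axes_to_fix(done.stdout)
```

Each returned tuple tells the agent which axis to adjust and what the fixed `ref` side says it should look like, which is the information the calibration policy needs to rewrite the prompt.
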
## Status

- [x] Methodology + node selection (`docs/METHODOLOGY.md`)
- [x] Qwen3-VL Image Judge node: `describe` (first pass) + `compare` (scoring), swap-eval, file report
- [x] Agent-driven architecture (`docs/AGENT_LOOP.md`): Receptor node + `agent_bridge.py` (`--mode`)
- [x] Example workflows: `workflow_describe_api.json` (first pass) + `workflow_api.json` (compare loop)
- [x] Agent calibration policy (`docs/CALIBRATION_POLICY.md`)
- [ ] Optional: structured-config receptor (carry Prompt-Builder knobs instead of a flat string)