Add describe (first-pass) mode to the judge node

New mode on QwenVLImageJudge: 'describe' looks at the reference alone and returns
a prompt-ready caption + per-axis target spec to seed the very first prompt (the
generator has nothing to reproduce yet). 'compare' is the existing ref-vs-gen
scoring. generated_image is now optional (required only for compare); shared
generation refactored into _generate_from_messages; third output renamed
diff_analysis -> analysis (mode-agnostic). agent_bridge gains --mode (describe
needs no receptor/prompt); added workflow_describe_api.json. Docs updated with the
first-pass bootstrap step. Fixed error-return arity to 5-tuple.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-26 23:04:09 +02:00
parent 959ec70065
commit c7ef756a71
6 changed files with 211 additions and 47 deletions
+9 -4
@@ -80,7 +80,12 @@ not sampler noise; vary the seed only once near target. Stop at `overall_score
## Setup checklist
1. Run ComfyUI with `--listen` (so the bridge can POST). Install this node pack.
-2. Build a workflow with: `CalibratorPromptReceptor` → (Prompt-Builder formatting, optional) → T2I → `QwenVLImageJudge` (feed the **reference** image into `reference_image`, the T2I output into `generated_image`).
-3. Set the Judge's `report_dir` to a known path; pass the same path as `--analysis-dir`.
-4. Export the workflow in **API format** (`workflow_api.json`).
-5. Drive it from the agent with `agent_bridge.py`, once per iteration.
+2. **First pass:** run the describe workflow (`LoadImage` → `QwenVLImageJudge` with `mode=describe`,
+no T2I) once: `agent_bridge.py --mode describe --workflow workflow_describe_api.json`. The
+`caption` it returns is the seed prompt; the `axes` are the seed axis_state.
+3. **Compare loop:** build a workflow with `CalibratorPromptReceptor` → (Prompt-Builder formatting,
+optional) → T2I → `QwenVLImageJudge` (mode `compare`; feed the **reference** into
+`reference_image`, the T2I output into `generated_image`).
+4. Set the Judge's `report_dir` to a known path; pass the same path as `--analysis-dir`.
+5. Export each workflow in **API format**.
+6. Drive it from the agent with `agent_bridge.py`, once per iteration (describe once, then compare in a loop).
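
The "describe once, then compare in a loop" order could be sketched as below. This is only an illustration: `bridge_command` is a hypothetical helper, the `reports` directory and `iterN` run tags are made up, and `workflow_api.json` is assumed to be the exported compare workflow. Only flags that appear in these docs are used. Whether `--run-tag` is accepted in compare mode is an assumption, and the arguments that carry the prompt in compare mode are not shown here, so they are left out.

```python
def bridge_command(mode, workflow, analysis_dir, run_tag=None):
    """Build an argv list for agent_bridge.py (hypothetical helper).

    Uses only flags that appear in the docs: --mode, --workflow,
    --analysis-dir and --run-tag.
    """
    if mode not in ("describe", "compare"):
        raise ValueError(f"unknown mode: {mode!r}")
    cmd = [
        "python", "agent_bridge.py",
        "--mode", mode,
        "--workflow", workflow,
        "--analysis-dir", analysis_dir,
    ]
    if run_tag is not None:
        cmd += ["--run-tag", run_tag]
    return cmd


# Describe runs once to seed the loop.
seed_cmd = bridge_command(
    "describe", "workflow_describe_api.json", "reports", run_tag="seed"
)

# Compare is rebuilt for every iteration (three shown here).
# The prompt-carrying arguments are omitted because the docs do not name them.
compare_cmds = [
    bridge_command("compare", "workflow_api.json", "reports", run_tag=f"iter{i}")
    for i in range(1, 4)
]
```

Each list can be handed to `subprocess.run(cmd, check=True)` once ComfyUI is listening. Building the argv separately keeps the describe/compare ordering easy to inspect before anything is executed.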
+16
@@ -38,6 +38,22 @@ grouped below.
Coarse axes blur the differences that matter for adult imagery; this set keeps the act /
interaction cluster granular so the agent gets actionable targets.
## Step 0 — first pass (describe / bootstrap)
The very first iteration has no generated image yet, so the judge runs in **describe
mode**: it looks at the reference alone and returns a prompt-ready `caption` plus a
per-axis target spec. That seeds everything:
```bash
python agent_bridge.py --mode describe --workflow workflow/workflow_describe_api.json \
--run-tag seed --analysis-dir <report_dir>
```
`latest.json` = `{"mode":"describe", "caption":"...", "axes":{axis: "value", ...}}`
The agent takes `caption` as the **initial prompt** and `axes` as the **initial
axis_state**, then enters the compare loop below. No reference description has to be
written by hand — the VL provides the target to reproduce.
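
A minimal sketch of how an agent might consume the describe report, assuming `latest.json` sits directly inside the report directory and has the shape shown above. The function name `load_seed` and the keys of the returned dictionary are hypothetical and not part of the node pack.

```python
import json
from pathlib import Path


def load_seed(report_dir):
    """Read the describe-mode report and return the seed prompt and axis state.

    Assumes <report_dir>/latest.json exists and follows the documented shape:
    {"mode": "describe", "caption": "...", "axes": {axis: "value", ...}}
    """
    path = Path(report_dir) / "latest.json"
    report = json.loads(path.read_text(encoding="utf-8"))

    # Guard against seeding from a compare report by mistake.
    if report.get("mode") != "describe":
        raise ValueError(f"expected a describe report, got mode={report.get('mode')!r}")

    return {
        "prompt": report["caption"],         # initial prompt
        "axis_state": dict(report["axes"]),  # initial axis_state (shallow copy)
    }
```

Calling `load_seed("<report_dir>")` after the describe run gives the two values the compare loop starts from. The mode check is there so a stale compare report is rejected instead of being silently treated as a seed.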
## Per-iteration algorithm (greedy per-axis hill-climb)
```