describe emits one canonical reference; compare can anchor on it

Describe mode now produces a single coherent, internally-consistent canonical
scene description (paragraph + per-axis spec, written to canonical_reference in
the report). Compare gains an optional reference_description input: when set, it
anchors on that fixed text and shows only the generated image (no swap) — so the
reference side never drifts or self-contradicts across iterations; only the
generated image is re-described each turn. agent_bridge gains --ref-desc /
--ref-desc-file (reads the describe report's canonical_reference). Docs + example
workflow updated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-26 23:22:57 +02:00
parent 53f1f9b9b4
commit 69c1d6deb4
6 changed files with 149 additions and 51 deletions
+5 -3
@@ -37,8 +37,8 @@ supports a `source_file` for file-first workflows if you ever want it.)
| Piece | Role |
|---|---|
| `CalibratorPromptReceptor` (`SxCP External Prompt (Receptor)`) | Stable node the agent injects `prompt/negative/seed` into. Feeds the sampler. |
| `QwenVLImageJudge` (`Qwen3-VL Image Judge (Calibrator)`) | Scores generated vs reference; writes `calib_<run_tag>.json`, `latest.json`, `calib_<run_tag>.md` to `report_dir`. |
| `agent_bridge.py` | One CLI call = one iteration: inject prompt → queue → wait → print the analysis JSON to stdout. Stdlib only. |
| `QwenVLImageJudge` (`Qwen3-VL Image Judge (Calibrator)`) | `describe` (first pass) emits the canonical reference; `compare` judges generated vs reference per axis (verdict match/partial/mismatch). When given `reference_description`, compare anchors on that fixed text. Writes `calib_<run_tag>.json` + `latest.json` to `report_dir`. |
| `agent_bridge.py` | One CLI call = one iteration: inject prompt (+`--ref-desc-file` for the canonical anchor) → queue → wait → print the analysis JSON to stdout. Stdlib only. |
## One iteration (what the agent runs)
@@ -86,7 +86,9 @@ not sampler noise; vary the seed only once near target. Stop at `overall_score
`caption` it returns is the seed prompt; the `axes` are the seed axis_state.
3. **Compare loop:** build a workflow with `CalibratorPromptReceptor` → (Prompt-Builder formatting,
optional) → T2I → `QwenVLImageJudge` (mode `compare`; feed the **reference** into
`reference_image`, the T2I output into `generated_image`).
`reference_image`, the T2I output into `generated_image`). Pass `--ref-desc-file
<report_dir>/calib_seed.json` so compare anchors on the canonical reference from step 2
(the `ref` side stays fixed across iterations; only the generated image is re-described).
4. Set the Judge's `report_dir` to a known path; pass the same path as `--analysis-dir`.
5. Export each workflow in **API format**.
6. Drive it from the agent with `agent_bridge.py`, once per iteration (describe once, then compare in a loop).
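
The describe-once, compare-in-a-loop flow in the steps above could be driven roughly as follows. This is only a sketch: the helper names (`describe_cmd`, `compare_cmd`, `run_iteration`) are hypothetical, the flags are the ones shown in these docs, and it assumes `agent_bridge.py` prints nothing but the analysis JSON to stdout. The compare call omits `--mode`, mirroring the loop pseudocode; add it if your bridge needs it.

```python
import json
import subprocess
import sys
from pathlib import Path


def describe_cmd(workflow: str, report_dir: str, run_tag: str = "seed") -> list[str]:
    """Argv for the one-off describe pass that writes calib_<run_tag>.json."""
    return [sys.executable, "agent_bridge.py", "--mode", "describe",
            "--workflow", workflow, "--run-tag", run_tag,
            "--analysis-dir", report_dir]


def compare_cmd(workflow: str, report_dir: str, prompt: str, negative: str,
                seed: int, i: int, seed_tag: str = "seed") -> list[str]:
    """Argv for one compare iteration, anchored on the describe report."""
    # Assumption: the describe report lives at <report_dir>/calib_<seed_tag>.json.
    ref_file = Path(report_dir) / f"calib_{seed_tag}.json"
    return [sys.executable, "agent_bridge.py",
            "--prompt", prompt, "--negative", negative, "--seed", str(seed),
            "--run-tag", f"iter{i}", "--ref-desc-file", str(ref_file),
            "--workflow", workflow, "--analysis-dir", report_dir]


def run_iteration(cmd: list[str]) -> dict:
    """Run one bridge call and parse the analysis JSON it prints to stdout."""
    done = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return json.loads(done.stdout)
```

An agent would call `run_iteration(describe_cmd(...))` once, then `run_iteration(compare_cmd(...))` per iteration with an increasing `i`.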
+10 -6
View File
@@ -47,18 +47,21 @@ pose cluster is split into many axes so the agent gets specific, actionable targ
## Step 0 — first pass (describe / bootstrap)
The very first iteration has no generated image yet, so the judge runs in **describe
mode**: it looks at the reference alone and returns a prompt-ready `caption` plus a
per-axis target spec. That seeds everything:
mode**: it looks at the reference alone and emits **one canonical scene description**:
a coherent, internally-consistent paragraph plus a per-axis target spec. That seeds
everything *and* becomes the fixed reference for the whole loop:
```bash
python agent_bridge.py --mode describe --workflow workflow/workflow_describe_api.json \
--run-tag seed --analysis-dir <report_dir>
```
`latest.json` = `{"mode":"describe", "caption":"...", "axes":{axis: "value", ...}}`
`calib_seed.json` = `{"mode":"describe", "description":"…", "axes":{axis:value,…}, "canonical_reference":"…"}`
The agent takes `caption` as the **initial prompt** and `axes` as the **initial
axis_state**, then enters the compare loop below. No reference description has to be
written by hand — the VL provides the target to reproduce.
The agent takes `description` as the **initial prompt** and `axes` as the **initial
axis_state**. Crucially, the compare loop then **anchors on this canonical reference**
(via `--ref-desc-file`) instead of re-reading the reference image every iteration — so the
`ref` side never drifts or contradicts itself across passes; only the generated image is
re-described each turn.
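
To make the seeding step concrete, here is a minimal sketch of how an agent might read the describe report and pull out the three things it needs. The function name `load_seed` and the returned dict keys are hypothetical; the JSON field names (`mode`, `description`, `axes`, `canonical_reference`) are the ones shown in the report shape above.

```python
import json
from pathlib import Path


def load_seed(report_path: str) -> dict:
    """Read a describe report and return the initial prompt, axis state, and anchor."""
    report = json.loads(Path(report_path).read_text(encoding="utf-8"))
    if report.get("mode") != "describe":
        raise ValueError(f"{report_path} is not a describe report")
    canonical = report.get("canonical_reference", "")
    if not canonical.strip():
        # Without the anchor text, compare would have nothing fixed to judge against.
        raise ValueError(f"{report_path} has no canonical_reference")
    return {
        "prompt": report.get("description", ""),       # initial prompt
        "axis_state": dict(report.get("axes", {})),    # initial value per axis
        "canonical_reference": canonical,              # fixed text for the whole loop
    }
```

Failing early on a missing `canonical_reference` is a design choice of this sketch, so the loop never runs unanchored by accident.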
## Per-iteration algorithm (greedy per-axis hill-climb)
@@ -69,6 +72,7 @@ loop:
prompt = render(state) # state = current value per axis
report = run agent_bridge.py --prompt prompt --negative state.negative
--seed state.seed --run-tag iter{i}
--ref-desc-file <report_dir>/calib_seed.json # anchor on canonical ref
--workflow wf.json --analysis-dir <report_dir>
if report.mismatch_count == 0 and report.overall_score >= TARGET:
stop("converged", state) # TARGET e.g. 0.9 (mostly match)