ComfyUI-Prompt-Calibrator/docs/AGENT_LOOP.md

# Agent-driven calibration loop

The controller is an **external CLI agent**, not an in-graph node. ComfyUI is the
execution environment (prompt receptor → T2I → VLM judge); the agent is the brain that
reads the analysis, calibrates the prompt generator, and queues the next iteration.

```
 CLI AGENT (controller / brain)                 COMFYUI (execution, running with --listen)
 ───────────────────────────────                ──────────────────────────────────────────
  1. build/calibrate a prompt
  2. agent_bridge.py --prompt ... ───POST /prompt──►  CalibratorPromptReceptor (injection point)
                                                          │ prompt / negative / seed
                                                          ▼
                                                       T2I (SDXL / Flux / Krea2)
                                                          │ generated image
                                                          ▼
                                                       Qwen3-VL Image Judge
                                                          │ writes calib_<tag>.json + latest.json
  3. poll /history/{id} (bridge does this)  ◄───────────┘
  4. read report JSON (overall_score,
     per-axis score + ref/gen values)
  5. steer prompt toward ref on worst axes
  └──► go to 1 until overall_score ≥ target
```

## Why API-driven, not file-watch

A passive "watch a file and auto-run" receptor is fragile in ComfyUI (no native file
watcher / auto-queue, and prompt↔image↔analysis can desync). Driving `POST /prompt`
instead makes every iteration **synchronous and ordered** — one `prompt_id` ties the
prompt, the image, and the analysis together. The receptor node is still the clean
injection point; the agent just overrides its widgets per queue. (The receptor *also*
supports a `source_file` for file-first workflows if you ever want it.)
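
A minimal sketch of that synchronous round trip, using only the standard library. It assumes ComfyUI's `POST /prompt` and `GET /history/{prompt_id}` endpoints behave as in current releases (the queue response carries a `prompt_id`, and the history stays empty until the run finishes). The function names and the injectable `fetch` parameter are illustrative and are not taken from `agent_bridge.py`.

```python
import json
import time
import urllib.request


def queue_prompt(server: str, workflow: dict, timeout: float = 30.0) -> str:
    """POST an API-format workflow to /prompt and return the prompt_id."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(
        f"http://{server}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["prompt_id"]


def fetch_history(server: str, prompt_id: str, timeout: float = 30.0) -> dict:
    """GET /history/{prompt_id}; assumed to be {} until the run has finished."""
    url = f"http://{server}/history/{prompt_id}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def wait_for_result(prompt_id, fetch, poll_s=1.0, max_wait_s=600.0, sleep=time.sleep):
    """Poll fetch(prompt_id) until the history contains an entry for prompt_id."""
    deadline = time.monotonic() + max_wait_s
    while time.monotonic() < deadline:
        history = fetch(prompt_id)
        if prompt_id in history:
            return history[prompt_id]
        sleep(poll_s)
    raise TimeoutError(f"prompt {prompt_id} did not finish within {max_wait_s}s")
```

Against a live server, `fetch` would be something like `lambda pid: fetch_history("127.0.0.1:8188", pid)`. Because one `prompt_id` keys both the queue call and the history lookup, the loop cannot pick up the result of a different iteration.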

## The three pieces

| Piece | Role |
|---|---|
| `CalibratorPromptReceptor` (`SxCP External Prompt (Receptor)`) | Stable node the agent injects `prompt/negative/seed` into. Feeds the sampler. |
| `QwenVLImageJudge` (`Qwen3-VL Image Judge (Calibrator)`) | `describe` (first pass) emits the canonical reference; `compare` judges generated vs reference per axis (verdict match/partial/mismatch). When given `reference_description`, compare anchors on that fixed text. Writes `calib_<run_tag>.json` + `latest.json` to `report_dir`. |
| `agent_bridge.py` | One CLI call = one iteration: inject prompt (+`--ref-desc-file` for the canonical anchor) → queue → wait → print the analysis JSON to stdout. Stdlib only. |
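
The injection step could look roughly like the sketch below. It assumes the exported API-format workflow is a dict of `node_id -> {"class_type", "inputs"}`, that the receptor's `class_type` is `CalibratorPromptReceptor`, and that its input names are literally `prompt`, `negative`, and `seed`. The helper name `inject_prompt` is hypothetical; check the real widget names in your exported JSON.

```python
import copy

# Assumed class_type of the receptor node in the exported API-format workflow.
RECEPTOR_CLASS = "CalibratorPromptReceptor"


def inject_prompt(workflow: dict, prompt: str, negative: str = "", seed: int = 0) -> dict:
    """Return a copy of the workflow with the receptor's widgets overridden."""
    patched = copy.deepcopy(workflow)  # never mutate the template on disk/in memory
    receptors = [
        node for node in patched.values()
        if isinstance(node, dict) and node.get("class_type") == RECEPTOR_CLASS
    ]
    if len(receptors) != 1:
        raise ValueError(f"expected exactly one {RECEPTOR_CLASS}, found {len(receptors)}")
    # Input names below are assumptions based on the prompt/negative/seed triple.
    receptors[0].setdefault("inputs", {}).update(
        {"prompt": prompt, "negative": negative, "seed": seed}
    )
    return patched
```

Copying the workflow before patching keeps the exported template reusable across iterations, and failing on zero or multiple receptors avoids silently queuing an unpatched graph.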

## One iteration (what the agent runs)

```bash
python agent_bridge.py \
  --server 127.0.0.1:8188 \
  --workflow workflow_api.json \
  --prompt   "1 woman, red lingerie, bedroom, full body, warm rim light" \
  --negative "blurry, deformed" \
  --seed 12345 \
  --run-tag  iter003 \
  --analysis-dir /media/p5/Comfyui/output/calibrator
```

Stdout (captured by the agent) is the report:

```json
{
  "run_tag": "iter003",
  "overall_score": 0.62,
  "mismatch_count": 1,
  "axes": {
    "body_orientation": {"verdict": "mismatch", "ref": "female on top, facing partner", "gen": "female on bottom"},
    "clothing_state": {"verdict": "partial",  "ref": "red lace lingerie", "gen": "plain bra"}
  },
  "_prompt_id": "…", "_report_path": "…/calib_iter003.json"
}
```
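
One way the agent might turn that report into a work list is sketched below. It relies only on the `axes`, `verdict`, and `ref` fields shown above; the ranking (mismatch before partial) and the helper name `worst_axes` are assumptions, not part of the bridge.

```python
# Lower rank = worse. Unknown verdicts are treated as worst so they get attention.
VERDICT_RANK = {"mismatch": 0, "partial": 1, "match": 2}


def worst_axes(report: dict, limit: int = 2) -> list:
    """Return up to `limit` (axis_name, ref_value) pairs, worst verdict first."""
    axes = report.get("axes", {})
    pending = [name for name, axis in axes.items() if axis.get("verdict") != "match"]
    # sorted() is stable, so axes with the same verdict keep their report order.
    pending.sort(key=lambda name: VERDICT_RANK.get(axes[name].get("verdict"), 0))
    return [(name, axes[name].get("ref", "")) for name in pending[:limit]]
```

The returned `ref` values are the targets the agent should move the prompt wording toward on the next iteration.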

## Agent calibration policy (suggested)

For the lowest-scoring axes, the agent rewrites that axis's prompt wording to match its
`ref` value (the target), regenerates, and keeps only the changes that raise `overall_score`
(a greedy per-axis hill-climb). The local judge model reports differences but supplies no
fixes; the agent owns the correction. Keep the **T2I seed fixed** while searching so the
score reflects the prompt, not sampler noise; vary the seed only once the score is near
target. Stop at `overall_score ≥ target` (e.g. 0.85) or when a max-iteration budget runs
out. Full policy: **[CALIBRATION_POLICY.md](CALIBRATION_POLICY.md)**.
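
The greedy hill-climb described above can be sketched as follows. This is an illustration under stated assumptions, not the policy in `CALIBRATION_POLICY.md`: `axis_state` is taken to be a dict of axis name to prompt wording, and `evaluate` is a hypothetical callback that renders the state into a prompt, runs one bridge iteration with a fixed seed, and returns the report dict.

```python
def hill_climb(axis_state: dict, evaluate, target: float = 0.85, max_iters: int = 10):
    """Greedy per-axis search: adopt an axis's `ref` wording only if the score rises."""
    order = {"mismatch": 0, "partial": 1}  # try the worst verdicts first
    best_state = dict(axis_state)
    best_report = evaluate(best_state)
    for _ in range(max_iters):
        if best_report["overall_score"] >= target:
            break
        pending = sorted(
            ((name, axis) for name, axis in best_report.get("axes", {}).items()
             if axis.get("verdict") in order),
            key=lambda item: order[item[1]["verdict"]],
        )
        improved = False
        for name, axis in pending:
            if best_state.get(name) == axis.get("ref"):
                continue  # this wording is already in the prompt
            candidate = {**best_state, name: axis["ref"]}
            report = evaluate(candidate)
            if report["overall_score"] > best_report["overall_score"]:
                best_state, best_report = candidate, report
                improved = True
                break  # re-rank axes against the new report
        if not improved:
            break  # no single-axis change helps; stop instead of looping
    return best_state, best_report
```

Rejected candidates are simply discarded, so the state only ever moves to a strictly higher `overall_score`. Each `evaluate` call costs one full generate-and-judge run, which is why the loop stops as soon as no single-axis change helps.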

## Setup checklist

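1. Run ComfyUI so the bridge can reach its HTTP API. `--listen` is needed when the agent runs on
   another host; a bridge on the same machine can use the default `127.0.0.1:8188`. Install this node pack.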
2. **First pass:** run the describe workflow (`LoadImage` → `QwenVLImageJudge` with `mode=describe`,
   no T2I) once: `agent_bridge.py --mode describe --workflow workflow_describe_api.json`. The
   `caption` it returns is the starting ("seed") prompt, which is unrelated to the T2I `--seed`;
   the `axes` are the starting `axis_state`.
3. **Compare loop:** build a workflow with `CalibratorPromptReceptor` → (Prompt-Builder formatting,
   optional) → T2I → `QwenVLImageJudge` (mode `compare`; feed the **reference** into
   `reference_image`, the T2I output into `generated_image`). Pass `--ref-desc-file
   <report_dir>/calib_seed.json` so compare anchors on the canonical reference from step 2
   (the `ref` side stays fixed across iterations; only the generated image is re-described).
4. Set the Judge's `report_dir` to a known path; pass the same path as `--analysis-dir`.
5. Export each workflow in **API format**.
6. Drive it from the agent with `agent_bridge.py`, once per iteration (describe once, then compare in a loop).
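
For step 6, an agent written in Python might wrap the bridge roughly as below. The flags are the ones used elsewhere in this document; the helper names, the default values, and the assumption that stdout contains only the report JSON are illustrative.

```python
import json
import subprocess
import sys


def bridge_argv(workflow, prompt, run_tag, analysis_dir, *, server="127.0.0.1:8188",
                negative="", seed=0, ref_desc_file=None, bridge="agent_bridge.py"):
    """Build the command line for one compare iteration."""
    argv = [
        sys.executable, bridge,
        "--server", server,
        "--workflow", workflow,
        "--prompt", prompt,
        "--negative", negative,
        "--seed", str(seed),
        "--run-tag", run_tag,
        "--analysis-dir", analysis_dir,
    ]
    if ref_desc_file:
        # Canonical anchor from the describe pass (step 2).
        argv += ["--ref-desc-file", ref_desc_file]
    return argv


def run_iteration(argv, timeout_s=900):
    """Run one bridge call and parse its stdout as the report (assumed pure JSON)."""
    proc = subprocess.run(argv, capture_output=True, text=True,
                          timeout=timeout_s, check=True)
    return json.loads(proc.stdout)
```

Passing arguments as a list avoids shell quoting problems with prompts that contain commas or quotes, and `check=True` turns a failed bridge run into an exception instead of a JSON parse error further down.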