53f1f9b9b4
The 4B's 0-1 scores were unreliable (identical ref/gen scored ~0.6), so the judge now returns verdict match/partial/mismatch per axis; overall_score and a new mismatch_count are computed from verdicts on our side (reliable, monotonic). Expanded the action/pose cluster into position_name, body_orientation, limb_arrangement, penetration, contact_points, genital_visibility (+ breast_size) so explicit poses carry detail. Each axis now ships a one-line definition in the prompt so gender_mix/subject_count stop absorbing positional text. 24 axes total. Example workflows use the node default (axes=''). Docs realigned; stop condition is now mismatch_count==0. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
# Agent-driven calibration loop

The controller is an **external CLI agent**, not an in-graph node. ComfyUI is the
execution environment (prompt receptor → T2I → VLM judge); the agent is the brain that
reads the analysis, calibrates the prompt generator, and queues the next iteration.

```
CLI AGENT (controller / brain)                       COMFYUI (execution, running with --listen)
───────────────────────────────                      ──────────────────────────────────────────
1. build/calibrate a prompt
2. agent_bridge.py --prompt ...  ───POST /prompt──►  CalibratorPromptReceptor (injection point)
                                                       │ prompt / negative / seed
                                                       ▼
                                                     T2I (SDXL / Flux / Krea2)
                                                       │ generated image
                                                       ▼
                                                     Qwen3-VL Image Judge
                                                       │ writes calib_<tag>.json + latest.json
3. poll /history/{id} (bridge does this)   ◄───────────┘
4. read report JSON (overall_score,
   per-axis verdict + ref/gen values)
5. steer prompt toward ref on worst axes
   └──► go to 1 until overall_score ≥ target
```

## Why API-driven, not file-watch

A passive "watch a file and auto-run" receptor is fragile in ComfyUI: there is no native
file watcher or auto-queue, and prompt↔image↔analysis can desync. Driving `POST /prompt`
instead makes every iteration **synchronous and ordered**: one `prompt_id` ties the
prompt, the image, and the analysis together. The receptor node is still the clean
injection point; the agent just overrides its widgets per queue. (The receptor *also*
supports a `source_file` for file-first workflows if you ever want it.)

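The queue-then-wait cycle described above can be sketched with the standard library alone. This is an illustration, not the actual `agent_bridge.py`: it assumes `POST /prompt` accepts a JSON body whose `prompt` key holds the API-format workflow and answers with a `prompt_id`, and that `GET /history/{prompt_id}` returns an object keyed by that id once the run has finished. The function names are hypothetical.

```python
import json
import time
import urllib.request


def build_queue_request(server: str, workflow: dict) -> urllib.request.Request:
    """Build the POST /prompt request for an API-format workflow."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"http://{server}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def is_finished(history: dict, prompt_id: str) -> bool:
    """Assumption: a run is finished once its id appears in the history payload."""
    return prompt_id in history


def queue_and_wait(server: str, workflow: dict,
                   poll_s: float = 1.0, timeout_s: float = 600.0) -> dict:
    """Queue one iteration and block until it shows up in /history."""
    with urllib.request.urlopen(build_queue_request(server, workflow)) as resp:
        prompt_id = json.load(resp)["prompt_id"]
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        with urllib.request.urlopen(f"http://{server}/history/{prompt_id}") as resp:
            history = json.load(resp)
        if is_finished(history, prompt_id):
            return history[prompt_id]
        time.sleep(poll_s)
    raise TimeoutError(f"prompt {prompt_id} did not finish within {timeout_s}s")
```

Because the same `prompt_id` is used to queue and to poll, the caller never has to guess which image or report belongs to which prompt.
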
## The three pieces

| Piece | Role |
|---|---|
| `CalibratorPromptReceptor` (`SxCP External Prompt (Receptor)`) | Stable node the agent injects `prompt/negative/seed` into. Feeds the sampler. |
| `QwenVLImageJudge` (`Qwen3-VL Image Judge (Calibrator)`) | Scores generated vs reference; writes `calib_<run_tag>.json`, `latest.json`, `calib_<run_tag>.md` to `report_dir`. |
| `agent_bridge.py` | One CLI call = one iteration: inject prompt → queue → wait → print the analysis JSON to stdout. Stdlib only. |

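The injection step can be pictured as a small patch on the exported API-format workflow (a mapping of node id to `class_type` and `inputs`). The sketch below is hypothetical: it assumes the receptor's widgets are literally named `prompt`, `negative`, and `seed`, and that exactly one receptor node exists in the graph.

```python
import copy


def inject_receptor(workflow: dict, prompt: str, negative: str, seed: int,
                    class_type: str = "CalibratorPromptReceptor") -> dict:
    """Return a copy of the workflow with the receptor's widgets overridden."""
    patched = copy.deepcopy(workflow)  # leave the exported template untouched
    targets = [
        node for node in patched.values()
        if isinstance(node, dict) and node.get("class_type") == class_type
    ]
    if len(targets) != 1:
        raise ValueError(
            f"expected exactly one {class_type} node, found {len(targets)}"
        )
    # Assumed widget names; adjust to whatever the node actually exposes.
    targets[0].setdefault("inputs", {}).update(
        {"prompt": prompt, "negative": negative, "seed": seed}
    )
    return patched
```

Patching a copy keeps the template reusable across iterations, so each queue starts from the same known graph.
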
## One iteration (what the agent runs)

```bash
python agent_bridge.py \
  --server 127.0.0.1:8188 \
  --workflow workflow_api.json \
  --prompt "1 woman, red lingerie, bedroom, full body, warm rim light" \
  --negative "blurry, deformed" \
  --seed 12345 \
  --run-tag iter003 \
  --analysis-dir /media/p5/Comfyui/output/calibrator
```

Stdout (captured by the agent) is the report:

```json
{
  "run_tag": "iter003",
  "overall_score": 0.62,
  "mismatch_count": 1,
  "axes": {
    "position_name": {"verdict": "mismatch", "ref": "doggy style", "gen": "cowgirl"},
    "clothing_state": {"verdict": "partial", "ref": "red lace lingerie", "gen": "plain bra"}
  },
  "prompt_used": "...",
  "_prompt_id": "…", "_report_path": "…/calib_iter003.json"
}
```

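Once the report is parsed, picking the axes to work on is a matter of ranking verdicts. The following is a minimal sketch: the verdict names and the `axes` layout follow the report above, while the severity ordering, the helper name `worst_axes`, and the sample values are illustrative assumptions.

```python
# Lower rank = worse. The ordering is this sketch's own convention.
SEVERITY = {"mismatch": 0, "partial": 1, "match": 2}


def worst_axes(report: dict, limit: int = 3) -> list:
    """Return (axis, verdict, ref) for every non-matching axis, worst first."""
    rows = [
        (name, axis["verdict"], axis["ref"])
        for name, axis in report["axes"].items()
        if axis["verdict"] != "match"
    ]
    rows.sort(key=lambda row: SEVERITY[row[1]])  # stable: ties keep report order
    return rows[:limit]


# Hypothetical report fragment, for illustration only.
SAMPLE_REPORT = {
    "axes": {
        "subject_count": {"verdict": "match", "ref": "1", "gen": "1"},
        "clothing_state": {"verdict": "partial", "ref": "red dress", "gen": "blue dress"},
        "body_orientation": {"verdict": "mismatch", "ref": "facing camera", "gen": "profile view"},
    }
}

for axis, verdict, ref in worst_axes(SAMPLE_REPORT):
    print(f"{axis}: {verdict} -> steer toward {ref!r}")
```
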
## Agent calibration policy (suggested)

For the worst axes (those whose verdict is `mismatch` or `partial`), the agent rewrites
that axis's prompt wording to match its `ref` value (the target), regenerates, and keeps
only the changes that raise `overall_score` (a greedy per-axis hill-climb). The local
model supplies no fixes; the agent owns the correction. Keep the **T2I seed fixed** while
searching so the score reflects the prompt, not sampler noise; vary the seed only once
near target. Stop at `overall_score ≥ target` (e.g. 0.85) or when a max-iteration budget
runs out. Full policy: **[CALIBRATION_POLICY.md](CALIBRATION_POLICY.md)**.

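The greedy per-axis hill-climb can be sketched as follows. This is an illustration under stated assumptions, not the policy in `CALIBRATION_POLICY.md`: the prompt is assembled by joining one wording per axis, `run_iteration` is a hypothetical callable that runs one bridge call with a fixed seed and returns the parsed report, and the severity ordering is the sketch's own.

```python
from typing import Callable

SEVERITY = {"mismatch": 0, "partial": 1, "match": 2}  # assumed ordering


def calibrate(axis_state: dict, run_iteration: Callable[[str], dict],
              target: float = 0.85, max_iters: int = 10) -> tuple:
    """Adopt one ref wording at a time; keep it only if overall_score rises."""
    state = dict(axis_state)
    report = run_iteration(", ".join(state.values()))
    best = report["overall_score"]
    rejected = set()  # (axis, wording) pairs that did not help
    for _ in range(max_iters):
        if best >= target:
            break
        candidates = sorted(
            (SEVERITY[axis["verdict"]], name, axis["ref"])
            for name, axis in report["axes"].items()
            if axis["verdict"] != "match" and (name, axis["ref"]) not in rejected
        )
        if not candidates:
            break  # nothing left to try
        _, name, ref = candidates[0]
        trial = {**state, name: ref}
        trial_report = run_iteration(", ".join(trial.values()))
        if trial_report["overall_score"] > best:
            # Accept: the change raised the score.
            state, report, best = trial, trial_report, trial_report["overall_score"]
        else:
            # Revert: remember the failed wording so it is not retried.
            rejected.add((name, ref))
    return state, best
```

Changing a single axis per iteration keeps attribution clean: any score change is due to that one wording, provided the seed stays fixed.
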
## Setup checklist

1. Run ComfyUI with `--listen` (so the bridge can POST). Install this node pack.
2. **First pass:** run the describe workflow (`LoadImage` → `QwenVLImageJudge` with `mode=describe`,
   no T2I) once: `agent_bridge.py --mode describe --workflow workflow_describe_api.json`. The
   `caption` it returns is the seed prompt; the `axes` are the seed axis_state.
3. **Compare loop:** build a workflow with `CalibratorPromptReceptor` → (Prompt-Builder formatting,
   optional) → T2I → `QwenVLImageJudge` (mode `compare`; feed the **reference** into
   `reference_image`, the T2I output into `generated_image`).
4. Set the Judge's `report_dir` to a known path; pass the same path as `--analysis-dir`.
5. Export each workflow in **API format**.
6. Drive it from the agent with `agent_bridge.py`, once per iteration (describe once, then compare in a loop).

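The last step, describe once and then compare in a loop, might be driven from Python roughly like this. The flags are the ones shown in this document; the helper names are hypothetical, and the sketch assumes the bridge prints only the report JSON on stdout and that the describe call needs no flags beyond `--mode` and `--workflow`.

```python
import json
import subprocess
import sys


def describe_argv(workflow: str) -> list:
    """Command line for the one-off describe pass."""
    return [sys.executable, "agent_bridge.py",
            "--mode", "describe", "--workflow", workflow]


def compare_argv(workflow: str, prompt: str, seed: int, run_tag: str,
                 analysis_dir: str, negative: str = "") -> list:
    """Command line for one compare iteration."""
    argv = [sys.executable, "agent_bridge.py",
            "--workflow", workflow,
            "--prompt", prompt,
            "--seed", str(seed),
            "--run-tag", run_tag,
            "--analysis-dir", analysis_dir]
    if negative:
        argv += ["--negative", negative]
    return argv


def run_bridge(argv: list) -> dict:
    """Run the bridge and parse its stdout as the report (assumed JSON-only)."""
    done = subprocess.run(argv, capture_output=True, text=True, check=True)
    return json.loads(done.stdout)
```

A driver would call `run_bridge(describe_argv(...))` once to obtain the seed `caption` and `axes`, then call `run_bridge(compare_argv(...))` with a fresh `run_tag` per iteration.
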