The local VLM now only observes and scores; correction is left to the stronger external agent. Each axis reports the target value (ref), the current value (gen) and the closeness (score): the target/current/distance an agent needs to calibrate. Expanded to ~20 granular axes (identity/body/wardrobe/action/affect/camera/render) so the action cluster stays discriminative for explicit content. swap_eval now inverts ref/gen of the swapped pass; diff summary sorts worst-first; default max_new_tokens 1024. Docs aligned. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Agent-driven calibration loop
The controller is an external CLI agent, not an in-graph node. ComfyUI is the execution environment (prompt receptor → T2I → VLM judge); the agent is the brain that reads the analysis, calibrates the prompt generator, and queues the next iteration.
CLI AGENT (controller / brain)                        COMFYUI (execution, running with --listen)
───────────────────────────────                       ──────────────────────────────────────────
1. build/calibrate a prompt
2. agent_bridge.py --prompt ...  ───POST /prompt──►   CalibratorPromptReceptor (injection point)
                                                          │ prompt / negative / seed
                                                          ▼
                                                      T2I (SDXL / Flux / Krea2)
                                                          │ generated image
                                                          ▼
                                                      Qwen3-VL Image Judge
                                                          │ writes calib_<tag>.json + latest.json
3. poll /history/{id} (bridge does this) ◄────────────────┘
4. read report JSON (overall_score,
   per-axis score + ref/gen values)
5. steer prompt toward ref on worst axes
   └──► go to 1 until overall_score ≥ target
Why API-driven, not file-watch
A passive "watch a file and auto-run" receptor is fragile in ComfyUI (no native file
watcher / auto-queue, and prompt↔image↔analysis can desync). Driving POST /prompt
instead makes every iteration synchronous and ordered — one prompt_id ties the
prompt, the image, and the analysis together. The receptor node is still the clean
injection point; the agent just overrides its widgets per queue. (The receptor also
supports a source_file for file-first workflows if you ever want it.)
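The synchronous round trip described above can be sketched with the standard library alone. This is a minimal sketch, not the code of agent_bridge.py. It assumes ComfyUI's stock HTTP API: POST /prompt returns a prompt_id, and GET /history/{prompt_id} returns an entry keyed by that id once the job has finished. The function names and the injectable `http` callable are hypothetical, added so the logic can be exercised without a running server.

```python
import json
import time
import urllib.request


def http_json(url, payload=None):
    """Minimal JSON transport: POST when a payload is given, otherwise GET."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def queue_and_wait(workflow, server="127.0.0.1:8188", http=http_json,
                   poll_s=1.0, timeout_s=600.0):
    """Queue one API-format workflow and block until its history entry appears."""
    base = f"http://{server}"
    # One prompt_id ties the prompt, the image, and the analysis together.
    prompt_id = http(f"{base}/prompt", {"prompt": workflow})["prompt_id"]
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        history = http(f"{base}/history/{prompt_id}")
        if prompt_id in history:  # assumed: the entry shows up only after execution ends
            return prompt_id, history[prompt_id]
        time.sleep(poll_s)
    raise TimeoutError(f"prompt {prompt_id} did not finish within {timeout_s}s")
```

Because the call blocks until the history entry exists, the agent never reads a report that belongs to an earlier prompt.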
The three pieces
| Piece | Role |
|---|---|
| CalibratorPromptReceptor (SxCP External Prompt (Receptor)) | Stable node the agent injects prompt/negative/seed into. Feeds the sampler. |
| QwenVLImageJudge (Qwen3-VL Image Judge (Calibrator)) | Scores generated vs reference; writes calib_<run_tag>.json, latest.json, calib_<run_tag>.md to report_dir. |
| agent_bridge.py | One CLI call = one iteration: inject prompt → queue → wait → print the analysis JSON to stdout. Stdlib only. |
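The injection step can be illustrated as follows. This is a sketch under stated assumptions rather than the bridge's actual code: it relies on the API-format export being a mapping of node id to a dict with `class_type` and `inputs`, and it assumes the receptor's class_type is `CalibratorPromptReceptor` with inputs named `prompt`, `negative` and `seed`. Check those names against your own workflow_api.json. The helper name `inject_prompt` is hypothetical.

```python
import copy


def inject_prompt(workflow, prompt, negative, seed,
                  class_type="CalibratorPromptReceptor"):
    """Return a copy of an API-format workflow with the receptor's widgets overridden."""
    patched = copy.deepcopy(workflow)  # leave the loaded template untouched
    targets = [node for node in patched.values()
               if isinstance(node, dict) and node.get("class_type") == class_type]
    if len(targets) != 1:
        raise ValueError(
            f"expected exactly one {class_type} node, found {len(targets)}")
    # Input names are assumed; verify them in the exported workflow_api.json.
    targets[0].setdefault("inputs", {}).update(
        {"prompt": prompt, "negative": negative, "seed": seed})
    return patched
```

Requiring exactly one receptor keeps the injection point unambiguous; a workflow with zero or several receptors fails loudly instead of being queued with stale widgets.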
One iteration (what the agent runs)
python agent_bridge.py \
--server 127.0.0.1:8188 \
--workflow workflow_api.json \
--prompt "1 woman, red lingerie, bedroom, full body, warm rim light" \
--negative "blurry, deformed" \
--seed 12345 \
--run-tag iter003 \
--analysis-dir /media/p5/Comfyui/output/calibrator
Stdout (captured by the agent) is the report:
{
  "run_tag": "iter003",
  "overall_score": 0.62,
  "axes": {
    "position": {"score": 0.40, "ref": "doggy style", "gen": "missionary"},
    "clothing_state": {"score": 0.85, "ref": "red lace lingerie", "gen": "plain bra"}
  },
  "prompt_used": "...",
  "_prompt_id": "…",
  "_report_path": "…/calib_iter003.json"
}
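One way an agent might wrap this call and pick out the axes to work on is sketched below. It uses only the flags shown in the command above and the report fields shown in the sample, and it assumes stdout carries nothing but the JSON report. The helper names (`bridge_command`, `run_iteration`, `worst_axes`) are hypothetical.

```python
import json
import subprocess
import sys


def bridge_command(prompt, negative, seed, run_tag, analysis_dir,
                   server="127.0.0.1:8188", workflow="workflow_api.json"):
    """Build the argv for one iteration from the documented flags."""
    return [sys.executable, "agent_bridge.py",
            "--server", server, "--workflow", workflow,
            "--prompt", prompt, "--negative", negative,
            "--seed", str(seed), "--run-tag", run_tag,
            "--analysis-dir", analysis_dir]


def run_iteration(argv):
    """Run the bridge and parse the report it prints to stdout."""
    done = subprocess.run(argv, capture_output=True, text=True, check=True)
    return json.loads(done.stdout)  # assumes stdout is the JSON report only


def worst_axes(report, k=3):
    """Return up to k (axis, score, ref, gen) tuples, lowest score first."""
    rows = [(name, axis["score"], axis["ref"], axis["gen"])
            for name, axis in report["axes"].items()]
    return sorted(rows, key=lambda row: row[1])[:k]
```

Passing the arguments as a list avoids shell quoting problems when the prompt contains commas or quotes.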
Agent calibration policy (suggested)
For the lowest-scoring axes, the agent rewrites that axis's prompt wording to match its
ref value (the target), regenerates, and keeps changes that raise overall_score
(greedy per-axis hill-climb). The local model supplies no fixes — the agent owns the
correction. Keep the T2I seed fixed while searching so the score reflects the prompt,
not sampler noise; vary the seed only once near target. Stop at overall_score ≥ target
(e.g. 0.85) or a max-iteration budget. Full policy: CALIBRATION_POLICY.md.
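The suggested policy can be sketched as a small loop. This is one possible reading of the paragraph above, not the contents of CALIBRATION_POLICY.md. It assumes the prompt is held as a dict of axis name to wording (a hypothetical layout) and that `evaluate` is any callable that runs one iteration and returns a report shaped like the sample JSON.

```python
def calibrate(parts, evaluate, target=0.85, max_iters=10, seed=12345):
    """Greedy per-axis hill-climb with a fixed seed.

    parts    -- dict: axis name -> prompt wording for that axis (hypothetical layout)
    evaluate -- callable(prompt_text, seed) -> report dict with overall_score and axes
    """
    def render(p):
        return ", ".join(p.values())

    best = evaluate(render(parts), seed)
    iters = 1
    tried = set()
    while best["overall_score"] < target and iters < max_iters:
        # Candidate axes: not yet rejected, and not already worded as their ref.
        pending = sorted(
            (a for a in best["axes"]
             if a not in tried and parts.get(a) != best["axes"][a]["ref"]),
            key=lambda a: best["axes"][a]["score"])
        if not pending:
            break  # nothing left to try
        axis = pending[0]  # worst remaining axis
        candidate = dict(parts, **{axis: best["axes"][axis]["ref"]})
        report = evaluate(render(candidate), seed)  # same seed isolates the prompt change
        iters += 1
        if report["overall_score"] > best["overall_score"]:
            parts, best, tried = candidate, report, set()  # keep the gain
        else:
            tried.add(axis)  # discard the candidate, move to the next axis
    return parts, best, iters
```

Copying the ref text verbatim is the simplest possible rewrite; a real agent would more likely paraphrase it into the prompt generator's own vocabulary.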
Setup checklist
- Run ComfyUI with --listen (so the bridge can POST). Install this node pack.
- Build a workflow with: CalibratorPromptReceptor → (Prompt-Builder formatting, optional) → T2I → QwenVLImageJudge (feed the reference image into reference_image, the T2I output into generated_image).
- Set the Judge's report_dir to a known path; pass the same path as --analysis-dir.
- Export the workflow in API format (workflow_api.json).
- Drive it from the agent with agent_bridge.py, once per iteration.
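Before the first iteration, a quick check of the export can catch a workflow that is missing one of the two nodes. The sketch below assumes the class_type strings in the API-format file match the node names used in this document. The helper name `check_workflow` is hypothetical.

```python
import json


def check_workflow(path, required=("CalibratorPromptReceptor", "QwenVLImageJudge")):
    """Return the required class_type names that are missing from an API-format export."""
    with open(path, encoding="utf-8") as fh:
        workflow = json.load(fh)
    # API format maps node id -> {"class_type": ..., "inputs": ...}.
    present = {node.get("class_type") for node in workflow.values()
               if isinstance(node, dict)}
    return [name for name in required if name not in present]
```

An empty list means both nodes were found. A file saved in the regular UI format rather than API format will report both names as missing, which is a useful hint that the wrong export was used.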