
Ethanfel 959ec70065 Redesign judge output for calibration: per-axis {score, ref, gen}, drop local fix suggestions

The local VLM now only observes and scores; correction is left to the stronger
external agent. Each axis reports the target value (ref), the current value (gen)
and the closeness (score) — the target/current/distance an agent needs to
calibrate. Expanded to ~20 granular axes (identity/body/wardrobe/action/affect/
camera/render) so the action cluster stays discriminative for explicit content.
swap_eval now inverts ref/gen of the swapped pass; diff summary sorts worst-first;
default max_new_tokens 1024. Docs aligned.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-26 22:52:40 +02:00

4.4 KiB


Agent-driven calibration loop

The controller is an external CLI agent, not an in-graph node. ComfyUI is the execution environment (prompt receptor → T2I → VLM judge); the agent is the brain that reads the analysis, calibrates the prompt generator, and queues the next iteration.

```
 CLI AGENT (controller / brain)                 COMFYUI (execution, running with --listen)
 ───────────────────────────────                ──────────────────────────────────────────
  1. build/calibrate a prompt
  2. agent_bridge.py --prompt ... ───POST /prompt──►  CalibratorPromptReceptor (injection point)
                                                          │ prompt / negative / seed
                                                          ▼
                                                       T2I (SDXL / Flux / Krea2)
                                                          │ generated image
                                                          ▼
                                                       Qwen3-VL Image Judge
                                                          │ writes calib_<tag>.json + latest.json
  3. poll /history/{id} (bridge does this)  ◄───────────┘
  4. read report JSON (overall_score,
     per-axis score + ref/gen values)
  5. steer prompt toward ref on worst axes
  └──► go to 1 until overall_score ≥ target
```
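Step 4 can also be done straight from the Judge's `report_dir` instead of the bridge's stdout. A minimal sketch, assuming the Judge writes `calib_<run_tag>.json` and `latest.json` side by side as the diagram shows; `report_path` and `load_report` are hypothetical helpers, not part of the node pack.

```python
import json
from pathlib import Path
from typing import Optional


def report_path(analysis_dir: str, run_tag: Optional[str] = None) -> Path:
    """Path of the report for run_tag, or of latest.json when no tag is given."""
    name = f"calib_{run_tag}.json" if run_tag else "latest.json"
    return Path(analysis_dir) / name


def load_report(analysis_dir: str, run_tag: Optional[str] = None) -> dict:
    """Read and parse one judge report; raises FileNotFoundError if it is absent."""
    with report_path(analysis_dir, run_tag).open(encoding="utf-8") as fh:
        return json.load(fh)
```

Reading the tagged file rather than `latest.json` is deliberate: `latest.json` is presumably rewritten by every run, so reading it after a later iteration has finished would pair the wrong report with the prompt.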

Why API-driven, not file-watch

A passive "watch a file and auto-run" receptor is fragile in ComfyUI: there is no native file watcher or auto-queue, and the prompt, the image, and the analysis can drift out of sync. Driving `POST /prompt` instead, with the bridge waiting on `/history/{prompt_id}` before it reads the report, makes every iteration sequential and ordered: one `prompt_id` ties the prompt, the image, and the analysis together. The receptor node remains the clean injection point; the agent simply overrides its widgets on each queue. (The receptor also supports a `source_file` option for file-first workflows if you ever need one.)
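The queue-then-poll handshake can be sketched with the standard library alone. This relies on ComfyUI's HTTP API: `POST /prompt` with a `{"prompt": <API-format workflow>}` body returns a `prompt_id`, and `GET /history/{prompt_id}` returns an empty object until that prompt has finished. `queue_prompt`, `fetch_history` and `wait_for` are illustrative names, not the internals of `agent_bridge.py`.

```python
import json
import time
import urllib.request
from typing import Callable


def queue_prompt(server: str, workflow: dict) -> str:
    """POST an API-format workflow to ComfyUI and return its prompt_id."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(
        f"http://{server}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["prompt_id"]


def fetch_history(server: str, prompt_id: str) -> dict:
    """GET /history/{prompt_id}; ComfyUI returns {} until the prompt has finished."""
    with urllib.request.urlopen(f"http://{server}/history/{prompt_id}") as resp:
        return json.load(resp)


def wait_for(prompt_id: str, fetch: Callable[[str], dict],
             interval: float = 1.0, timeout: float = 600.0) -> dict:
    """Poll until prompt_id shows up in the history, then return its entry."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        history = fetch(prompt_id)
        if prompt_id in history:
            return history[prompt_id]
        time.sleep(interval)
    raise TimeoutError(f"prompt {prompt_id} did not finish within {timeout}s")
```

`wait_for` takes the fetch function as a parameter so the polling logic can be exercised without a running server; in real use, pass `lambda pid: fetch_history("127.0.0.1:8188", pid)`.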

The three pieces

| Piece | Role |
|---|---|
| `CalibratorPromptReceptor` (`SxCP External Prompt (Receptor)`) | Stable node the agent injects `prompt/negative/seed` into. Feeds the sampler. |
| `QwenVLImageJudge` (`Qwen3-VL Image Judge (Calibrator)`) | Scores generated vs reference; writes `calib_<run_tag>.json`, `latest.json`, `calib_<run_tag>.md` to `report_dir`. |
| `agent_bridge.py` | One CLI call = one iteration: inject prompt → queue → wait → print the analysis JSON to stdout. Stdlib only. |
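Injecting into the receptor amounts to editing one node of the API-format workflow before each queue. A sketch, assuming the receptor's `class_type` is `CalibratorPromptReceptor` and its widget inputs are literally named `prompt`, `negative` and `seed` (taken from the table above, not verified against the node's definition); `inject_prompt` is a hypothetical helper.

```python
import copy


def inject_prompt(workflow: dict, prompt: str, negative: str, seed: int,
                  class_type: str = "CalibratorPromptReceptor") -> dict:
    """Return a copy of an API-format workflow with the receptor's widgets overridden.

    Assumes the receptor's inputs are named "prompt", "negative" and "seed".
    """
    wf = copy.deepcopy(workflow)  # keep the exported template reusable
    receptors = [node for node in wf.values()
                 if isinstance(node, dict) and node.get("class_type") == class_type]
    if len(receptors) != 1:
        # Refuse to guess which node to overwrite.
        raise ValueError(f"expected exactly one {class_type} node, found {len(receptors)}")
    receptors[0]["inputs"].update(prompt=prompt, negative=negative, seed=seed)
    return wf
```

Working on a deep copy lets the same `workflow_api.json` template be loaded once and reused for every iteration.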

One iteration (what the agent runs)

```bash
python agent_bridge.py \
  --server 127.0.0.1:8188 \
  --workflow workflow_api.json \
  --prompt   "1 woman, red lingerie, bedroom, full body, warm rim light" \
  --negative "blurry, deformed" \
  --seed 12345 \
  --run-tag  iter003 \
  --analysis-dir /media/p5/Comfyui/output/calibrator
```

Stdout (captured by the agent) is the report:

```json
{
  "run_tag": "iter003",
  "overall_score": 0.62,
  "axes": {
    "position":       {"score": 0.40, "ref": "doggy style", "gen": "missionary"},
    "clothing_state": {"score": 0.85, "ref": "red lace lingerie", "gen": "plain bra"}
  },
  "prompt_used": "...",
  "_prompt_id": "…", "_report_path": "…/calib_iter003.json"
}
```
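On the agent side, one iteration is a subprocess call plus a parse of stdout. A sketch that reuses only the flags from the command above; `run_bridge` and `worst_axes` are hypothetical names, and the ranking assumes every entry under `axes` carries a numeric `score`, as in the sample report.

```python
import json
import subprocess
import sys
from typing import List, Tuple


def run_bridge(prompt: str, negative: str, seed: int, run_tag: str,
               server: str = "127.0.0.1:8188",
               workflow: str = "workflow_api.json",
               analysis_dir: str = "/media/p5/Comfyui/output/calibrator") -> dict:
    """Run one iteration via agent_bridge.py and parse the JSON it prints to stdout."""
    argv = [sys.executable, "agent_bridge.py",
            "--server", server, "--workflow", workflow,
            "--prompt", prompt, "--negative", negative,
            "--seed", str(seed), "--run-tag", run_tag,
            "--analysis-dir", analysis_dir]
    # check=True raises CalledProcessError if the bridge exits non-zero.
    done = subprocess.run(argv, capture_output=True, text=True, check=True)
    return json.loads(done.stdout)


def worst_axes(report: dict, k: int = 3) -> List[Tuple[str, float, str, str]]:
    """Return up to k (axis, score, ref, gen) tuples, lowest score first."""
    rows = [(name, axis["score"], axis.get("ref", ""), axis.get("gen", ""))
            for name, axis in report.get("axes", {}).items()]
    rows.sort(key=lambda row: row[1])
    return rows[:k]
```

For the sample report, `worst_axes` puts `position` (0.40) ahead of `clothing_state` (0.85), which matches the worst-first order the calibration policy works through.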

Agent calibration policy (suggested)

For the lowest-scoring axes, the agent rewrites that axis's prompt wording to match its ref value (the target), regenerates, and keeps changes that raise overall_score (greedy per-axis hill-climb). The local model supplies no fixes — the agent owns the correction. Keep the T2I seed fixed while searching so the score reflects the prompt, not sampler noise; vary the seed only once near target. Stop at overall_score ≥ target (e.g. 0.85) or a max-iteration budget. Full policy: CALIBRATION_POLICY.md.
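The hill-climb can be sketched as follows. Assumptions: the prompt is held as a hypothetical `{axis: wording}` dict joined with commas, the rewrite step naively adopts the axis's `ref` text verbatim (a real agent would phrase it better), and `evaluate(prompt, seed)` stands in for one bridge call returning a report shaped like the sample. The late seed variation the policy mentions is left out.

```python
from typing import Callable, Dict, Tuple

Evaluate = Callable[[str, int], dict]  # (prompt, seed) -> judge report


def assemble(parts: Dict[str, str]) -> str:
    """Join per-axis wording into one comma-separated prompt."""
    return ", ".join(text for text in parts.values() if text)


def hill_climb(parts: Dict[str, str], evaluate: Evaluate, seed: int,
               target: float = 0.85, max_iters: int = 20) -> Tuple[Dict[str, str], dict, int]:
    """Greedy per-axis hill-climb; the seed stays fixed for every evaluation."""
    best_parts = dict(parts)
    best = evaluate(assemble(best_parts), seed)
    iters = 1
    tried = set()  # axes whose rewrite was rejected since the last accepted change
    while best["overall_score"] < target and iters < max_iters:
        ranked = sorted(best["axes"].items(), key=lambda kv: kv[1]["score"])
        candidates = [(name, axis) for name, axis in ranked
                      if name not in tried and axis.get("ref")
                      and best_parts.get(name) != axis["ref"]]
        if not candidates:
            break  # local optimum: nothing left worth rewriting
        name, axis = candidates[0]  # worst remaining axis
        trial_parts = dict(best_parts)
        trial_parts[name] = axis["ref"]  # naive rewrite: adopt the target verbatim
        trial = evaluate(assemble(trial_parts), seed)
        iters += 1
        if trial["overall_score"] > best["overall_score"]:
            best_parts, best = trial_parts, trial
            tried.clear()  # new baseline, so earlier rejections may no longer hold
        else:
            tried.add(name)
    return best_parts, best, iters
```

An edit is kept only if `overall_score` strictly rises, and a rejected axis is skipped until some other edit is accepted, so the loop ends at the target, at the budget, or at a local optimum rather than retrying the same rewrite.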

Setup checklist

1. Run ComfyUI with `--listen` so the bridge can POST to it, and install this node pack.
2. Build a workflow: `CalibratorPromptReceptor` → (Prompt-Builder formatting, optional) → T2I → `QwenVLImageJudge`. Feed the reference image into `reference_image` and the T2I output into `generated_image`.
3. Set the Judge's `report_dir` to a known path, and pass the same path as `--analysis-dir`.
4. Export the workflow in API format (`workflow_api.json`).
5. Drive it from the agent with `agent_bridge.py`, once per iteration.
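Most of this checklist can be verified before the first iteration by inspecting the exported JSON. A sketch, assuming the API-format `class_type` values match the node names in the table and that the Judge's inputs are named `report_dir`, `reference_image` and `generated_image` as above; `preflight` is a hypothetical helper.

```python
from pathlib import Path
from typing import Dict, List

RECEPTOR = "CalibratorPromptReceptor"  # assumed class_type
JUDGE = "QwenVLImageJudge"             # assumed class_type


def preflight(workflow: dict, analysis_dir: str) -> List[str]:
    """Return a list of problems found in an API-format workflow (empty if none)."""
    problems: List[str] = []
    by_type: Dict[str, List[str]] = {}
    for node_id, node in workflow.items():
        # API-format exports map node ids to {"class_type": ..., "inputs": {...}};
        # anything else (e.g. a UI-format export) is skipped and shows up as missing nodes.
        if isinstance(node, dict) and "class_type" in node:
            by_type.setdefault(node["class_type"], []).append(node_id)
    for required in (RECEPTOR, JUDGE):
        count = len(by_type.get(required, []))
        if count != 1:
            problems.append(f"expected 1 {required} node, found {count}")
    for node_id in by_type.get(JUDGE, []):
        inputs = workflow[node_id].get("inputs", {})
        report_dir = inputs.get("report_dir")
        # The bridge reads reports from --analysis-dir, so both must name the same folder.
        if report_dir is None or Path(report_dir).resolve() != Path(analysis_dir).resolve():
            problems.append(f"{JUDGE} {node_id}: report_dir {report_dir!r} does not match {analysis_dir!r}")
        for name in ("reference_image", "generated_image"):
            if name not in inputs:
                problems.append(f"{JUDGE} {node_id}: {name} is not connected")
    return problems
```

For example, `preflight(json.loads(Path("workflow_api.json").read_text()), "/media/p5/Comfyui/output/calibrator")` returns an empty list when the exported workflow passes these checks.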
