Redesign judge output for calibration: per-axis {score, ref, gen}, drop local fix suggestions

The local VLM now only observes and scores; correction is left to the stronger external agent. Each axis reports the target value (ref), the current value (gen), and the closeness (score): the target/current/distance triple an agent needs to calibrate.

Expanded to ~20 granular axes (identity/body/wardrobe/action/affect/camera/render) so the action cluster stays discriminative for explicit content.

swap_eval now inverts ref/gen of the swapped pass; diff summary sorts worst-first; default max_new_tokens 1024. Docs aligned.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
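As a rough illustration of the per-axis shape described in this message, the sketch below builds a small analysis, orders it worst-first, and inverts ref/gen the way a swapped pass would need. The axis values, the 0-to-1 score range, and the helper names `sort_worst_first` and `invert_swapped` are assumptions made for the example, not names taken from the script.

```python
# Hypothetical analysis shape: axis name -> {score, ref, gen}.
# Axis values and the 0..1 score range are assumptions for illustration.
analysis = {
    "identity": {"score": 0.9, "ref": "short dark hair", "gen": "short dark hair"},
    "camera": {"score": 0.4, "ref": "low angle", "gen": "eye level"},
    "wardrobe": {"score": 0.7, "ref": "red coat", "gen": "orange coat"},
}


def sort_worst_first(axes):
    """Return (name, entry) pairs ordered by ascending score."""
    return sorted(axes.items(), key=lambda item: item[1]["score"])


def invert_swapped(axes):
    """Exchange ref and gen on every axis of a pass run with the images swapped."""
    return {
        name: {"score": entry["score"], "ref": entry["gen"], "gen": entry["ref"]}
        for name, entry in axes.items()
    }


worst_first = sort_worst_first(analysis)
restored = invert_swapped(analysis)
```

Sorting by ascending score puts the axis furthest from its target first, which is the one an agent would most likely adjust in the next prompt.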
+1
-1
@@ -19,7 +19,7 @@ Stdlib only — no third-party deps, so any agent can shell out to it.
 Loop, from the agent's side:
 1. build a prompt (calibrate from the previous analysis)
 2. run this script -> capture stdout (the analysis JSON)
-3. read overall_score + per-axis diffs + fix_suggestions
+3. read overall_score + per-axis {score, ref, gen}
 4. adjust the prompt and go to 1, until overall_score >= target
 """
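The loop in the docstring above could be driven from the agent's side roughly as follows. This is a minimal sketch: `calibrate`, the two injected callables, the `axes` key, the 0.8 target, and the round cap are all assumptions for illustration, and the demo judge is a stand-in for shelling out to the real script.

```python
import json


def calibrate(build_prompt, run_judge, target=0.8, max_rounds=5):
    """Repeat build -> judge -> read until overall_score >= target.

    build_prompt(previous_analysis) returns the next prompt (steps 1 and 4).
    run_judge(prompt) returns the script's stdout as a JSON string (step 2).
    Both callables, the target and the round cap are assumptions.
    """
    analysis = None
    prompt = None
    for _ in range(max_rounds):
        prompt = build_prompt(analysis)
        analysis = json.loads(run_judge(prompt))  # step 3: read the analysis
        if analysis["overall_score"] >= target:
            break
    return prompt, analysis


def demo_build_prompt(previous):
    """Start from a bare prompt, then append the ref of the worst axis."""
    if previous is None:
        return "portrait"
    worst = min(previous["axes"].values(), key=lambda entry: entry["score"])
    return "portrait, " + worst["ref"]


def demo_judge(prompt):
    """Stand-in judge: scores high only once the prompt names the target angle."""
    hit = "low angle" in prompt
    camera = {
        "score": 0.9 if hit else 0.2,
        "ref": "low angle",
        "gen": "low angle" if hit else "eye level",
    }
    return json.dumps({"overall_score": camera["score"], "axes": {"camera": camera}})


final_prompt, final_analysis = calibrate(demo_build_prompt, demo_judge)
```

Because each axis carries its own ref and gen, the agent can decide the correction itself instead of relying on suggestions from the local model. Replacing `demo_judge` with a call that runs the script and returns its stdout would leave the loop unchanged.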