Redesign judge output for calibration: per-axis {score, ref, gen}, drop local fix suggestions

The local VLM now only observes and scores; correction is left to the stronger external agent. Each axis reports the target value (ref), the current value (gen) and the closeness (score): the target/current/distance an agent needs to calibrate.

Expanded to ~20 granular axes (identity/body/wardrobe/action/affect/camera/render) so the action cluster stays discriminative for explicit content. swap_eval now inverts ref/gen of the swapped pass; diff summary sorts worst-first; default max_new_tokens 1024. Docs aligned.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-26 22:52:40 +02:00
parent aa3983d94a
commit 959ec70065
6 changed files with 188 additions and 164 deletions
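
For readers of this commit, here is a minimal sketch of how an external agent might consume the new per-axis output. It is not the code from this commit. It assumes `axis_scores_json` is a JSON object mapping each axis name to `{score, ref, gen}`; the function names `unswap`, `merge_passes` and `worst_first` are hypothetical. The ref/gen inversion and score averaging follow the `swap_eval` behaviour described in the commit message and README.

```python
import json


def unswap(swapped_pass: dict) -> dict:
    """Exchange ref/gen back for a pass that saw the images in swapped order."""
    return {
        axis: {"score": v["score"], "ref": v["gen"], "gen": v["ref"]}
        for axis, v in swapped_pass.items()
    }


def merge_passes(normal_pass: dict, swapped_pass: dict) -> dict:
    """Average per-axis scores of both passes; keep ref/gen from the normal pass."""
    restored = unswap(swapped_pass)
    merged = {}
    for axis, a in normal_pass.items():
        scores = [a["score"]]
        if axis in restored:  # the swapped pass may have skipped an axis
            scores.append(restored[axis]["score"])
        merged[axis] = {
            "score": sum(scores) / len(scores),
            "ref": a["ref"],
            "gen": a["gen"],
        }
    return merged


def worst_first(axis_scores_json: str) -> list[str]:
    """Render one summary line per axis, lowest score first."""
    axes = json.loads(axis_scores_json)
    ordered = sorted(axes.items(), key=lambda kv: kv[1]["score"])
    return [
        f"{name}: {v['score']:.2f}  ref:[{v['ref']}] gen:[{v['gen']}]"
        for name, v in ordered
    ]


# Illustrative data only; the axis names and values are made up.
normal = {
    "pose": {"score": 0.4, "ref": "standing", "gen": "sitting"},
    "lighting": {"score": 0.9, "ref": "warm", "gen": "warm"},
}
swapped = {
    "pose": {"score": 0.6, "ref": "sitting", "gen": "standing"},
    "lighting": {"score": 0.7, "ref": "warm", "gen": "warm"},
}
merged = merge_passes(normal, swapped)
for line in worst_first(json.dumps(merged)):
    print(line)
```

Sorting worst-first puts the axes that most need correction at the top, so an agent reading only the first few lines still sees the largest gaps between target and current values.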
@@ -34,7 +34,7 @@ can act on it.
 | `generated_image` | IMAGE | — | the candidate to score |
 | `model_path` | STRING | `/media/p5/qwen3vl_4b_abliterated_comfy_convert/hf_bf16` | local dir, **HF repo id** (`org/name`), or alias (`30b-a3b` / `8b` / `4b`) |
 | `precision` | bf16 / fp16 / fp8 / nf4 | bf16 | `nf4` = 4-bit (run the 30B judge on 32 GB); `fp8` with the `hf_fp8` copy |
-| `axes` | STRING | cast, clothing, pose, scene, composition, expression, color_light | scored axes (match your Prompt-Builder knobs) |
+| `axes` | STRING | ~20 axes (identity, body, wardrobe, action, affect, camera, render) | scored axes; granular for explicit content. Edit to taste |
 | `max_new_tokens` | INT | 512 | |
 | `temperature` | FLOAT | 0.0 | 0 = greedy/repeatable |
 | `swap_eval` | BOOL | true | run twice with images swapped, average → cuts position bias |
@@ -51,8 +51,8 @@ default skip download entirely.
 | name | type | use |
 |---|---|---|
 | `overall_score` | FLOAT 0..1 | loop stop-condition / objective |
-| `axis_scores_json` | STRING (JSON) | per-axis `{score, diff}` for the controller |
-| `diff_analysis` | STRING | human/controller-readable summary + fix suggestions |
+| `axis_scores_json` | STRING (JSON) | per-axis `{score, ref, gen}` — target vs current, for the agent |
+| `diff_analysis` | STRING | readable summary, worst axes first (`score  ref:[…] gen:[…]`) |
 | `raw` | STRING | raw model output (both passes if `swap_eval`) |

 ## Install