Switch compare to discrete verdicts + granular pose axes + per-axis definitions
The 4B judge's 0-1 scores were unreliable (identical ref/gen pairs scored ~0.6), so the judge now returns a discrete verdict per axis: match, partial, or mismatch. overall_score and a new mismatch_count are computed from those verdicts on our side, which makes them reliable and monotonic.

Expanded the action/pose cluster into position_name, body_orientation, limb_arrangement, penetration, contact_points, and genital_visibility (plus breast_size) so explicit poses carry detail. Each axis now ships with a one-line definition in the prompt, so gender_mix/subject_count stop absorbing positional text. 24 axes total.

Example workflows use the node default (axes=''). Docs realigned; the stop condition is now mismatch_count==0.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
+3
-2
@@ -59,9 +59,10 @@ Stdout (captured by the agent) is the report:
 {
   "run_tag": "iter003",
   "overall_score": 0.62,
+  "mismatch_count": 1,
   "axes": {
-    "position": {"score": 0.40, "ref": "doggy style", "gen": "missionary"},
-    "clothing_state": {"score": 0.85, "ref": "red lace lingerie", "gen": "plain bra"}
+    "position_name": {"verdict": "mismatch", "ref": "doggy style", "gen": "cowgirl"},
+    "clothing_state": {"verdict": "partial", "ref": "red lace lingerie", "gen": "plain bra"}
   },
   "prompt_used": "...",
   "_prompt_id": "…", "_report_path": "…/calib_iter003.json"
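The report now carries one verdict per axis, and the commit message says overall_score and mismatch_count are derived from those verdicts on our side. A minimal Python sketch of that aggregation follows. The verdict weights (match=1.0, partial=0.5, mismatch=0.0) and the names `Verdict`, `summarize`, and `should_stop` are hypothetical, since the commit does not show the actual scoring code.

```python
from enum import Enum


class Verdict(str, Enum):
    """The three discrete verdicts the judge returns per axis."""
    MATCH = "match"
    PARTIAL = "partial"
    MISMATCH = "mismatch"


# Hypothetical weights: the commit does not say how verdicts map to overall_score.
VERDICT_WEIGHT = {Verdict.MATCH: 1.0, Verdict.PARTIAL: 0.5, Verdict.MISMATCH: 0.0}


def summarize(axes: dict[str, dict]) -> dict:
    """Compute overall_score and mismatch_count from per-axis verdicts.

    Raises ValueError if there are no axes or a verdict string is unknown.
    """
    if not axes:
        raise ValueError("no axes to score")
    # Verdict("...") raises ValueError for anything outside match/partial/mismatch.
    verdicts = [Verdict(axis["verdict"]) for axis in axes.values()]
    overall = sum(VERDICT_WEIGHT[v] for v in verdicts) / len(verdicts)
    mismatches = sum(1 for v in verdicts if v is Verdict.MISMATCH)
    return {"overall_score": round(overall, 2), "mismatch_count": mismatches}


def should_stop(report: dict) -> bool:
    # Stop condition from the commit message: no axis is a mismatch.
    return report["mismatch_count"] == 0


axes = {
    "position_name": {"verdict": "mismatch"},
    "clothing_state": {"verdict": "partial"},
}
report = summarize(axes)
print(report)  # {'overall_score': 0.25, 'mismatch_count': 1}
print(should_stop(report))  # False
```

With these weights, upgrading any single verdict (mismatch to partial, or partial to match) never lowers overall_score, which is the monotonic property the commit message describes. Keying the stop check on mismatch_count rather than a score threshold means a run only ends when no axis disagrees outright.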