Drop named-position axis for grounded geometry (30B still mis-names positions)

Even the 30B model mis-identifies named sex positions (doggy/cowgirl) from images, so position_name is removed. The pose cluster is now purely observable geometry: body_orientation, enriched with facing direction (who faces whom), plus limb_arrangement / contact_points / pose. The agent composes any named label from these reliable primitives. The default set is now 23 axes. Docs/examples updated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-26 23:49:23 +02:00
parent e4dfaac63b
commit 06992506d7
4 changed files with 21 additions and 19 deletions
@@ -61,7 +61,7 @@ Stdout (captured by the agent) is the report:
  "overall_score": 0.62,
  "mismatch_count": 1,
  "axes": {
-    "position_name":  {"verdict": "mismatch", "ref": "doggy style", "gen": "cowgirl"},
+    "body_orientation": {"verdict": "mismatch", "ref": "female on top, facing partner", "gen": "female on bottom"},
    "clothing_state": {"verdict": "partial",  "ref": "red lace lingerie", "gen": "plain bra"}
  },
  "prompt_used": "...",
@@ -29,10 +29,11 @@ fine-grained — named positions, limb arrangement, gaze, hair detail — **use
 (identical `ref`==`gen` → `match`), but it cannot fix a wrong *description*; only a more
 capable model can.

-**Prefer grounded geometry over named labels.** A named position (`position_name`) forces
-the model to classify into a vocabulary it gets wrong; observable geometry
-(`body_orientation`, `limb_arrangement`, `contact_points`, who faces where) is more
-grounded and survives a weaker model better. Weight those axes over the named label.
+**Grounded geometry, not named labels.** Naming a position (`doggy`/`cowgirl`) is
+unreliable *even at 30B* — the named-label axis was removed. The pose cluster is now purely
+observable geometry (`body_orientation` incl. who faces where, `limb_arrangement`,
+`contact_points`, `pose`); compose a named position yourself from those primitives if you
+need one. Geometry survives the model far better than the abstraction.

 The axes must **span what the prompt can express** — you can only fix what the prompt can
 say, and each diff must map to a lever. The default set (configurable on the node) is
@@ -43,10 +44,10 @@ grouped below.
 - **Identity / cast:** `subject_count`, `gender_mix`, `age_appearance`, `ethnicity_skin`
 - **Body:** `body_type`, `breast_size`, `distinctive_features` (tattoos/piercings/marks), `hair`
 - **Wardrobe:** `clothing_state` (degree of undress + garments)
-- **Action / pose (where explicit content concentrates — kept granular):** `sexual_act`,
-  `position_name` (doggy/cowgirl/…), `body_orientation` (on top/from behind/…),
+- **Action / pose (granular, observable geometry — no named labels):** `sexual_act`,
+  `body_orientation` (who on top/bottom/side + which way each faces),
  `limb_arrangement` (legs spread/raised, hands), `penetration` (type/depth/angle),
-  `contact_points`, `genital_visibility`, `pose` (torso/head lean)
+  `contact_points`, `genital_visibility`, `pose` (torso/head lean, arch)
 - **Affect:** `facial_expression`, `gaze`
 - **Camera:** `framing` (shot/crop), `camera_angle` (POV/angle)
 - **Render:** `scene`, `lighting_color`, `art_style`
@@ -120,8 +121,8 @@ phrase to "doggy style"). No machine-supplied fix list — the agent owns this s
 ```
 iter1  overall=0.55  mism=6   worst: scene MISMATCH  ref:[dim bedroom] gen:[bright kitchen]
       edit scene → "dimly lit bedroom"
-iter2  overall=0.63  mism=5   worst: position_name MISMATCH ref:[doggy style] gen:[cowgirl]
-       edit position → "doggy style, from behind"
+iter2  overall=0.63  mism=5   worst: body_orientation MISMATCH ref:[female on top, facing partner] gen:[female on bottom]
+       edit orientation → "woman straddling on top, facing him"
 iter3  overall=0.71  mism=3   worst: lighting_color MISMATCH ref:[warm low-key] gen:[flat daylight]
       edit lighting → "warm low-key lighting"   (mism=4 → revert)
 iter4  retry lighting → "warm golden low-key glow"   (mism=2 → keep, overall=0.82)
@@ -138,7 +139,7 @@ iter6  overall=0.93  mism=0   ≥ target → STOP
  "overall_score": 0.63,
  "mismatch_count": 5,
  "axes": {
-    "position_name": {"verdict": "mismatch", "ref": "doggy style", "gen": "cowgirl"},
+    "body_orientation": {"verdict": "mismatch", "ref": "female on top, facing partner", "gen": "female on bottom"},
    "scene":         {"verdict": "match",    "ref": "dim bedroom", "gen": "dim bedroom"}
  },
  "prompt_used": "...", "_prompt_id": "...", "_report_path": "..."
@@ -148,7 +149,7 @@ iter6  overall=0.93  mism=0   ≥ target → STOP
 ## Agent system prompt (paste into your CLI agent)

 > You are the controller for a local image prompt calibrator. Goal: make a generated
-> image match a reference, measured by a Qwen3-VL judge that compares ~24 axes (identity,
+> image match a reference, measured by a Qwen3-VL judge that compares ~23 axes (identity,
 > body, wardrobe, action/pose, affect, camera, render) and for each returns a `verdict`
 > (match / partial / mismatch), `ref` (what the reference shows) and `gen` (what the
 > generated shows). `overall_score` and `mismatch_count` are computed from the verdicts.
@@ -120,7 +120,7 @@ observes; it suggests no fixes (a stronger external model owns correction).
 {
  "axes": {
    "subject_count":  {"verdict": "match",    "ref": "1 woman", "gen": "1 woman"},
-    "position_name":  {"verdict": "mismatch", "ref": "doggy style", "gen": "cowgirl"},
+    "body_orientation":{"verdict": "mismatch", "ref": "female on top, facing partner", "gen": "female on bottom"},
    "clothing_state": {"verdict": "mismatch", "ref": "red lace lingerie", "gen": "nude"},
    "scene":          {"verdict": "partial",  "ref": "dim bedroom", "gen": "lit bedroom"},
    "lighting_color": {"verdict": "match",    "ref": "warm low-key", "gen": "warm low-key"}
@@ -132,9 +132,9 @@ A **discrete verdict** (match/partial/mismatch) is used instead of a 0–1 score
 give unreliable fine scores (identical ref/gen often scored ~0.6) but classify the three
 buckets reliably. `overall_score` + `mismatch_count` are computed from the verdicts on our
 side (mean ordinal), so they're trustworthy as a stop signal. The axis list is
-**configurable**; the default ~24 axes are grouped identity / body / wardrobe / action·pose
+**configurable**; the default ~23 axes are grouped identity / body / wardrobe / action·pose
 / affect / camera / render, with the action·pose cluster split fine (`sexual_act`,
-`position_name`, `body_orientation`, `limb_arrangement`, `penetration`, `contact_points`,
+`body_orientation`, `limb_arrangement`, `penetration`, `contact_points`,
 `genital_visibility`) so it stays discriminative for explicit content. Each axis carries a
 one-line definition in the prompt. The agent steers each `mismatch`/`partial` axis toward
 its `ref`. See [CALIBRATION_POLICY.md](CALIBRATION_POLICY.md).
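
The "mean ordinal" aggregation described above can be sketched as follows. This is a minimal illustration, not the node's actual code: the ordinal values (match = 1.0, partial = 0.5, mismatch = 0.0), the rounding to two decimals, and the helper name `summarize` are all assumptions, since the docs only say the score is a mean over the verdicts.

```python
from statistics import mean

# Assumed ordinal mapping; the docs only state "mean ordinal".
ORDINAL = {"match": 1.0, "partial": 0.5, "mismatch": 0.0}


def summarize(axes):
    """Return (overall_score, mismatch_count, axes_to_steer) for a judge report.

    `axes` maps axis name -> {"verdict": ..., "ref": ..., "gen": ...},
    the shape shown in the example report.
    """
    verdicts = [entry["verdict"] for entry in axes.values()]
    overall_score = round(mean(ORDINAL[v] for v in verdicts), 2)
    mismatch_count = sum(v == "mismatch" for v in verdicts)
    # Every non-matching axis is steered toward its `ref` description.
    axes_to_steer = {
        name: entry["ref"]
        for name, entry in axes.items()
        if entry["verdict"] != "match"
    }
    return overall_score, mismatch_count, axes_to_steer


# The example report from the docs, reduced to its `axes` object.
report_axes = {
    "subject_count": {"verdict": "match", "ref": "1 woman", "gen": "1 woman"},
    "body_orientation": {"verdict": "mismatch", "ref": "female on top, facing partner", "gen": "female on bottom"},
    "clothing_state": {"verdict": "mismatch", "ref": "red lace lingerie", "gen": "nude"},
    "scene": {"verdict": "partial", "ref": "dim bedroom", "gen": "lit bedroom"},
    "lighting_color": {"verdict": "match", "ref": "warm low-key", "gen": "warm low-key"},
}

overall, mismatches, to_steer = summarize(report_axes)
print(overall, mismatches, list(to_steer))
```

Because the score is derived from three discrete buckets rather than from a model-supplied float, two reports with identical verdicts always produce the same `overall_score`, which is what makes it usable as a stop signal.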