Drop named-position axis for grounded geometry (30B still mis-names positions)
Even the 30B model misidentifies named sex positions (doggy/cowgirl) from images, so position_name is removed. The pose cluster is now purely observable geometry: body_orientation, enriched with facing direction (who faces whom), plus limb_arrangement / contact_points / pose. The agent composes any named label from these reliable primitives.

23 default axes. Docs/examples updated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
+1
-1
@@ -61,7 +61,7 @@ Stdout (captured by the agent) is the report:
 "overall_score": 0.62,
 "mismatch_count": 1,
 "axes": {
-"position_name": {"verdict": "mismatch", "ref": "doggy style", "gen": "cowgirl"},
+"body_orientation": {"verdict": "mismatch", "ref": "female on top, facing partner", "gen": "female on bottom"},
 "clothing_state": {"verdict": "partial", "ref": "red lace lingerie", "gen": "plain bra"}
 },
 "prompt_used": "...",
+12
-11
@@ -29,10 +29,11 @@ fine-grained — named positions, limb arrangement, gaze, hair detail — **use
 (identical `ref`==`gen` → `match`), but it cannot fix a wrong *description*; only a more
 capable model can.
 
-**Prefer grounded geometry over named labels.** A named position (`position_name`) forces
-the model to classify into a vocabulary it gets wrong; observable geometry
-(`body_orientation`, `limb_arrangement`, `contact_points`, who faces where) is more
-grounded and survives a weaker model better. Weight those axes over the named label.
+**Grounded geometry, not named labels.** Naming a position (`doggy`/`cowgirl`) is
+unreliable *even at 30B* — the named-label axis was removed. The pose cluster is now purely
+observable geometry (`body_orientation` incl. who faces where, `limb_arrangement`,
+`contact_points`, `pose`); compose a named position yourself from those primitives if you
+need one. Geometry survives the model far better than the abstraction.
 
 The axes must **span what the prompt can express** — you can only fix what the prompt can
 say, and each diff must map to a lever. The default set (configurable on the node) is
@@ -43,10 +44,10 @@ grouped below.
 - **Identity / cast:** `subject_count`, `gender_mix`, `age_appearance`, `ethnicity_skin`
 - **Body:** `body_type`, `breast_size`, `distinctive_features` (tattoos/piercings/marks), `hair`
 - **Wardrobe:** `clothing_state` (degree of undress + garments)
-- **Action / pose (where explicit content concentrates — kept granular):** `sexual_act`,
-`position_name` (doggy/cowgirl/…), `body_orientation` (on top/from behind/…),
+- **Action / pose (granular, observable geometry — no named labels):** `sexual_act`,
+`body_orientation` (who on top/bottom/side + which way each faces),
 `limb_arrangement` (legs spread/raised, hands), `penetration` (type/depth/angle),
-`contact_points`, `genital_visibility`, `pose` (torso/head lean)
+`contact_points`, `genital_visibility`, `pose` (torso/head lean, arch)
 - **Affect:** `facial_expression`, `gaze`
 - **Camera:** `framing` (shot/crop), `camera_angle` (POV/angle)
 - **Render:** `scene`, `lighting_color`, `art_style`
@@ -120,8 +121,8 @@ phrase to "doggy style"). No machine-supplied fix list — the agent owns this s
 ```
 iter1 overall=0.55 mism=6 worst: scene MISMATCH ref:[dim bedroom] gen:[bright kitchen]
 edit scene → "dimly lit bedroom"
-iter2 overall=0.63 mism=5 worst: position_name MISMATCH ref:[doggy style] gen:[cowgirl]
-edit position → "doggy style, from behind"
+iter2 overall=0.63 mism=5 worst: body_orientation MISMATCH ref:[female on top, facing partner] gen:[female on bottom]
+edit → "woman straddling on top, facing him"
 iter3 overall=0.71 mism=3 worst: lighting_color MISMATCH ref:[warm low-key] gen:[flat daylight]
 edit lighting → "warm low-key lighting" (mism=4 → revert)
 iter4 retry lighting → "warm golden low-key glow" (mism=2 → keep, overall=0.82)
@@ -138,7 +139,7 @@ iter6 overall=0.93 mism=0 ≥ target → STOP
 "overall_score": 0.63,
 "mismatch_count": 5,
 "axes": {
-"position_name": {"verdict": "mismatch", "ref": "doggy style", "gen": "cowgirl"},
+"body_orientation": {"verdict": "mismatch", "ref": "female on top, facing partner", "gen": "female on bottom"},
 "scene": {"verdict": "match", "ref": "dim bedroom", "gen": "dim bedroom"}
 },
 "prompt_used": "...", "_prompt_id": "...", "_report_path": "..."
@@ -148,7 +149,7 @@ iter6 overall=0.93 mism=0 ≥ target → STOP
 ## Agent system prompt (paste into your CLI agent)
 
 > You are the controller for a local image prompt calibrator. Goal: make a generated
-> image match a reference, measured by a Qwen3-VL judge that compares ~24 axes (identity,
+> image match a reference, measured by a Qwen3-VL judge that compares ~23 axes (identity,
 > body, wardrobe, action/pose, affect, camera, render) and for each returns a `verdict`
 > (match / partial / mismatch), `ref` (what the reference shows) and `gen` (what the
 > generated shows). `overall_score` and `mismatch_count` are computed from the verdicts.
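The iteration log in the earlier hunk shows an edit being reverted when `mismatch_count` rises (3 → 4) and kept when it falls (→ 2), and the loop stopping once the score reaches the target with no mismatches. A minimal sketch of that decision rule follows. The `Report` class, the function names, and the tie-break on `overall_score` are assumptions for illustration, not the project's actual controller code.

```python
from dataclasses import dataclass


@dataclass
class Report:
    """Hypothetical container for the two summary fields of a judge report."""
    overall_score: float
    mismatch_count: int


def is_improvement(prev: Report, new: Report) -> bool:
    """Keep an edit if mismatches drop; on a tie, require a higher score (assumed tie-break)."""
    if new.mismatch_count != prev.mismatch_count:
        return new.mismatch_count < prev.mismatch_count
    return new.overall_score > prev.overall_score


def should_stop(report: Report, target_score: float) -> bool:
    """Stop once the score reaches the target and no axis is a mismatch."""
    return report.mismatch_count == 0 and report.overall_score >= target_score


# Values taken from the iteration log: iter3 baseline, a reverted edit, a kept retry.
baseline = Report(overall_score=0.71, mismatch_count=3)
worse = Report(overall_score=0.70, mismatch_count=4)   # score here is illustrative
better = Report(overall_score=0.82, mismatch_count=2)

for label, candidate in (("first edit", worse), ("retry", better)):
    action = "keep" if is_improvement(baseline, candidate) else "revert"
    print(f"{label}: mism={candidate.mismatch_count} -> {action}")
```

Comparing on `mismatch_count` first mirrors the log, which reports `mism=` as the reason for each keep or revert; whether the real agent also weighs `partial` verdicts is not stated in the source.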
+3
-3
@@ -120,7 +120,7 @@ observes; it suggests no fixes (a stronger external model owns correction).
 {
 "axes": {
 "subject_count": {"verdict": "match", "ref": "1 woman", "gen": "1 woman"},
-"position_name": {"verdict": "mismatch", "ref": "doggy style", "gen": "cowgirl"},
+"body_orientation":{"verdict": "mismatch", "ref": "female on top, facing partner", "gen": "female on bottom"},
 "clothing_state": {"verdict": "mismatch", "ref": "red lace lingerie", "gen": "nude"},
 "scene": {"verdict": "partial", "ref": "dim bedroom", "gen": "lit bedroom"},
 "lighting_color": {"verdict": "match", "ref": "warm low-key", "gen": "warm low-key"}
@@ -132,9 +132,9 @@ A **discrete verdict** (match/partial/mismatch) is used instead of a 0–1 score
 give unreliable fine scores (identical ref/gen often scored ~0.6) but classify the three
 buckets reliably. `overall_score` + `mismatch_count` are computed from the verdicts on our
 side (mean ordinal), so they're trustworthy as a stop signal. The axis list is
-**configurable**; the default ~24 axes are grouped identity / body / wardrobe / action·pose
+**configurable**; the default ~23 axes are grouped identity / body / wardrobe / action·pose
 / affect / camera / render, with the action·pose cluster split fine (`sexual_act`,
-`position_name`, `body_orientation`, `limb_arrangement`, `penetration`, `contact_points`,
+`body_orientation`, `limb_arrangement`, `penetration`, `contact_points`,
 `genital_visibility`) so it stays discriminative for explicit content. Each axis carries a
 one-line definition in the prompt. The agent steers each `mismatch`/`partial` axis toward
 its `ref`. See [CALIBRATION_POLICY.md](CALIBRATION_POLICY.md).
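The hunk above says `overall_score` and `mismatch_count` are computed from the discrete verdicts as a "mean ordinal". A minimal sketch of one way to do that is below. The ordinal values (match = 1.0, partial = 0.5, mismatch = 0.0) and the `summarize` function name are assumptions; the source does not state the actual mapping.

```python
from typing import Dict, Tuple

# Assumed ordinal values; the source only says "mean ordinal".
VERDICT_VALUE = {"match": 1.0, "partial": 0.5, "mismatch": 0.0}


def summarize(axes: Dict[str, dict]) -> Tuple[float, int]:
    """Return (overall_score, mismatch_count) computed from per-axis verdicts."""
    if not axes:
        raise ValueError("report has no axes")
    # An unknown verdict string raises KeyError rather than being silently scored.
    values = [VERDICT_VALUE[axis["verdict"]] for axis in axes.values()]
    overall = sum(values) / len(values)
    mismatches = sum(1 for axis in axes.values() if axis["verdict"] == "mismatch")
    return overall, mismatches


# Four axes in the report shape shown in the docs; the values are illustrative.
axes = {
    "subject_count": {"verdict": "match", "ref": "1 woman", "gen": "1 woman"},
    "scene": {"verdict": "partial", "ref": "dim bedroom", "gen": "lit bedroom"},
    "lighting_color": {"verdict": "match", "ref": "warm low-key", "gen": "warm low-key"},
    "framing": {"verdict": "mismatch", "ref": "close-up", "gen": "wide shot"},
}

score, mismatches = summarize(axes)
print(f"overall_score={score:.3f} mismatch_count={mismatches}")
```

Because the score depends only on the three-way verdicts and not on any number the judge emits, it stays stable even when the judge's own fine-grained scores are noisy, which is the motivation the doc gives for the discrete verdict.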
+5
-4
@@ -58,15 +58,16 @@ AXIS_DEFS = {
 "hair": "hair length, color, texture, and style",
 # wardrobe
 "clothing_state": "degree of undress and any garments / lingerie / accessories",
-# action & pose cluster (the crux for explicit content — be specific)
+# action & pose cluster — OBSERVABLE GEOMETRY, not named labels. Naming a position
+# ("doggy"/"cowgirl") is unreliable even at 30B; describe what is visible and let the
+# agent compose any label from these primitives.
 "sexual_act": "type of activity: vaginal, anal, oral/blowjob, handjob, fingering, none...",
-"position_name": "the named sex position if identifiable (doggy, missionary, cowgirl/reverse, spooning, 69...)",
-"body_orientation": "how bodies are oriented: who is on top/bottom/side, facing each other or from behind",
+"body_orientation": "who is on top / bottom / side / kneeling / standing, and which way each body faces (facing partner, same direction, or away). Describe the geometry; do NOT guess a named position.",
 "limb_arrangement": "placement of legs and arms (spread, bent, raised, over shoulder, kneeling) and hand placement",
 "penetration": "penetration type, depth (shallow/full), angle, and how visible it is",
 "contact_points": "where bodies touch: grip/hands location, mouth, points of contact",
 "genital_visibility": "which genitals are visible and how explicitly the frame shows them",
-"pose": "overall body posture not covered above (torso/head lean, arch, twist)",
+"pose": "overall body posture: torso/head lean, arch, twist, hip angle",
 # affect
 "facial_expression": "facial expression / affect (eyes, mouth, brow)",
 "gaze": "gaze direction / eye contact (at camera, partner, away, eyes closed)",