Add Krea2 POV routing and eval tooling

2026-06-30 19:28:10 +02:00
parent 284c6279e6
commit f5ba07e340
29 changed files with 6331 additions and 400 deletions
@@ -0,0 +1,461 @@
+# Krea2 A/B Methodology Memory
+
+This file is the persistent memory for SxCP Krea2 prompt A/B methodology.
+Update it whenever the testing method improves.
+
+## Current Method
+
+Version: `2026-06-30-generated-route-validation-positive-channel-cleanup`
+
+1. Pull or construct the baseline from an actual SxCP/CodexMCPTest source case.
+2. Keep the sampler seed fixed across the baseline and candidate.
+3. Keep subject, location family, camera family, and target pose fixed unless
+   the experiment explicitly tests one of those axes.
+4. Change one prompt variable at a time when possible, usually the visual
+   hierarchy for the target contact or pose.
+5. Keep `sxcp_eval_out` positive-only. Do not place negative-conditioning
+   phrases in the visible prompt.
+6. Use location-compatible anchors only. For coworking/office scenes, use chair
+   edge, desk edge, laptop table, glass partitions, repeated desk rows, plants,
+   and window depth instead of bedroom or bedding anchors.
+7. Treat a manual prompt win as proof that Krea2 responds to the wording, not
+   proof that the SxCP generator already emits it.
+8. When leaving a category, mirror a prompt win into the generator as a
+   provisional improvement if same-seed evidence shows it beats the baseline
+   and the wording is generator-safe. Keep the route `candidate` until the
+   broader generator-patch evidence matrix proves it.
+9. When a subject-first batch preserves appearance but repeatedly misses the
+   atlas body plane, record it as weak-case evidence and consider stronger
+   control before adding more generator text.
+10. Score spatial orientation against the atlas before accepting evidence,
+    and treat a contradictory room/background read as a rejection even when
+    contact or limb placement is clear. Use background cues to decide whether
+    the viewer or partner is high, low, standing, seated, supine, or on a
+    support before grading pose/contact quality.
+11. For hard text-only pose families, set an exploration budget before calling
+    the route weak or deciding it needs stronger control. Eight prompt probes
+    are only an early signal. Use batched wording-axis probes and aim for about
+    fifty positive-only tries across meaningful axes before concluding that
+    prompt text cannot reliably express the pose.
+12. Do not require a perfect atlas hit before carrying progress forward. After
+    the exploration budget, a repeatable partial that beats the baseline failure
+    mode can become an accepted provisional generator improvement while the
+    remaining miss stays documented for later seed/source expansion.
+13. After patching generator wording, render one prompt produced by the actual
+    code path before closing the category. Manual prompt-axis wins are not
+    enough; the generated route can still drop the key contact hierarchy or add
+    limiting positive-channel wording.
+
+## Promotion Gates
+
+- One clean fixed-seed A/B can be recorded as evidence for that source case.
+- A prompt-guide rule needs repeated evidence across distinct subjects,
+  locations, or seeds, unless the generated prompt is structurally wrong before
+  rendering.
+- A catalog variant remains candidate until the rule repeats under controlled
+  conditions.
+- A provisional generator patch is allowed when leaving a category if the best
+  tested wording improves over baseline on a fixed seed. It should preserve the
+  selected subject, outfit, location, and camera semantics, and it must not patch
+  in a scene workaround that only solved one render.
+- A proven/default generator patch still needs the broader evidence matrix below,
+  unless the generated prompt is structurally wrong before rendering.
+
+## Generator Mirroring
+
+After a manual A/B prompt win, do not assume the SxCP generator mirrors the
+wording. Add a failing regression against the final formatter output first, then
+patch the narrow route boundary that owns the wording. The regression should
+assert the accepted hierarchy terms and reject the failure mode that caused the
+bad render, such as scene-incompatible anchors or negative-conditioning text in
+the positive prompt.
+
+After the route patch, run a generated-route probe through `sxcp_eval_out` with
+the same sampler seed when feasible. Use the actual formatter output, not a
+hand-normalized prompt. If the generated route regresses compared with the
+manual prompt-axis winner, record the failed generated-route image as the
+baseline, tighten the route wording, and validate again before logging the
+candidate as generated-route evidence.
+
+For location-specific wins, split the implementation:
+
+- the action or role graph owns the pose/contact hierarchy;
+- the final Krea formatter owns scene-compatible anchor expansion because it can
+  see the selected scene, camera, and composition;
+- existing route phrases that downstream tests rely on should be preserved
+  inside the stronger wording when they do not conflict with the A/B evidence.
+
+## MCP Command Memory
+
+Use the checked helper instead of ad hoc Python snippets for bridge calls. The
+approved command prefix is:
+
+```bash
+/media/p5/miniforge3/bin/python tools/sxcp_mcp_client.py
+```
+
+Common calls:
+
+```bash
+/media/p5/miniforge3/bin/python tools/sxcp_mcp_client.py list-tools
+/media/p5/miniforge3/bin/python tools/sxcp_mcp_client.py call-tool comfy_pull --arguments-json '{"channel":"sxcp_eval_in"}'
+/media/p5/miniforge3/bin/python tools/sxcp_mcp_client.py call-tool comfy_push --arguments-json '{"channel":"sxcp_eval_out","seed":5656565656,"text":"PROMPT_ONLY_POSITIVE_CONDITIONING"}'
+```
+
+For batched prompt-axis search, prepare a JSON batch and use the offline command
+renderer before touching the bridge manually:
+
+```bash
+python tools/sxcp_prompt_batch.py validate --batch-json /tmp/sxcp-batch.json
+python tools/sxcp_prompt_batch.py print-push-commands --batch-json /tmp/sxcp-batch.json
+python tools/sxcp_prompt_batch.py print-result-template --batch-json /tmp/sxcp-batch.json
+python tools/sxcp_prompt_batch.py run-batch --batch-json /tmp/sxcp-batch.json --result-json /tmp/sxcp-results.json --previous-turn 80 --run
+python tools/sxcp_prompt_batch.py validate-results --batch-json /tmp/sxcp-batch.json --result-json /tmp/sxcp-results.json
+python tools/sxcp_prompt_batch.py print-eval-entry-draft --batch-json /tmp/sxcp-batch.json --result-json /tmp/sxcp-results.json --variant-key pov_example_variant --baseline-image /absolute/baseline.png --candidate-id controlled_subject_first
+```
+
+Use `run-batch --run` for normal batch execution. It pushes one positive prompt,
+polls `sxcp_eval_in` until the turn advances and an absolute PNG appears with
+the fixed sampler seed, writes the filled result JSON, then sends the next
+prompt. Omit `--run` for a dry-run command preview. Run `validate-results` after
+the batch and before drafting evidence. It checks that every probe returned a
+new ordered turn, an absolute PNG image path, and the same sampler seed as the
+batch. This keeps batched prompt search in two phases: image-presence
+collection first, bulk analysis second.
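
As a rough illustration of the three checks `validate-results` is described as performing, the sketch below re-implements them in standalone Python. It is not the helper's code: the function name `check_results` and the result field names (`turn`, `image_path`, `seed`) are assumptions made for illustration, and the real result JSON may be shaped differently.

```python
from pathlib import PurePosixPath


def check_results(batch_seed, previous_turn, results):
    """Return a list of problems found in per-probe result dicts.

    The keys used here (turn, image_path, seed) are hypothetical,
    not the helper's actual schema.
    """
    problems = []
    last_turn = previous_turn
    for index, result in enumerate(results):
        turn = result.get("turn")
        # Each probe must come back on a turn newer than the one before it.
        if not isinstance(turn, int) or turn <= last_turn:
            problems.append(f"probe {index}: turn {turn!r} did not advance past {last_turn}")
        else:
            last_turn = turn
        # The image must be an absolute path that ends in .png.
        path = PurePosixPath(str(result.get("image_path", "")))
        if not path.is_absolute() or path.suffix.lower() != ".png":
            problems.append(f"probe {index}: image path is not an absolute PNG")
        # The returned seed must equal the fixed batch seed.
        if result.get("seed") != batch_seed:
            problems.append(f"probe {index}: seed {result.get('seed')!r} != {batch_seed}")
    return problems
```

An empty return value means every probe passed all three checks; any other value lists what to fix before drafting evidence.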
+
+Before drafting evidence, compare atlas references and generated images for
+spatial orientation, not only limb/contact similarity. First decide the
+atlas's surface and camera-height relationship, then check whether the
+generated background supports the same read. Use the background as a
+camera-height witness: ceiling, upper walls, and high partition lines usually
+support a low viewer looking upward; floor, carpet, table tops, platform edges,
+or furniture behind the body can reveal a higher camera, seated support, or a
+different surface. If the atlas target has the viewer flat on his back or the
+partner mounted over him, do not accept a candidate only because contact is
+clear; the room geometry must also support that flat/low read. Reject the
+candidate before generator mirroring when the background says the bodies are on
+a different surface or at a different height than the atlas.
+
+`print-eval-entry-draft` rejects `geometry_only` candidates by default. Use
+`--allow-geometry-only` only when the entry is explicitly labeled as
+non-controlled prompt-axis evidence rather than subject/look-controlled A/B
+evidence.
+
+Keep `sxcp_eval_out` prompt-only and positive-only. Do not use
+`sxcp_eval_negative_out` for Krea2 tuning.
+
+## Generator-Patch Evidence Matrix
+
+Do prompt and image exploration before editing production generator wording. A
+normal pose-wording generator patch needs all of this evidence first:
+
+- at least three distinct source cases with different visible subjects;
+- at least two sampler seeds, unless the source prompt is structurally wrong
+  before rendering;
+- location-family coverage when the proposed wording changes scene anchors;
+- one baseline and one candidate per source case, with subject, location family,
+  camera family, and sampler seed fixed inside each pair;
+- positive-only candidate prompts, with no negative-conditioning phrases in the
+  positive prompt.
+
+A generated-route probe that works before the full matrix is useful evidence.
+If it is the best tested improvement when leaving the category, it can become a
+`provisional_generator_patch` with final prompt regression coverage. It should
+not become a proven `generator_patch` decision until the matrix repeats and the
+final generated prompt is regression-tested.
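
The countable parts of the matrix can be checked mechanically. The sketch below is one possible way to do that, assuming each baseline/candidate pair is recorded as a dict with the hypothetical keys `source_case`, `subject`, and `seed`; none of these names come from the SxCP tooling. Location-family coverage and the positive-only requirement still need a human read and are not modeled here.

```python
def matrix_gaps(pairs, structurally_wrong=False):
    """List unmet evidence-matrix requirements for baseline/candidate pairs.

    Each pair is a dict with hypothetical keys: source_case, subject, seed.
    """
    gaps = []
    source_cases = {pair["source_case"] for pair in pairs}
    subjects = {pair["subject"] for pair in pairs}
    seeds = {pair["seed"] for pair in pairs}
    # At least three distinct source cases with different visible subjects.
    if len(source_cases) < 3 or len(subjects) < 3:
        gaps.append("need at least three source cases with different visible subjects")
    # At least two sampler seeds, unless the prompt is structurally wrong
    # before rendering.
    if len(seeds) < 2 and not structurally_wrong:
        gaps.append("need at least two sampler seeds")
    return gaps
```

An empty list means the subject and seed counts are satisfied; it does not by itself promote a `provisional_generator_patch` to a proven one.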
+
+## Hard-Pose Exploration Budget
+
+Use this budget for atlas poses where early prompt-only results repeatedly miss
+the core spatial read.
+
+- Define the failure threshold before the run. The default threshold is about
+  fifty positive-only prompt tries across distinct wording axes before declaring
+  the pose text-insufficient or moving it to a stronger-control bucket.
+- Run the search in batches, usually six to twelve prompts at a time. Send each
+  prompt through `sxcp_eval_out`, wait for the image path, then analyze the
+  batch together instead of overreacting to one render.
+- Keep a short axis ledger for each batch: intended wording axis, seed, source
+  subject, best image, repeated failure mode, and words that literalized or
+  harmed the result.
+- Treat a small failed batch as direction, not a conclusion. If a batch shows a
+  repeated failure such as head height, camera height, viewer/partner elevation,
+  or background-plane mismatch, the next batch should vary that axis directly.
+- Stop early only for a strong positive result that is worth repeating on a
+  second source or seed, or for a hard technical blocker. A weak but improving
+  result should feed the next wording batch rather than ending the category.
+- If the threshold run finds a repeatable partial that is materially better
+  than baseline, accept the partial target explicitly and mirror only that
+  generator-safe improvement. Keep the route candidate and mark the evidence as
+  needing expansion when the full atlas target is still unsolved.
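
The axis ledger and the budget count described above could be kept in a small structure like the following. This is a sketch only: the class name, the field names, and the `probe_count` field are assumptions introduced for illustration, and the default threshold of fifty simply mirrors the "about fifty" guideline.

```python
from dataclasses import dataclass, field


@dataclass
class AxisLedgerEntry:
    """One batch's ledger line; all field names are hypothetical."""
    wording_axis: str
    seed: int
    source_subject: str
    probe_count: int
    best_image: str = ""
    repeated_failure_mode: str = ""
    harmful_words: list = field(default_factory=list)


def tries_remaining(ledger, threshold=50):
    """Return how many positive-only tries are left before the threshold."""
    used = sum(entry.probe_count for entry in ledger)
    return max(0, threshold - used)
```

Keeping the count next to the ledger makes it harder to declare a pose text-insufficient after only a handful of probes.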
+
+## Current Fingering Test Pattern
+
+The prior bedding-based fingering prompt is invalid as a general rule because
+it solved a lower-foreground artifact by adding bedroom context to an office
+scene. The corrected test pattern keeps the coworking location intact:
+
+- baseline: generic POV fingering/manual-contact wording from the same source
+  case;
+- candidate: foreground hand first, open-thigh geometry second, visible woman
+  face/torso third, office chair and coworking depth fourth;
+- anchors: black office chair seat/arms, desk edge, laptop table corners, glass
+  partitions, repeated desk rows, plants, tall-window depth;
+- rejection trigger: any result that fixes contact by changing the scene family
+  instead of improving the pose hierarchy.
+
+## Improvement Log
+
+- `2026-06-30`: Added side-camera/result-label separation after ballsucking
+  seed `5757575757` produced attractive low side-camera oral views while still
+  collapsing the requested contact object onto the shaft/glans. Future scoring
+  should record that as side-view oral evidence and keep target-contact evidence
+  separate.
+- `2026-06-30`: Added generated-route validation discipline after footjob turn
+  `183` kept large foreground soles but hid the shaft/contact that manual probes
+  had preserved. Future provisional generator patches should render the exact
+  final Krea prompt once after the code change; if shared route wording adds
+  limiting positive-channel language, clean it before sending the validation
+  prompt.
+- `2026-06-30`: Added a hard-pose exploration budget after ballsucking wording
+  tests produced only eight early probes before the first weak-case note. Future
+  hard text-only poses should use batched wording-axis search and aim for about
+  fifty positive-only tries before concluding the pose needs stronger control.
+- `2026-06-30`: Added partial-acceptance discipline after ballsucking produced
+  repeatable tongue/lips-on-testicles results that beat the shaft/glans
+  baseline but did not fully solve mouth-wrapped contact. Future hard-pose exits
+  should preserve repeatable progress as a provisional generator patch while
+  keeping the remaining miss in the expansion queue.
+- `2026-06-30`: Added ballsucking target-object refinement after sampler seed
+  `9797979797` repeated the `scrotal skin is the nearest mouth surface` branch
+  on turns `288` and `293`. Score target-object ownership separately from the
+  side-low camera family: a route can preserve face/thigh geometry while still
+  drifting to shaft/base contact. Avoid promoting balls-first center-object
+  wording when it creates multi-subject or body-layout artifacts.
+- `2026-06-30`: Added ballsucking generated-route validation after sampler seed
+  `9898989898` repeated the patched scrotal-skin route on turns `296` and
+  `297`. Validation can accept a provisional target-object improvement while
+  still keeping the pose queued when the remaining miss is full mouth-wrapped
+  testicle contact.
+- `2026-06-30`: Added ballsucking fresh weak-case evidence after sampler seed
+  `5959595959` tested lip-oval, sideways mouth pocket, and chin-pelvis upward
+  seal wording across three women. The batch preserved low-pelvis/cheek-thigh
+  geometry in places, but every branch returned to shaft/glans collapse or
+  generic oral contact. Do not retry those axes as generator defaults; the next
+  search should change the target-object control strategy rather than adding
+  more mouth-shape synonyms.
+- `2026-06-30`: Added ballsucking occlusion weak-case evidence after sampler
+  seed `6060606060` tested foreground occlusion, under-scrotum tongue shelf,
+  and hand-guided scrotum wording across three women. The generated route
+  remained the best partial while those axes became shaft-centered or
+  hand/shaft-dominant. Do not retry occlusion or hand-support synonyms as
+  generator defaults; the next useful move is a different target-object strategy
+  or stronger control.
+- `2026-06-30`: Added ballsucking mouth-axis mixed-case evidence after sampler
+  seed `6161616161` tested exact mouth-sucking, single-testicle, hanging balls
+  below shaft, side-mouth wrap, and chin-pelvis lower-mouth wording across
+  three women. The generated-route controls stayed the best repeated partials
+  on two subjects, side-mouth and chin-pelvis variants produced isolated useful
+  partials, and the rest drifted back to shaft/glans contact. Record isolated
+  partials as axis hints, but do not patch generator wording unless a branch
+  repeats across subjects or beats the generated-route controls.
+- `2026-06-30`: Added ballsucking pelvis-valley weak-case evidence after
+  sampler seed `7171717171` tested flat pelvis-valley, thigh tunnel,
+  pubic-hair mouth-line, low-cushion chin-anchor, and pelvis-edge target-first
+  wording across three women. The flat pelvis-valley branch repeated a strong
+  body-plane correction on three subjects, matching the atlas viewer-flat
+  thigh-wall read better, but it stayed shaft-centered. Score body-plane
+  orientation and target-object contact separately; do not patch a route when
+  it improves orientation while regressing the target.
+- `2026-06-30`: Stopped the ballsucking text-only loop after sampler seed
+  `7272727272` combined `flat-valley scrotal-skin` target wording with the
+  prior side-low route across three women. The hybrid repeated the body-plane
+  hint on turns `368`, `374`, and `380`, but the target stayed shaft-centered,
+  while side-low flat-valley variants only gave look hints. Preserve the
+  current side-low scrotal-skin partial, do not patch the hybrid axes, and move
+  future full-target work toward stronger pose/control evidence rather than
+  more positive-prompt synonyms.
+- `2026-06-30`: Promoted blowjob side-profile POV after sampler seed
+  `5858585858` produced a three-woman generated-route repeat on turns `298`,
+  `301`, and `304`. When the current generated route repeats across multiple
+  subjects on a fresh seed and alternate branches do not beat it cleanly, mark
+  the route proven instead of continuing to queue it. Keep attractive
+  side-camera-style self-body crop results as a separate look branch when they
+  risk drifting toward external side framing.
+- `2026-06-29`: Added the multisource/generator-safe method after an overfit
+  single-character coworking test produced a visually usable but invalid
+  bedding foreground. Future A/B runs must test at least two source cases before
+  promoting wording that is meant to become a durable guide or generator rule.
+- `2026-06-29`: Added generator mirroring discipline after the accepted
+  fingering wording proved Krea2 behavior but not generator output. Future
+  mirroring changes need a red-green regression at final Krea formatter output,
+  not just a guide entry.
+- `2026-06-29`: Tightened generator-patch promotion after the fingering
+  generated-route probe looked good but had too little image coverage. Future
+  pose-wording generator edits need a broader seed, subject, and location matrix
+  before production route code changes.
+- `2026-06-29`: Added semantic-axis discipline after source 52 fingering tests.
+  If a candidate succeeds by changing ownership, viewpoint, location family, or
+  role semantics, record it as a weak-case or prompt note unless that semantic
+  change is the intended generator behavior. Do not count it as direct evidence
+  for the original route even when the image is visually cleaner.
+- `2026-06-29`: Added provisional generator-patch discipline after the user
+  clarified that leaving a category should still carry forward same-seed progress
+  over baseline. Future category exits should patch the generator with the best
+  generator-safe improvement, record it as `provisional_generator_patch`, and
+  keep the catalog route as `candidate` until repeated evidence proves it.
+- `2026-06-29`: Applied the category-exit rule to spread/open-thigh presentation
+  after two source subjects improved on the same sampler seed. For setup poses
+  that are not structurally broken before rendering, prefer at least two source
+  subjects before mirroring a provisional generator patch, and keep the
+  observation explicit about remaining weak points such as insufficient V-frame
+  width or outfit closure.
+- `2026-06-29`: Applied the same category-exit rule to blowjob top-view after
+  two source subjects improved on sampler seed `4242424242`. When the baseline
+  is already usable, record the improvement narrowly: name the axis that got
+  better, keep the route candidate, and avoid overstating the finding as proven
+  until another seed repeats it.
+- `2026-06-29`: Corrected blowjob top-view criteria after atlas review and a
+  same-seed source-`46` probe showed that vertical shaft alignment alone can
+  still render as frontal/eye-height oral. Future top-view evidence must show
+  steep overhead camera geometry: viewer abdomen at the lower edge, camera
+  looking down from above the viewer chest/abdomen, and the woman's hair crown,
+  shoulders, and hands visible from above.
+- `2026-06-29`: Refined blowjob top-view prompt-axis search after the user
+  rejected horizontally biased probes. Run several prompt-only probes before
+  editing the generator, wait for `sxcp_eval_in` to advance to the new turn, and
+  compare each image against the atlas verticality criteria. The useful axis is
+  `nadir-angle` or `bird's-eye` plus standing male POV, nearby floor plane
+  dominating the image, one woman directly below between the viewer's feet, and
+  top-down office anchors. Avoid `plumb-line` and `map` in generator prompts
+  because Krea2 can literalize them as drawn graphics.
+- `2026-06-29`: For quick wording-axis search, prefer a batched prompt-probe
+  loop before analysis-heavy iteration. Prepare several positive-only alternate
+  prompts that isolate likely wording axes, send them one at a time through
+  `sxcp_eval_out` with the same sampler seed, pull only until each new
+  `sxcp_eval_in` turn and image path exists, then inspect the returned images as
+  a batch. Use the bulk comparison to pick the best axis, identify literalized
+  or harmful words, and only then update the generator, guide, catalog, or eval
+  log.
+- `2026-06-29`: Preserve prompt-order controls when testing anything beyond
+  rough pose-axis discovery. Prompts that start with pose geometry and omit or
+  move the subject/look block can reduce female-look adherence, so treat those
+  runs as geometry-only probes. Durable A/B prompts should keep the original
+  subject/look description first, then the pose hierarchy, then location and
+  style/background anchors, unless the test is explicitly about prompt-order
+  sensitivity.
+- `2026-06-29`: Added result-validation discipline to the batched prompt helper.
+  After sending a batch, fill the result template from `sxcp_eval_in`, run
+  `validate-results`, and only then draft evidence. The validation step proves
+  each probe returned an ordered turn, an absolute PNG artifact, and the fixed
+  sampler seed before bulk analysis or log-entry drafting.
+- `2026-06-29`: Added `run-batch` automation to the batched prompt helper. It
+  removes manual push/pull copy-paste from normal A/B runs while keeping the same
+  gates: positive-only prompts, fixed sampler seed, turn advancement, absolute
+  PNG image path, and `validate-results` before evidence drafting.
+- `2026-06-29`: Split missionary subcases after turns `77`-`84`. Turns `76` and
+  `80` are valid angled/cushion missionary results, not failures. The flatter
+  atlas examples need a different positive axis: woman flat across an elevated
+  table/platform, viewer standing or braced at the foot edge, and viewer feet,
+  shins, or side-dropping legs placed below the support edge. Patch this only
+  into the raised-edge/edge-supported route; keep generic missionary available
+  for angled valid views.
+- `2026-06-29`: Folded-missionary tuning on seed `8989898989` used two
+  subject-first batches before code changes. Turns `85`-`88` showed that
+  compact knee-block and vertical-thigh-column wording can produce the folded
+  high-leg geometry, but the shaft/contact disappears when knees and feet lead
+  the hierarchy. Turns `89`-`92` then tested contact-first variants; turn `89`
+  was accepted because it placed the viewer lower abdomen and large centered
+  shaft/contact before the compact folded-knee block. This confirms the
+  method: use the first batch to identify the failed axis, run a targeted
+  second batch, then mirror only the accepted generator-safe hierarchy as a
+  provisional patch.
+- `2026-06-29`: Frontal cowgirl on seed `8989898989` used a baseline-plus-
+  variants batch instead of comparing against a previous category. Turn `93`
+  was a valid generic cowgirl baseline, so turn `95`'s wide horizontal thigh
+  bridge improvement became a prompt-guide rule rather than a generator patch.
+  When the baseline already hits the pose, record the useful atlas refinement
+  and leave the generator unchanged unless repeated evidence shows a systemic
+  weakness.
+- `2026-06-29`: Cowgirl-alt on seed `8989898989` exposed a spatial-orientation
+  blind spot. Turns `97`-`100` had readable contact and squat-like knees, but
+  the background still read as a platform/high-camera setup. After rechecking
+  the atlas, turns `101`-`104` tested flat-supine viewer wording with ceiling
+  and upper-room cues; turn `104` was accepted. Future pose analysis must
+  compare atlas and generated room geometry before accepting an image.
+- `2026-06-29`: Reverse cowgirl on seed `8989898989` showed that a correct
+  semantic label such as `facing away` can be ignored when the visual hierarchy
+  still resembles frontal cowgirl. Future back-facing straddle tests should
+  score facing direction before contact quality and should name the back, hips,
+  and ass as the nearest largest shapes before viewer-leg and contact details.
+  Treat over-shoulder glances as secondary refinements only after the
+  back-facing straddle is already locked.
+- `2026-06-29`: Reverse-cowgirl-alt on seed `8989898989` confirmed that atlas
+  sibling folders can need separate generator routes even when the baseline is
+  already valid. Normal reverse cowgirl is close back/hip dominant; reverse-alt
+  is upright seated with vertical back/shoulders and viewer hands or thighs
+  forming the lower frame. Keep those prompt hierarchies separate instead of
+  merging all back-facing woman-on-top evidence into one route.
+- `2026-06-29`: Added non-target-viewpoint discipline after blowjob side-profile
+  oral produced an attractive side-camera result on seed `5656565656`. If a
+  render is visually useful but reads as a different camera family, record it as
+  a weak case for a future route and do not mirror it into the current POV
+  generator path.
+- `2026-06-29`: Added MCP command memory after repeated context loss around the
+  bridge workflow. Future A/B calls should use the checked helper command
+  `/media/p5/miniforge3/bin/python tools/sxcp_mcp_client.py ...`, with
+  `comfy_push` to `sxcp_eval_out` for prompt-only positive conditioning and
+  `comfy_pull` from `sxcp_eval_in` for returned prompt/image/seed payloads.
+- `2026-06-29`: Added side-profile oral ownership discipline after source `46`
+  improved with explicit adult-male foreground ownership while source `47`
+  rejected a related `body-axis` cue by transferring the body surface to the
+  woman. Future side-profile tests should name the foreground owner repeatedly
+  and verify that the woman's body stays lateral before considering any
+  generator mirroring.
+- `2026-06-30`: Promoted the side-profile oral lateral-edge body-line axis
+  after sampler seed `9753197531` repeated it across two visible women. Pure
+  male-body-axis wording can expose the male as a photographed subject or let
+  Krea2 transfer the central body surface away from the intended first-person
+  view. Future generator patches should combine adult-male foreground ownership
+  with explicit lateral entry from the left edge, mouth at the male abdomen
+  line, and hand under the lips; keep the route provisional until another
+  seed/source expansion repeats it.
+- `2026-06-30`: Added side-profile oral generated-route contact validation
+  after turn `206` kept the male body-line geometry but let the mouth float
+  above the shaft while the hand became the contact anchor. Turn `207` improved
+  after adding lips-touching and mouth-to-shaft-contact priority. Future
+  generated-route validation for oral side-profile should score both viewpoint
+  ownership and which body part actually anchors the contact.
+- `2026-06-30`: Added the side-profile oral lower-right torso anchor after
+  sampler seed `9595959595` repeated it on turns `279` and `283` across two
+  visible women. The useful wording makes the adult male viewer's own torso
+  start at the lower edge and run diagonally into the lower-right foreground,
+  with navel, abdomen hair, pelvis, and near thigh marking the camera owner's
+  body. Prefer this over generic body-axis wording, which can expose the male
+  as a photographed side subject or transfer the axis onto the woman.
+- `2026-06-30`: Added side-profile oral generated-route validation after
+  sampler seed `9696969696` repeated the patched route on turns `284` and
+  `285`. Count generated-route validation separately from prompt-axis search:
+  it proves the formatter can carry the new wording, while promotion still
+  requires broader source/seed evidence.
+- `2026-06-30`: Promoted normal frontal cowgirl from guide-only to provisional
+  generator patch after seed `2828282828` repeated the wide-thigh bridge axis
+  across two visible women. When the baseline is already valid, a generator
+  patch is still appropriate if a later seed repeats a narrow atlas refinement
+  that improves geometry without harming subject/look, contact, or setting.
+  Generated-route turn `216` validated the patched formatter route with viewer
+  hands on outer thighs, wide foreground thigh bridge, upright torso, centered
+  contact, and coworking depth. Keep the route candidate until another
+  source/seed repeats the refinement.
+- `2026-06-29`: Applied the category-exit rule to blowjob laying frontal after
+  source `46` and source `50` improved on sampler seed `6767676767`. When
+  baselines are already strong, preserve the exact improved axis (wide V-frame
+  and low-horizontal torso hierarchy), note residual high-hip posture, and keep
+  the generator patch provisional until another seed repeats it.
+- `2026-06-29`: Applied the category-exit rule to blowjob sitting upright after
+  source `46` and source `50` improved on sampler seed `7878787878`. When a
+  baseline preserves the seated pose but floats the face above the contact
+  point, prefer low-mouth seated hierarchy over generic `mouth aligned` wording:
+  face lowered to the exact center contact point, open mouth covering the
+  centered tip, and hands directly at the base. Record outfit looseness/drift as
+  residual risk and keep the generator patch provisional until another seed
+  repeats it.
@@ -5,6 +5,9 @@ ComfyUI sends a generated prompt, image, and seed to Codex, Codex analyzes the
 result, then sends back exactly one edited prompt for the next A/B test.
 Confirmed findings become either generator changes or durable prompt rules in
 [`krea2-prompt-guide.md`](krea2-prompt-guide.md).
+The active A/B testing method is recorded in
+[`krea2-ab-methodology.md`](krea2-ab-methodology.md); update that memory when
+the method improves.

 ## Channels

@@ -14,6 +17,76 @@ Confirmed findings become either generator changes or durable prompt rules in
  the MCP signal when supported. Do not put analysis here.
 - `sxcp_eval_log`: optional analysis/log channel.

+## MCP Helper Command
+
+Use the checked helper for bridge calls instead of ad hoc Python snippets. The
+approved command prefix is:
+
+```bash
+/media/p5/miniforge3/bin/python tools/sxcp_mcp_client.py
+```
+
+Common calls:
+
+```bash
+/media/p5/miniforge3/bin/python tools/sxcp_mcp_client.py list-tools
+/media/p5/miniforge3/bin/python tools/sxcp_mcp_client.py call-tool comfy_pull --arguments-json '{"channel":"sxcp_eval_in"}'
+/media/p5/miniforge3/bin/python tools/sxcp_mcp_client.py call-tool comfy_push --arguments-json '{"channel":"sxcp_eval_out","seed":5656565656,"text":"PROMPT_ONLY_POSITIVE_CONDITIONING"}'
+```
+
+## Batch Prompt Helper
+
+For prompt-axis batches, prepare a local JSON file and use the offline helper to
+render the approved MCP push/pull commands and an image-presence checklist:
+
+```bash
+python tools/sxcp_prompt_batch.py validate --batch-json /tmp/sxcp-batch.json
+python tools/sxcp_prompt_batch.py print-push-commands --batch-json /tmp/sxcp-batch.json
+python tools/sxcp_prompt_batch.py print-result-template --batch-json /tmp/sxcp-batch.json
+python tools/sxcp_prompt_batch.py run-batch --batch-json /tmp/sxcp-batch.json --result-json /tmp/sxcp-results.json --previous-turn 80 --run
+python tools/sxcp_prompt_batch.py validate-results --batch-json /tmp/sxcp-batch.json --result-json /tmp/sxcp-results.json
+python tools/sxcp_prompt_batch.py print-eval-entry-draft --batch-json /tmp/sxcp-batch.json --result-json /tmp/sxcp-results.json --variant-key pov_example_variant --baseline-image /absolute/baseline.png --candidate-id controlled_subject_first
+```
+
+Each batch file sets one fixed sampler seed and one positive prompt per probe:
+
+```json
+{
+  "seed": 8989898989,
+  "channel_out": "sxcp_eval_out",
+  "channel_in": "sxcp_eval_in",
+  "probes": [
+    {
+      "id": "controlled_subject_first",
+      "prompt_order": "subject_first",
+      "text": "SUBJECT_LOOK_FIRST. POSE_HIERARCHY. LOCATION_ANCHORS."
+    },
+    {
+      "id": "rough_geometry_axis",
+      "prompt_order": "geometry_only",
+      "text": "POSE_AXIS_ONLY_FOR_DISCOVERY."
+    }
+  ]
+}
+```
+
+`geometry_only` probes are for rough pose-axis discovery and are not durable
+subject/look-controlled A/B evidence. The helper rejects
+`sxcp_eval_negative_out`; keep batch prompts positive-only.
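Editorial sketch (not part of the commit): the batch rules above can be checked locally before running the real helper. This is an assumption-laden illustration, not the logic of `tools/sxcp_prompt_batch.py`; `check_batch` and `durable_probe_ids` are hypothetical names, and the only facts taken from the text are the field names in the example batch, the rejection of `sxcp_eval_negative_out`, and the non-durable status of `geometry_only` probes.

```python
def check_batch(batch):
    """Return a list of problems in a batch dict; an empty list means it passed."""
    problems = []
    if not isinstance(batch.get("seed"), int):
        problems.append("seed must be one fixed integer sampler seed")
    for key in ("channel_out", "channel_in"):
        # The documented helper rejects the negative channel; mirror that here.
        if batch.get(key) == "sxcp_eval_negative_out":
            problems.append(f"{key} must not be sxcp_eval_negative_out")
    probes = batch.get("probes") or []
    if not probes:
        problems.append("batch needs at least one probe")
    seen = set()
    for index, probe in enumerate(probes):
        probe_id = probe.get("id")
        if not probe_id or probe_id in seen:
            problems.append(f"probe {index}: id missing or duplicated")
        seen.add(probe_id)
        if not str(probe.get("text", "")).strip():
            problems.append(f"probe {index}: text must be a non-empty prompt")
    return problems


def durable_probe_ids(batch):
    """Ids of probes that may count as controlled A/B evidence."""
    return [p["id"] for p in batch["probes"]
            if p.get("prompt_order") != "geometry_only"]
```

Applied to the example batch above, the check returns no problems and only `controlled_subject_first` is reported as durable.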
+
+Use `run-batch --run` to process probes one at a time: push one positive
+prompt, poll `sxcp_eval_in` until a new turn and an absolute PNG image path
+appear with the fixed sampler seed, write the filled result JSON, then send the
+next probe. Omit `--run` for a dry-run command preview.
+
+After a live run, run `validate-results`. It requires that:
+
+- the result probe ids match the batch order,
+- each turn advances in batch order,
+- every image path is an absolute PNG artifact, and
+- every returned seed matches the fixed sampler seed.
+
+Then use `print-eval-entry-draft` to create a valid `krea2-eval-log.json` entry
+draft. Replace the generated summaries and observation with the real visual
+comparison before recording it with `tools/krea2_record_eval.py`. By default
+the draft command rejects `geometry_only` candidates; pass
+`--allow-geometry-only` only when deliberately recording non-controlled
+prompt-axis evidence.
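Editorial sketch (not part of the commit): the four `validate-results` requirements described above can be expressed as a small check. The result layout used here (a list of dicts with `id`, `turn`, `image`, and `seed` keys) is an assumption for illustration; the real result JSON written by `run-batch` may be shaped differently, and `check_results` is a hypothetical name.

```python
import os


def check_results(batch, results, previous_turn):
    """Return a list of problems; an empty list means all four rules hold.

    Assumes each result is a dict with hypothetical keys:
    id, turn, image, seed.
    """
    problems = []
    expected_ids = [probe["id"] for probe in batch["probes"]]
    actual_ids = [result.get("id") for result in results]
    if actual_ids != expected_ids:
        problems.append("result probe ids do not match the batch order")
    last_turn = previous_turn
    for result in results:
        label = result.get("id")
        turn = result.get("turn")
        # Each turn must be strictly newer than the one before it.
        if not isinstance(turn, int) or turn <= last_turn:
            problems.append(f"{label}: turn did not advance")
        else:
            last_turn = turn
        image = str(result.get("image", ""))
        if not (os.path.isabs(image) and image.lower().endswith(".png")):
            problems.append(f"{label}: image is not an absolute PNG path")
        if result.get("seed") != batch["seed"]:
            problems.append(f"{label}: seed differs from the fixed sampler seed")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to see every broken probe in a single pass.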
+
 ## Manual Loop

 Start the helper after sending a test prompt:
@@ -30,23 +103,30 @@ Every three minutes it prints a structured request asking Codex to:
 4. Compare it to the prompt and previous edit.
 5. Push one prompt-only edit to `sxcp_eval_out`, preserving the same seed through
   the MCP signal when available.
-6. Classify the finding as prompt-only, prompt-guide rule, or generator fix.
-7. Change generator code/data only when the issue is systemic.
-8. Record the finding and update the Krea2 prompt guide when a rule is confirmed.
+6. Classify the finding as prompt-only, prompt-guide rule, provisional generator
+   improvement, or proven generator fix.
+7. When leaving a category after same-seed progress over baseline, mirror the
+   best generator-safe wording into the responsible generator path as
+   `provisional_generator_patch`.
+8. Promote a generator change to proven only when the issue is systemic,
+   repeated, or structurally wrong before rendering.
+9. Record the finding and update the Krea2 prompt guide when a rule is confirmed.

 Runtime logs are written under `.sxcp_eval/` and ignored by git.

 Durable fixed-seed findings that justify a guide rule, generator patch, or pose
 variant promotion are recorded in [`krea2-eval-log.json`](krea2-eval-log.json).
+Method changes belong in [`krea2-ab-methodology.md`](krea2-ab-methodology.md).
 Use runtime logs for scratch notes; use the JSON log only for evidence that
 should remain tied to a catalog variant. Image paths in that log point at
 external ComfyUI artifacts and may be cleaned; the durable evidence is the fixed
-seed, prompt summaries, observation, decision, and commit.
+sampler seed, optional generator seed, prompt summaries, observation, decision,
+and commit.

 Record durable findings with the checked helper instead of hand-editing the log:

 ```bash
-python tools/krea2_record_eval.py --print-template --variant-key pov_footjob_frontal_sole_stroke --seed 1234 > /tmp/krea2-entry.json
+python tools/krea2_record_eval.py --print-template --variant-key pov_footjob_frontal_sole_stroke --seed 1234 --generator-seed 5678 > /tmp/krea2-entry.json
 python tools/krea2_record_eval.py --entry-json /tmp/krea2-entry.json --dry-run
 python tools/krea2_record_eval.py --entry-json /tmp/krea2-entry.json
 ```
@@ -59,6 +139,7 @@ Entry template:
  "date": "2026-06-29",
  "variant_key": "pov_example_variant",
  "seed": 1234,
+  "generator_seed": 5678,
  "source": "sxcp_eval_mcp",
  "result": "accepted",
  "decision": "generator_patch",
@@ -141,22 +222,45 @@ pelvis can be correct. Do not treat them as automatic failures.

 ## Seed Contract

-The seed is transport metadata, not prompt text. When the graph emits a seed, an
-A/B wording test should reuse that exact seed so the image difference mostly
-comes from wording, not sampling randomness. If a payload has no seed, mark that
+The sampler seed is transport metadata, not prompt text. When the graph emits a
+sampler seed, an A/B wording test should reuse that exact seed so the image
+difference mostly comes from wording, not sampling randomness. If the SxCP
+generator/control seed differs from the sampler seed, record it as
+`generator_seed` in the eval entry. If a payload has no sampler seed, mark that
 cycle as uncontrolled and avoid turning the result into a durable generator rule
 without another controlled run.
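Editorial sketch (not part of the commit): the seed contract above reduces to two decisions, whether a cycle is controlled and whether `generator_seed` needs recording. The function name `seed_fields` and the `controlled` flag are hypothetical; the `seed` and `generator_seed` keys match the entry template.

```python
def seed_fields(sampler_seed, generator_seed=None):
    """Map the seeds of one cycle onto eval-entry fields (hypothetical helper)."""
    if sampler_seed is None:
        # No sampler seed: the cycle is uncontrolled and yields no durable entry.
        return {"controlled": False, "entry": {}}
    entry = {"seed": sampler_seed}
    # Record the generator/control seed only when it differs from the sampler seed.
    if generator_seed is not None and generator_seed != sampler_seed:
        entry["generator_seed"] = generator_seed
    return {"controlled": True, "entry": entry}
```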

+## Positive-Only Conditioning
+
+`sxcp_eval_out` is positive conditioning only. Never send negative-conditioning
+phrases such as `no shaft`, `no hands`, `without clothing`, or `avoid X` inside
+the positive prompt; distilled Krea2 can reinforce or hallucinate the unwanted
+object from that wording.
+
+This loop has no active negative-output channel. A same-positive, same-seed
+probe on seed `424242` compared empty negative conditioning against strong
+negative text targeting visible prompt attributes, and the rendered image stayed
+visually unchanged. Do not rely on negative conditioning for Krea2 pose tuning;
+keep prompt fixes positive-only.
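Editorial sketch (not part of the commit): a rough lint can catch negative-conditioning phrases before a prompt is pushed to `sxcp_eval_out`. The patterns below cover only the `no X`, `without X`, and `avoid X` shapes named above; they are a heuristic assumption, not a rule set from the repository, and `find_negative_phrases` is a hypothetical name.

```python
import re

# Heuristic patterns for the phrase shapes named in the section above.
NEGATIVE_PATTERNS = [
    re.compile(r"\bno\s+\w+", re.IGNORECASE),
    re.compile(r"\bwithout\s+\w+", re.IGNORECASE),
    re.compile(r"\bavoid\s+\w+", re.IGNORECASE),
]


def find_negative_phrases(prompt):
    """Return negative-conditioning phrases found in a positive prompt."""
    hits = []
    for pattern in NEGATIVE_PATTERNS:
        hits.extend(match.group(0) for match in pattern.finditer(prompt))
    return hits
```

A non-empty result means the wording should be rephrased as a positive description of what is wanted before the prompt is sent.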
+
 ## Generator Fix Rule

-Only edit the generator when the image shows a repeatable, systemic prompt
-failure. Examples:
+Use two levels of generator change:
+
+- `provisional_generator_patch`: apply the best generator-safe wording when
+  leaving a category after fixed-seed progress over baseline. Keep the catalog
+  variant as `candidate`.
+- `generator_patch`: promote as a proven/default generator rule when the issue
+  is repeated, systemic, or structurally wrong before rendering.
+
+Examples of proven generator fixes:

 - Selfie wording overrides orbit camera.
 - Clothing continuity loses the selected softcore outfit.
 - POV wording makes the off-camera participant the visual subject.
 - Location camera layout inserts foreground anchors in the wrong place.

-For one-off model drift, send a cleaner prompt to `sxcp_eval_out` and keep the
-generator unchanged. For repeated prompt behavior, update the generator and add
-the rule to `docs/krea2-prompt-guide.md`.
+For one-off model drift inside an active category, send a cleaner prompt to
+`sxcp_eval_out` and keep collecting evidence. When exiting a category, carry
+forward same-seed improvements over baseline as provisional generator changes
+and add the rule or weak case to `docs/krea2-prompt-guide.md`.