
# SxCP Eval Loop

This loop is for tuning the SxCP generator toward stronger Krea2 images.
ComfyUI sends a generated prompt, image, and seed to Codex, Codex analyzes the
result, then sends back exactly one edited prompt for the next A/B test.
Confirmed findings become either generator changes or durable prompt rules in
[`krea2-prompt-guide.md`](krea2-prompt-guide.md).
The active A/B testing method is recorded in
[`krea2-ab-methodology.md`](krea2-ab-methodology.md); update that file whenever
the method improves.

## Channels

- `sxcp_eval_in`: ComfyUI to Codex. Contains the prompt text, image path, and
  seed.
- `sxcp_eval_out`: Codex to ComfyUI. Contains prompt-only text, plus the same
  seed passed through the MCP signal when supported. Do not put analysis here.
- `sxcp_eval_log`: optional analysis/log channel.

## MCP Helper Command

Use the checked helper for bridge calls instead of ad hoc Python snippets. The
approved command prefix is:

```bash
/media/p5/miniforge3/bin/python tools/sxcp_mcp_client.py
```

Common calls:

```bash
/media/p5/miniforge3/bin/python tools/sxcp_mcp_client.py list-tools
/media/p5/miniforge3/bin/python tools/sxcp_mcp_client.py call-tool comfy_pull --arguments-json '{"channel":"sxcp_eval_in"}'
/media/p5/miniforge3/bin/python tools/sxcp_mcp_client.py call-tool comfy_push --arguments-json '{"channel":"sxcp_eval_out","seed":5656565656,"text":"PROMPT_ONLY_POSITIVE_CONDITIONING"}'
```
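Hand-quoting the `--arguments-json` payload in a shell is easy to get wrong when the prompt contains quotes or apostrophes. The following is a minimal sketch, not part of the repository, that builds the same `comfy_push` call shown above with `json.dumps`; the function name `build_push_command` is hypothetical.

```python
import json
import shlex

# Command prefix copied from the section above.
HELPER = ["/media/p5/miniforge3/bin/python", "tools/sxcp_mcp_client.py"]


def build_push_command(seed: int, text: str, channel: str = "sxcp_eval_out") -> list[str]:
    """Return the argv list for a comfy_push call (hypothetical wrapper)."""
    arguments = {"channel": channel, "seed": seed, "text": text}
    # json.dumps handles escaping, so the prompt text can contain quotes.
    return HELPER + ["call-tool", "comfy_push", "--arguments-json", json.dumps(arguments)]


if __name__ == "__main__":
    argv = build_push_command(5656565656, "PROMPT_ONLY_POSITIVE_CONDITIONING")
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(argv))
```

Passing the list to `subprocess.run` directly avoids shell quoting altogether; the printed form is only for copying into a terminal.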

## Batch Prompt Helper

For prompt-axis batches, prepare a local JSON file and use the offline helper to
render the approved MCP push/pull commands and an image-presence checklist:

```bash
python tools/sxcp_prompt_batch.py validate --batch-json /tmp/sxcp-batch.json
python tools/sxcp_prompt_batch.py print-push-commands --batch-json /tmp/sxcp-batch.json
python tools/sxcp_prompt_batch.py print-result-template --batch-json /tmp/sxcp-batch.json
python tools/sxcp_prompt_batch.py run-batch --batch-json /tmp/sxcp-batch.json --result-json /tmp/sxcp-results.json --previous-turn 80 --run
python tools/sxcp_prompt_batch.py validate-results --batch-json /tmp/sxcp-batch.json --result-json /tmp/sxcp-results.json
python tools/sxcp_prompt_batch.py print-eval-entry-draft --batch-json /tmp/sxcp-batch.json --result-json /tmp/sxcp-results.json --variant-key pov_example_variant --baseline-image /absolute/baseline.png --candidate-id controlled_subject_first
```

Batch files use the fixed sampler seed and one positive prompt per probe:

```json
{
  "seed": 8989898989,
  "channel_out": "sxcp_eval_out",
  "channel_in": "sxcp_eval_in",
  "probes": [
    {
      "id": "controlled_subject_first",
      "prompt_order": "subject_first",
      "text": "SUBJECT_LOOK_FIRST. POSE_HIERARCHY. LOCATION_ANCHORS."
    },
    {
      "id": "rough_geometry_axis",
      "prompt_order": "geometry_only",
      "text": "POSE_AXIS_ONLY_FOR_DISCOVERY."
    }
  ]
}
```

`geometry_only` probes are for rough pose-axis discovery and are not durable
subject/look-controlled A/B evidence. The helper rejects
`sxcp_eval_negative_out`; keep batch prompts positive-only.
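As a rough illustration of what a pre-flight check on a batch file might look like, here is a sketch under stated assumptions. It is not the logic of `tools/sxcp_prompt_batch.py validate`, whose actual checks may differ; the function name `check_batch` and the exact rules (integer seed, unique probe ids, non-empty text, negative channel rejected in either channel field) are assumptions drawn from the example above.

```python
NEGATIVE_CHANNEL = "sxcp_eval_negative_out"


def check_batch(batch: dict) -> list[str]:
    """Return a list of problems; an empty list means the batch looks usable."""
    problems = []
    seed = batch.get("seed")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(seed, int) or isinstance(seed, bool):
        problems.append("seed must be an integer")
    # Assumption: the negative channel is rejected wherever a channel is named.
    for key in ("channel_out", "channel_in"):
        if batch.get(key) == NEGATIVE_CHANNEL:
            problems.append(f"{key} must not be {NEGATIVE_CHANNEL}")
    probes = batch.get("probes") or []
    if not probes:
        problems.append("probes must contain at least one entry")
    seen = set()
    for index, probe in enumerate(probes):
        probe_id = probe.get("id")
        if not probe_id or probe_id in seen:
            problems.append(f"probe {index}: id missing or duplicated")
        seen.add(probe_id)
        if not str(probe.get("text", "")).strip():
            problems.append(f"probe {index}: text must be a non-empty positive prompt")
    return problems
```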

Use `run-batch --run` to push one positive prompt, poll `sxcp_eval_in` until a
new turn and absolute PNG image path appear with the fixed sampler seed, write
the filled result JSON, then send the next probe. Omit `--run` for a dry-run
command preview.

After a live run, run `validate-results`. It requires that:

- the result probe ids match the batch order,
- each turn advances in batch order,
- every image path is an absolute PNG artifact, and
- every returned seed matches the fixed sampler seed.

Then use `print-eval-entry-draft` to create a valid `krea2-eval-log.json` entry
draft. Replace the generated summaries and observation with the real visual
comparison before recording it with `tools/krea2_record_eval.py`. By default the
draft command rejects `geometry_only` candidates; pass `--allow-geometry-only`
only when deliberately recording non-controlled prompt-axis evidence.
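The four conditions that `validate-results` is described as enforcing can be sketched as follows. This is an illustration only: the result-file field names (`probes`, `id`, `turn`, `image_path`, `seed`) and the function name `check_results` are assumptions, since the real result JSON layout is not shown here.

```python
from pathlib import PurePosixPath


def check_results(batch: dict, results: dict, previous_turn: int) -> list[str]:
    """Return a list of problems found in a filled result file (hypothetical layout)."""
    problems = []
    batch_ids = [probe["id"] for probe in batch["probes"]]
    result_probes = results.get("probes", [])
    # 1. Result probe ids must match the batch order.
    if [r.get("id") for r in result_probes] != batch_ids:
        problems.append("result probe ids do not match the batch order")
    last_turn = previous_turn
    for r in result_probes:
        label = r.get("id", "<missing id>")
        # 2. Each turn must be strictly later than the one before it.
        turn = r.get("turn")
        if not isinstance(turn, int) or turn <= last_turn:
            problems.append(f"{label}: turn did not advance")
        else:
            last_turn = turn
        # 3. The image path must be absolute and end in .png.
        path = PurePosixPath(str(r.get("image_path", "")))
        if not path.is_absolute() or path.suffix.lower() != ".png":
            problems.append(f"{label}: image path is not an absolute PNG path")
        # 4. The returned seed must equal the fixed sampler seed of the batch.
        if r.get("seed") != batch["seed"]:
            problems.append(f"{label}: seed does not match the fixed sampler seed")
    return problems
```

The `previous_turn` argument mirrors the `--previous-turn` flag in the command list above: the first probe's turn is compared against it.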

## Manual Loop

Start the helper after sending a test prompt:

```bash
tools/sxcp_eval_loop.sh 3
```

Every three minutes it prints a structured request asking Codex to:

1. Pull `sxcp_eval_in`.
2. Record the emitted seed.
3. Inspect the image.
4. Compare it to the prompt and previous edit.
5. Push one prompt-only edit to `sxcp_eval_out`, preserving the same seed through
   the MCP signal when available.
6. Classify the finding as prompt-only, prompt-guide rule, provisional generator
   improvement, or proven generator fix.
7. When leaving a category after same-seed progress over baseline, mirror the
   best generator-safe wording into the responsible generator path as
   `provisional_generator_patch`.
8. Promote a generator change to proven only when the issue is systemic,
   repeated, or structurally wrong before rendering.
9. Record the finding and update the Krea2 prompt guide when a rule is confirmed.

Runtime logs are written under `.sxcp_eval/` and ignored by git.

Durable fixed-seed findings that justify a guide rule, generator patch, or pose
variant promotion are recorded in [`krea2-eval-log.json`](krea2-eval-log.json).
Method changes belong in [`krea2-ab-methodology.md`](krea2-ab-methodology.md).
Use runtime logs for scratch notes; use the JSON log only for evidence that
should remain tied to a catalog variant. Image paths in that log point at
external ComfyUI artifacts that may later be deleted; the durable evidence is
the fixed sampler seed, optional generator seed, prompt summaries, observation,
decision, and commit.

Record durable findings with the checked helper instead of hand-editing the log:

```bash
python tools/krea2_record_eval.py --print-template --variant-key pov_footjob_frontal_sole_stroke --seed 1234 --generator-seed 5678 > /tmp/krea2-entry.json
python tools/krea2_record_eval.py --entry-json /tmp/krea2-entry.json --dry-run
python tools/krea2_record_eval.py --entry-json /tmp/krea2-entry.json
```

Entry template:

```json
{
  "id": "variant-seed-short-finding",
  "date": "2026-06-29",
  "variant_key": "pov_example_variant",
  "seed": 1234,
  "generator_seed": 5678,
  "source": "sxcp_eval_mcp",
  "result": "accepted",
  "decision": "generator_patch",
  "baseline_prompt_summary": "What the generated prompt did before the edit.",
  "candidate_prompt_summary": "What the edited prompt changed for the same seed.",
  "observation": "What the image comparison proved and why it matters for the generator or guide.",
  "baseline_image": "/absolute/path/to/baseline.png",
  "candidate_image": "/absolute/path/to/candidate.png",
  "commit": "pending"
}
```

To see catalog coverage and the next variants that still need controlled
testing, run:

```bash
python tools/krea2_tuning_report.py
```

The report includes atlas references plus prompt cues and avoid cues for the
next fixed-seed test candidate. It also shows the latest durable evidence for
variants that already have fixed-seed results, including the evidence id, seed,
decision, candidate prompt summary, and observation. For each normal next-test
candidate, it prints a `krea2_record_eval.py --print-template` command; replace
`<fixed_seed>` with the seed from the run you are recording.

## Optional Command Hook

If you have a one-shot Codex command you want to run automatically, set:

```bash
SXCP_EVAL_CODEX_CMD="codex exec" tools/sxcp_eval_loop.sh 3
```

The request is sent on stdin. The command also receives:

- `SXCP_EVAL_IN_CHANNEL`
- `SXCP_EVAL_OUT_CHANNEL`
- `SXCP_EVAL_LOG_CHANNEL`
- `SXCP_EVAL_GUIDE_FILE`
- `SXCP_EVAL_REQUEST_FILE`
- `SXCP_EVAL_CYCLE_DIR`
- `SXCP_EVAL_CYCLE`
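A hook command written in Python could collect the stdin request and the variables listed above roughly like this. It is a sketch only: the function name `read_hook_context` and the returned dictionary shape are hypothetical, and nothing here is taken from `tools/sxcp_eval_loop.sh` itself.

```python
import os
import sys

# Variable names copied from the list above.
ENV_NAMES = (
    "SXCP_EVAL_IN_CHANNEL",
    "SXCP_EVAL_OUT_CHANNEL",
    "SXCP_EVAL_LOG_CHANNEL",
    "SXCP_EVAL_GUIDE_FILE",
    "SXCP_EVAL_REQUEST_FILE",
    "SXCP_EVAL_CYCLE_DIR",
    "SXCP_EVAL_CYCLE",
)


def read_hook_context(environ=None, stream=None) -> dict:
    """Gather the request text and the SXCP_EVAL_* variables into one dict."""
    environ = os.environ if environ is None else environ
    stream = sys.stdin if stream is None else stream
    # Unset variables come back as None instead of raising KeyError.
    context = {name: environ.get(name) for name in ENV_NAMES}
    context["request"] = stream.read()
    return context
```

Inside a hook script, calling `read_hook_context()` with no arguments reads the real environment and stdin; the parameters exist so the function can be exercised with stand-ins.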

## Evaluation Axes

- Identity consistency
- Outfit continuity
- Pose/action accuracy
- Camera compliance
- Location coherence
- Crop/framing
- Prompt noise/repetition
- Model confusion tokens
- Seed control/reproducibility
- Overall Krea2 image usefulness

## POV Pose Atlas

Use `/media/unraid/davinci/Qwen_edit_lora/POV/dataset_v2` as the local
reference atlas for POV pose geometry. The top-level pose folders contain real
POV examples, and matching `_control` folders contain solo/control versions.
Ignore `bg` and `*_bg` folders for pose rules; they are background plates
without people. Treat the pose image folders as the primary source for body
geometry; captions are optional and are not present for every folder.

Suggested workflow:

1. Choose one pose family, for example `doggy`, `doggy_alt`, `cowgirl`, or
   `missionary`.
2. Sample 5-10 real pose images and their control images.
3. Write the repeated geometry as a compact prompt rule.
4. Run one fixed-seed Krea2 prompt using that rule.
5. Repeat on a second seed or character before changing generator defaults.
6. If the prompt itself is structurally contradictory before rendering, patch
   immediately and add a regression test.

For POV doggy, the atlas shows that visible viewer thighs, lower torso, or
pelvis can be correct. Do not treat them as automatic failures.

## Seed Contract

The sampler seed is transport metadata, not prompt text. When the graph emits a
sampler seed, an A/B wording test should reuse that exact seed so the image
difference mostly comes from wording, not sampling randomness. If the SxCP
generator/control seed differs from the sampler seed, record it as
`generator_seed` in the eval entry. If a payload has no sampler seed, mark that
cycle as uncontrolled and avoid turning the result into a durable generator rule
without another controlled run.
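The contract above amounts to a small decision on each incoming payload. A minimal sketch, assuming the payload exposes the sampler seed under a `seed` key; the function name `seed_fields` and the `controlled` flag are hypothetical and are not fields of the eval log:

```python
def seed_fields(payload: dict, generator_seed=None) -> dict:
    """Derive the seed-related fields for one eval cycle (hypothetical helper)."""
    sampler_seed = payload.get("seed")
    if sampler_seed is None:
        # No sampler seed: the cycle is uncontrolled and should not back a durable rule.
        return {"controlled": False}
    fields = {"controlled": True, "seed": sampler_seed}
    # Record the generator/control seed only when it differs from the sampler seed.
    if generator_seed is not None and generator_seed != sampler_seed:
        fields["generator_seed"] = generator_seed
    return fields
```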

## Positive-Only Conditioning

`sxcp_eval_out` is positive conditioning only. Never send negative-conditioning
phrases such as `no shaft`, `no hands`, `without clothing`, or `avoid X` inside
the positive prompt; distilled Krea2 can reinforce or hallucinate the unwanted
object from that wording.

This loop has no active negative-output channel. A same-positive, same-seed
probe on seed `424242` compared empty negative conditioning against strong
negative text targeting visible prompt attributes, and the rendered image stayed
visually unchanged. Do not rely on negative conditioning for Krea2 pose tuning;
keep prompt fixes positive-only.
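A simple lint pass can catch negation wording before a prompt is pushed. The sketch below is an assumption-laden illustration: the function name `find_negations` is hypothetical, and the word list covers only the three patterns named above (`no`, `without`, `avoid`), so it would need extending for other phrasings.

```python
import re

# Whole-word matches only, so words such as "snowy" or "north" are not flagged.
NEGATION = re.compile(r"\b(no|without|avoid)\b", re.IGNORECASE)


def find_negations(prompt: str) -> list[str]:
    """Return the negation words found in a positive prompt, lowercased, in order."""
    return [match.group(0).lower() for match in NEGATION.finditer(prompt)]
```

An empty list means none of the listed words were found; it does not prove the prompt is free of negative phrasing.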

## Generator Fix Rule

Use two levels of generator change:

- `provisional_generator_patch`: apply the best generator-safe wording when
  leaving a category after fixed-seed progress over baseline. Keep the catalog
  variant as `candidate`.
- `generator_patch`: promote as a proven/default generator rule when the issue
  is repeated, systemic, or structurally wrong before rendering.

Examples of issues that warrant a proven generator fix:

- Selfie wording overrides orbit camera.
- Clothing continuity loses the selected softcore outfit.
- POV wording makes the off-camera participant the visual subject.
- Location camera layout inserts foreground anchors in the wrong place.

For one-off model drift inside an active category, send a cleaner prompt to
`sxcp_eval_out` and keep collecting evidence. When exiting a category, carry
forward same-seed improvements over baseline as provisional generator changes
and add the rule or weak case to
[`krea2-prompt-guide.md`](krea2-prompt-guide.md).
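The two-level rule can be written down as a small decision function. This is a sketch of the reasoning in this section, not code from the repository: the function name `classify_change`, its flags, and the fallback label `prompt_only` are hypothetical, while `generator_patch` and `provisional_generator_patch` are the labels used above.

```python
def classify_change(
    *,
    repeated: bool = False,
    systemic: bool = False,
    structurally_wrong: bool = False,
    leaving_category: bool = False,
    same_seed_progress: bool = False,
) -> str:
    """Pick a change level for a finding (hypothetical helper)."""
    # Repeated, systemic, or structurally wrong before rendering: proven fix.
    if repeated or systemic or structurally_wrong:
        return "generator_patch"
    # Leaving a category with fixed-seed progress over baseline: provisional patch.
    if leaving_category and same_seed_progress:
        return "provisional_generator_patch"
    # One-off drift inside an active category: keep iterating on the prompt.
    return "prompt_only"
```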