Document seed-controlled Krea2 evals

2026-06-28 22:56:50 +02:00
parent 0328e5ca3a
commit ef3b983712
3 changed files with 60 additions and 26 deletions
@@ -26,6 +26,23 @@ Avoid letting two sections describe incompatible camera or framing intents.
 - Analysis, scoring, and generator notes belong in chat or `sxcp_eval_log`.
 - Keep one experiment variable per cycle when possible.
 - Lock seed, character, location, and camera when testing wording changes.
+- Treat the MCP seed as transport metadata. Preserve it for prompt-only A/B tests
+  and do not write it into the visible prompt text.
+
+## Seed-Controlled A/B Tests
+
+Use one fixed seed when deciding whether prompt wording helped Krea2. A single
+image can justify a prompt-only retry when the mismatch is obvious, but a
+generator rule needs either repeated evidence or a generated prompt that is
+structurally wrong before rendering.
+
+When reviewing an eval payload, log:
+
+- emitted seed,
+- original generated prompt,
+- edited prompt,
+- image failure or improvement,
+- whether the change should stay prompt-only or become a generator patch.
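
The review checklist above could be captured as one structured log record per cycle. The following is a minimal sketch, not part of this commit: the class name `EvalLogEntry`, its field names, and the JSON serialization are all assumptions made for illustration, since the commit does not define a log schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class EvalLogEntry:
    """Hypothetical record mirroring the five items in the review checklist."""

    seed: Optional[int]      # emitted seed; None when the payload carried none
    original_prompt: str     # prompt as produced by the generator
    edited_prompt: str       # prompt sent back for the next A/B test
    image_result: str        # observed image failure or improvement
    disposition: str         # "prompt-only" or "generator-patch"

    def to_json(self) -> str:
        # Sorted keys keep log lines stable and easy to diff.
        return json.dumps(asdict(self), sort_keys=True)


entry = EvalLogEntry(
    seed=1234,
    original_prompt="portrait, low angle, wide shot",
    edited_prompt="portrait, low angle",
    image_result="framing conflict removed",
    disposition="prompt-only",
)
log_line = entry.to_json()
```

Keeping the seed as its own field, rather than embedding it in either prompt string, matches the rule that the seed is transport metadata.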

 ## Camera And Composition

@@ -1,15 +1,17 @@
 # SxCP Eval Loop

 This loop is for tuning the SxCP generator toward stronger Krea2 images.
-ComfyUI sends a generated prompt and image to Codex, Codex analyzes the result,
-then sends back exactly one edited prompt for the next A/B test. Confirmed
-findings become either generator changes or durable prompt rules in
+ComfyUI sends a generated prompt, image, and seed to Codex, Codex analyzes the
+result, then sends back exactly one edited prompt for the next A/B test.
+Confirmed findings become either generator changes or durable prompt rules in
 [`krea2-prompt-guide.md`](krea2-prompt-guide.md).

 ## Channels

- `sxcp_eval_in`: ComfyUI to Codex. Contains the prompt text and image path.
- `sxcp_eval_out`: Codex to ComfyUI. Prompt-only. Do not put analysis here.
+- `sxcp_eval_in`: ComfyUI to Codex. Contains the prompt text, image path, and
+  seed.
+- `sxcp_eval_out`: Codex to ComfyUI. Prompt-only text plus the same seed through
+  the MCP signal when supported. Do not put analysis here.
 - `sxcp_eval_log`: optional analysis/log channel.

 ## Manual Loop
@@ -23,12 +25,14 @@ tools/sxcp_eval_loop.sh 3
 Every three minutes it prints a structured request asking Codex to:

 1. Pull `sxcp_eval_in`.
-2. Inspect the image.
-3. Compare it to the prompt and previous edit.
-4. Push one prompt-only edit to `sxcp_eval_out`.
-5. Classify the finding as prompt-only, prompt-guide rule, or generator fix.
-6. Change generator code/data only when the issue is systemic.
-7. Record the finding and update the Krea2 prompt guide when a rule is confirmed.
+2. Record the emitted seed.
+3. Inspect the image.
+4. Compare it to the prompt and previous edit.
+5. Push one prompt-only edit to `sxcp_eval_out`, preserving the same seed through
+   the MCP signal when available.
+6. Classify the finding as prompt-only, prompt-guide rule, or generator fix.
+7. Change generator code/data only when the issue is systemic.
+8. Record the finding and update the Krea2 prompt guide when a rule is confirmed.

 Runtime logs are written under `.sxcp_eval/` and ignored by git.

@@ -60,8 +64,17 @@ The request is sent on stdin. The command also receives:
 - Crop/framing
 - Prompt noise/repetition
 - Model confusion tokens
+- Seed control/reproducibility
 - Overall Krea2 image usefulness

+## Seed Contract
+
+The seed is transport metadata, not prompt text. When the graph emits a seed, an
+A/B wording test should reuse that exact seed so the image difference mostly
+comes from wording, not sampling randomness. If a payload has no seed, mark that
+cycle as uncontrolled and avoid turning the result into a durable generator rule
+without another controlled run.
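
One way to read the seed contract in code is sketched below. It is an illustration under stated assumptions: the payload is treated as a plain mapping with an optional integer `seed` key, and the helper names `extract_seed` and `cycle_status` are hypothetical, not functions from this repository.

```python
from typing import Any, Mapping, Optional


def extract_seed(payload: Mapping[str, Any]) -> Optional[int]:
    """Return the payload seed, or None when it is missing or not an integer."""
    seed = payload.get("seed")
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if isinstance(seed, bool) or not isinstance(seed, int):
        return None
    return seed


def cycle_status(payload: Mapping[str, Any]) -> str:
    """Label a cycle 'controlled' only when a usable seed was emitted."""
    return "controlled" if extract_seed(payload) is not None else "uncontrolled"


with_seed = cycle_status({"image_path": "out/0001.png", "seed": 987654321})
without_seed = cycle_status({"image_path": "out/0002.png"})
```

Under this reading, a finding from an `uncontrolled` cycle would wait for a further controlled run before it is promoted to a durable generator rule.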
+
 ## Generator Fix Rule

 Only edit the generator when the image shows a repeatable, systemic prompt
@@ -135,14 +135,17 @@ style. Every cycle should turn visual evidence into one of:
 ## Protocol

 1. Pull the latest prompt/image from \`$in_channel\`.
-2. Compare the image against the prompt and previous edited prompt.
-3. Identify concrete Krea2 mismatches and likely generator path.
-4. Classify the next step: prompt-only edit, guide rule, or generator patch.
-5. Push only the next test prompt to \`$out_channel\`.
-6. Keep analysis in chat or \`$log_channel\`, not in \`$out_channel\`.
-7. Edit generator code/data only when the issue is systemic.
-8. Update \`$guide_file\` when a wording rule is confirmed.
-9. Run focused smoke tests after generator edits.
+2. Record the emitted seed. If it is missing, mark the image as uncontrolled.
+3. Compare the image against the prompt and previous edited prompt.
+4. Identify concrete Krea2 mismatches and likely generator path.
+5. Classify the next step: prompt-only edit, guide rule, or generator patch.
+6. Push only the next test prompt text to \`$out_channel\`. Preserve the same
+   seed through the MCP signal when available; never write the seed into the
+   prompt text.
+7. Keep analysis in chat or \`$log_channel\`, not in \`$out_channel\`.
+8. Edit generator code/data only when the issue is systemic.
+9. Update \`$guide_file\` when a wording rule is confirmed.
+10. Run focused smoke tests after generator edits.

 ## Cycles

@@ -175,15 +178,16 @@ Channels:

 Evaluation steps:
 1. Pull the latest payload from $in_channel.
-2. Inspect image_path and compare it to the prompt text.
-3. Score these Krea2 axes: identity, outfit continuity, pose/action, camera compliance, location coherence, crop/framing, prompt noise, model confusion tokens, and overall image usefulness.
-4. Identify the smallest concrete mismatch that should be tested next.
-5. Classify the finding:
-   - prompt-only: push exactly one edited prompt to $out_channel and nothing else on that channel.
+2. Record payload.seed if present. Keep the same seed for prompt-only A/B tests.
+3. Inspect image_path and compare it to the prompt text.
+4. Score these Krea2 axes: identity, outfit continuity, pose/action, camera compliance, location coherence, crop/framing, prompt noise, model confusion tokens, seed control, and overall image usefulness.
+5. Identify the smallest concrete mismatch that should be tested next.
+6. Classify the finding:
+   - prompt-only: push exactly one edited prompt to $out_channel and preserve payload.seed through the MCP signal when the tool supports it.
   - guide-rule: update $guide_file with the confirmed Krea2 wording rule.
   - generator-fix: edit the responsible generator path, add/adjust focused smoke coverage, run tests, and summarize the change.
-6. Keep a clear link between the image evidence, the prompt wording, and the generator path.
-7. Append the finding to the eval log with: original issue, changed wording/path, expected improvement, test result, guide update, generator update, and next hypothesis.
+7. Keep a clear link between the image evidence, seed, prompt wording, and generator path.
+8. Append the finding to the eval log with: seed, original issue, changed wording/path, expected improvement, test result, guide update, generator update, and next hypothesis.
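
The prompt-only branch of the steps above (push one edited prompt, carry the seed separately) could look roughly like this. The sketch assumes a dictionary-shaped outgoing message with `prompt` and `seed` keys; the real MCP signal format is not shown in this commit, so the function `build_out_message` and its message layout are hypothetical.

```python
import re
from typing import Any, Dict, Mapping


def build_out_message(payload: Mapping[str, Any], edited_prompt: str) -> Dict[str, Any]:
    """Build a prompt-only message that carries the seed as separate metadata."""
    seed = payload.get("seed")
    # Guard: the seed value must never be written into the visible prompt text.
    if seed is not None and re.search(rf"\b{re.escape(str(seed))}\b", edited_prompt):
        raise ValueError("seed must not appear in the visible prompt text")
    message: Dict[str, Any] = {"prompt": edited_prompt}
    if seed is not None:
        # Reuse the emitted seed unchanged so the A/B pair differs only in wording.
        message["seed"] = seed
    return message


controlled = build_out_message(
    {"prompt": "old wording", "image_path": "out/0001.png", "seed": 42},
    "woman in a red coat, low angle",
)
uncontrolled = build_out_message({"prompt": "old wording"}, "woman in a red coat")
```

When the incoming payload has no seed, the message simply omits the field, which leaves it to the caller to mark the cycle as uncontrolled.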

 Current run:
 - run_id: $run_id