Document seed-controlled Krea2 evals

2026-06-28 22:56:50 +02:00
parent 0328e5ca3a
commit ef3b983712
3 changed files with 60 additions and 26 deletions
@@ -26,6 +26,23 @@ Avoid letting two sections describe incompatible camera or framing intents.
 - Analysis, scoring, and generator notes belong in chat or `sxcp_eval_log`.
 - Keep one experiment variable per cycle when possible.
 - Lock seed, character, location, and camera when testing wording changes.
+- Treat the MCP seed as transport metadata. Preserve it for prompt-only A/B tests
+  and do not write it into the visible prompt text.
+## Seed-Controlled A/B Tests
+Use one fixed seed when deciding whether prompt wording helped Krea2. A single
+image can justify a prompt-only retry when the mismatch is obvious, but a
+generator rule needs either repeated evidence or a generated prompt that is
+structurally wrong before rendering.
+When reviewing an eval payload, log:
+- emitted seed,
+- original generated prompt,
+- edited prompt,
+- image failure or improvement,
+- whether the change should stay prompt-only or become a generator patch.
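The five logged fields above could be captured as one record per reviewed payload. The following is a minimal sketch only: the `EvalRecord` class, its field names, the `to_log_line` helper, and the JSON-lines encoding are all hypothetical and are not part of the SxCP tooling shown in this commit.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional

# Hypothetical decision labels mirroring the last bullet of the list above.
ALLOWED_DECISIONS = {"prompt-only", "generator-patch"}


@dataclass
class EvalRecord:
    """One reviewed eval payload (illustrative field names, not a real schema)."""
    seed: Optional[int]    # emitted seed; None when the payload carried no seed
    original_prompt: str   # original generated prompt
    edited_prompt: str     # edited prompt sent back for the A/B test
    image_finding: str     # image failure or improvement, in plain words
    decision: str          # "prompt-only" or "generator-patch"


def to_log_line(record: EvalRecord) -> str:
    """Serialize a record as one JSON line; reject unknown decision labels."""
    if record.decision not in ALLOWED_DECISIONS:
        raise ValueError(f"unknown decision: {record.decision!r}")
    return json.dumps(asdict(record), sort_keys=True)
```

Keeping the seed as its own field, rather than embedding it in either prompt string, matches the rule that the seed never appears in visible prompt text.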
 ## Camera And Composition
@@ -1,15 +1,17 @@
 # SxCP Eval Loop
 This loop is for tuning the SxCP generator toward stronger Krea2 images.
-ComfyUI sends a generated prompt and image to Codex, Codex analyzes the result,
-then sends back exactly one edited prompt for the next A/B test. Confirmed
-findings become either generator changes or durable prompt rules in
+ComfyUI sends a generated prompt, image, and seed to Codex, Codex analyzes the
+result, then sends back exactly one edited prompt for the next A/B test.
+Confirmed findings become either generator changes or durable prompt rules in
 [`krea2-prompt-guide.md`](krea2-prompt-guide.md).
 ## Channels
-- `sxcp_eval_in`: ComfyUI to Codex. Contains the prompt text and image path.
-- `sxcp_eval_out`: Codex to ComfyUI. Prompt-only. Do not put analysis here.
+- `sxcp_eval_in`: ComfyUI to Codex. Contains the prompt text, image path, and
+  seed.
+- `sxcp_eval_out`: Codex to ComfyUI. Prompt-only text plus the same seed through
+  the MCP signal when supported. Do not put analysis here.
 - `sxcp_eval_log`: optional analysis/log channel.
 ## Manual Loop
@@ -23,12 +25,14 @@ tools/sxcp_eval_loop.sh 3
 Every three minutes it prints a structured request asking Codex to:
 1. Pull `sxcp_eval_in`.
-2. Inspect the image.
-3. Compare it to the prompt and previous edit.
-4. Push one prompt-only edit to `sxcp_eval_out`.
-5. Classify the finding as prompt-only, prompt-guide rule, or generator fix.
-6. Change generator code/data only when the issue is systemic.
-7. Record the finding and update the Krea2 prompt guide when a rule is confirmed.
+2. Record the emitted seed.
+3. Inspect the image.
+4. Compare it to the prompt and previous edit.
+5. Push one prompt-only edit to `sxcp_eval_out`, preserving the same seed through
+   the MCP signal when available.
+6. Classify the finding as prompt-only, prompt-guide rule, or generator fix.
+7. Change generator code/data only when the issue is systemic.
+8. Record the finding and update the Krea2 prompt guide when a rule is confirmed.
 Runtime logs are written under `.sxcp_eval/` and ignored by git.
@@ -60,8 +64,17 @@ The request is sent on stdin. The command also receives:
 - Crop/framing
 - Prompt noise/repetition
 - Model confusion tokens
+- Seed control/reproducibility
 - Overall Krea2 image usefulness
+## Seed Contract
+The seed is transport metadata, not prompt text. When the graph emits a seed, an
+A/B wording test should reuse that exact seed so the image difference mostly
+comes from wording, not sampling randomness. If a payload has no seed, mark that
+cycle as uncontrolled and avoid turning the result into a durable generator rule
+without another controlled run.
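The seed contract above can be sketched as a small helper that carries the seed alongside the edited prompt and flags uncontrolled cycles. This is an illustration under assumptions: the payload is taken to be a dict with a `seed` key (the prompt template refers to `payload.seed`), while the `extract_seed` and `plan_reply` names and the reply layout are hypothetical and do not describe the real MCP signal.

```python
from typing import Any, Dict, Optional


def extract_seed(payload: Dict[str, Any]) -> Optional[int]:
    """Return the emitted seed, or None when the payload has no usable seed."""
    seed = payload.get("seed")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(seed, bool) or not isinstance(seed, int):
        return None
    return seed


def plan_reply(payload: Dict[str, Any], edited_prompt: str) -> Dict[str, Any]:
    """Build a reply that keeps the seed as metadata, never as prompt text."""
    seed = extract_seed(payload)
    return {
        "prompt": edited_prompt,        # prompt-only text for the next A/B test
        "seed": seed,                   # same seed, passed beside the prompt
        "controlled": seed is not None  # False marks the cycle as uncontrolled
    }
```

A reply with `controlled` set to `False` can still drive a prompt-only retry, but under the contract it should not be used on its own to justify a durable generator rule.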
 ## Generator Fix Rule
 Only edit the generator when the image shows a repeatable, systemic prompt
@@ -135,14 +135,17 @@ style. Every cycle should turn visual evidence into one of:
 ## Protocol
 1. Pull the latest prompt/image from \`$in_channel\`.
-2. Compare the image against the prompt and previous edited prompt.
-3. Identify concrete Krea2 mismatches and likely generator path.
-4. Classify the next step: prompt-only edit, guide rule, or generator patch.
-5. Push only the next test prompt to \`$out_channel\`.
-6. Keep analysis in chat or \`$log_channel\`, not in \`$out_channel\`.
-7. Edit generator code/data only when the issue is systemic.
-8. Update \`$guide_file\` when a wording rule is confirmed.
-9. Run focused smoke tests after generator edits.
+2. Record the emitted seed. If it is missing, mark the image as uncontrolled.
+3. Compare the image against the prompt and previous edited prompt.
+4. Identify concrete Krea2 mismatches and likely generator path.
+5. Classify the next step: prompt-only edit, guide rule, or generator patch.
+6. Push only the next test prompt text to \`$out_channel\`. Preserve the same
+   seed through the MCP signal when available; never write the seed into the
+   prompt text.
+7. Keep analysis in chat or \`$log_channel\`, not in \`$out_channel\`.
+8. Edit generator code/data only when the issue is systemic.
+9. Update \`$guide_file\` when a wording rule is confirmed.
+10. Run focused smoke tests after generator edits.
 ## Cycles
@@ -175,15 +178,16 @@ Channels:
 Evaluation steps:
 1. Pull the latest payload from $in_channel.
-2. Inspect image_path and compare it to the prompt text.
-3. Score these Krea2 axes: identity, outfit continuity, pose/action, camera compliance, location coherence, crop/framing, prompt noise, model confusion tokens, and overall image usefulness.
-4. Identify the smallest concrete mismatch that should be tested next.
-5. Classify the finding:
-   - prompt-only: push exactly one edited prompt to $out_channel and nothing else on that channel.
+2. Record payload.seed if present. Keep the same seed for prompt-only A/B tests.
+3. Inspect image_path and compare it to the prompt text.
+4. Score these Krea2 axes: identity, outfit continuity, pose/action, camera compliance, location coherence, crop/framing, prompt noise, model confusion tokens, seed control, and overall image usefulness.
+5. Identify the smallest concrete mismatch that should be tested next.
+6. Classify the finding:
+   - prompt-only: push exactly one edited prompt to $out_channel and preserve payload.seed through the MCP signal when the tool supports it.
   - guide-rule: update $guide_file with the confirmed Krea2 wording rule.
   - generator-fix: edit the responsible generator path, add/adjust focused smoke coverage, run tests, and summarize the change.
-6. Keep a clear link between the image evidence, the prompt wording, and the generator path.
-7. Append the finding to the eval log with: original issue, changed wording/path, expected improvement, test result, guide update, generator update, and next hypothesis.
+7. Keep a clear link between the image evidence, seed, prompt wording, and generator path.
+8. Append the finding to the eval log with: seed, original issue, changed wording/path, expected improvement, test result, guide update, generator update, and next hypothesis.
 Current run:
 - run_id: $run_id