Document seed-controlled Krea2 evals
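
The seed handling this commit documents (record `payload.seed`, mark the image as uncontrolled when it is missing, and carry the seed alongside the prompt rather than inside it) can be sketched roughly as follows. This is a minimal illustration, not the project's code: the `Payload` class, the `seed_status` and `build_next_signal` helpers, and the dict-shaped signal are all hypothetical, since the actual MCP tool interface is not shown in this diff.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Payload:
    """Hypothetical shape of a payload pulled from $in_channel."""
    prompt: str
    image_path: str
    seed: Optional[int] = None  # absent when the generator emitted no seed


def seed_status(payload: Payload) -> str:
    """An image without an emitted seed is marked as uncontrolled."""
    return "controlled" if payload.seed is not None else "uncontrolled"


def build_next_signal(payload: Payload, edited_prompt: str) -> dict:
    """Build the next test signal, keeping the seed out of the prompt text.

    The dict layout is an assumption made for illustration only.
    """
    signal = {"prompt": edited_prompt}
    if payload.seed is not None:
        # Carry the seed as its own field so a prompt-only A/B test
        # changes the wording and nothing else.
        signal["seed"] = payload.seed
    return signal
```

Keeping the seed in a separate field means two prompts can be compared under the same seed without the seed value leaking into the wording under test.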

2026-06-28 22:56:50 +02:00
parent 0328e5ca3a
commit ef3b983712
3 changed files with 60 additions and 26 deletions
@@ -135,14 +135,17 @@ style. Every cycle should turn visual evidence into one of:
 ## Protocol

 1. Pull the latest prompt/image from \`$in_channel\`.
-2. Compare the image against the prompt and previous edited prompt.
-3. Identify concrete Krea2 mismatches and likely generator path.
-4. Classify the next step: prompt-only edit, guide rule, or generator patch.
-5. Push only the next test prompt to \`$out_channel\`.
-6. Keep analysis in chat or \`$log_channel\`, not in \`$out_channel\`.
-7. Edit generator code/data only when the issue is systemic.
-8. Update \`$guide_file\` when a wording rule is confirmed.
-9. Run focused smoke tests after generator edits.
+2. Record the emitted seed. If it is missing, mark the image as uncontrolled.
+3. Compare the image against the prompt and previous edited prompt.
+4. Identify concrete Krea2 mismatches and likely generator path.
+5. Classify the next step: prompt-only edit, guide rule, or generator patch.
+6. Push only the next test prompt text to \`$out_channel\`. Preserve the same
+   seed through the MCP signal when available; never write the seed into the
+   prompt text.
+7. Keep analysis in chat or \`$log_channel\`, not in \`$out_channel\`.
+8. Edit generator code/data only when the issue is systemic.
+9. Update \`$guide_file\` when a wording rule is confirmed.
+10. Run focused smoke tests after generator edits.

 ## Cycles

@@ -175,15 +178,16 @@ Channels:

 Evaluation steps:
 1. Pull the latest payload from $in_channel.
-2. Inspect image_path and compare it to the prompt text.
-3. Score these Krea2 axes: identity, outfit continuity, pose/action, camera compliance, location coherence, crop/framing, prompt noise, model confusion tokens, and overall image usefulness.
-4. Identify the smallest concrete mismatch that should be tested next.
-5. Classify the finding:
-   - prompt-only: push exactly one edited prompt to $out_channel and nothing else on that channel.
+2. Record payload.seed if present. Keep the same seed for prompt-only A/B tests.
+3. Inspect image_path and compare it to the prompt text.
+4. Score these Krea2 axes: identity, outfit continuity, pose/action, camera compliance, location coherence, crop/framing, prompt noise, model confusion tokens, seed control, and overall image usefulness.
+5. Identify the smallest concrete mismatch that should be tested next.
+6. Classify the finding:
+   - prompt-only: push exactly one edited prompt to $out_channel and preserve payload.seed through the MCP signal when the tool supports it.
   - guide-rule: update $guide_file with the confirmed Krea2 wording rule.
   - generator-fix: edit the responsible generator path, add/adjust focused smoke coverage, run tests, and summarize the change.
-6. Keep a clear link between the image evidence, the prompt wording, and the generator path.
-7. Append the finding to the eval log with: original issue, changed wording/path, expected improvement, test result, guide update, generator update, and next hypothesis.
+7. Keep a clear link between the image evidence, seed, prompt wording, and generator path.
+8. Append the finding to the eval log with: seed, original issue, changed wording/path, expected improvement, test result, guide update, generator update, and next hypothesis.

 Current run:
 - run_id: $run_id