Document seed-controlled Krea2 evals

2026-06-28 22:56:50 +02:00
parent 0328e5ca3a
commit ef3b983712
3 changed files with 60 additions and 26 deletions
+19 -15
@@ -135,14 +135,17 @@ style. Every cycle should turn visual evidence into one of:
 ## Protocol
 1. Pull the latest prompt/image from \`$in_channel\`.
-2. Compare the image against the prompt and previous edited prompt.
-3. Identify concrete Krea2 mismatches and likely generator path.
-4. Classify the next step: prompt-only edit, guide rule, or generator patch.
-5. Push only the next test prompt to \`$out_channel\`.
-6. Keep analysis in chat or \`$log_channel\`, not in \`$out_channel\`.
-7. Edit generator code/data only when the issue is systemic.
-8. Update \`$guide_file\` when a wording rule is confirmed.
-9. Run focused smoke tests after generator edits.
+2. Record the emitted seed. If it is missing, mark the image as uncontrolled.
+3. Compare the image against the prompt and previous edited prompt.
+4. Identify concrete Krea2 mismatches and likely generator path.
+5. Classify the next step: prompt-only edit, guide rule, or generator patch.
+6. Push only the next test prompt text to \`$out_channel\`. Preserve the same
+seed through the MCP signal when available; never write the seed into the
+prompt text.
+7. Keep analysis in chat or \`$log_channel\`, not in \`$out_channel\`.
+8. Edit generator code/data only when the issue is systemic.
+9. Update \`$guide_file\` when a wording rule is confirmed.
+10. Run focused smoke tests after generator edits.
 ## Cycles
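The revised Protocol records the seed on intake (step 2) and sends it next to the prompt rather than inside it (step 6). A minimal sketch of those two rules, assuming the incoming payload is a dict with an optional integer `seed` and that the outgoing MCP message can carry the seed as a separate field; `record_seed`, `build_out_message`, and the `controlled` flag are hypothetical names, not part of this commit.

```python
from typing import Any, Optional


def record_seed(payload: dict[str, Any]) -> Optional[int]:
    """Step 2: return the emitted seed, or None when the image is uncontrolled.

    Assumption: a usable seed is a plain int stored under payload["seed"].
    """
    seed = payload.get("seed")
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(seed, int) and not isinstance(seed, bool):
        return seed
    return None


def build_out_message(prompt_text: str, seed: Optional[int]) -> dict[str, Any]:
    """Step 6: push the prompt text unchanged; the seed rides alongside it.

    The message shape is hypothetical; the point is that the seed is
    metadata on the signal and is never concatenated into the prompt text.
    """
    message: dict[str, Any] = {"prompt": prompt_text, "controlled": seed is not None}
    if seed is not None:
        message["seed"] = seed
    return message
```

For example, `build_out_message("a red coat, side view", record_seed({"seed": 1234}))` keeps the prompt text exactly as edited and attaches `1234` as a separate field, while a payload without a seed produces a message flagged as uncontrolled.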
@@ -175,15 +178,16 @@ Channels:
 Evaluation steps:
 1. Pull the latest payload from $in_channel.
-2. Inspect image_path and compare it to the prompt text.
-3. Score these Krea2 axes: identity, outfit continuity, pose/action, camera compliance, location coherence, crop/framing, prompt noise, model confusion tokens, and overall image usefulness.
-4. Identify the smallest concrete mismatch that should be tested next.
-5. Classify the finding:
-- prompt-only: push exactly one edited prompt to $out_channel and nothing else on that channel.
+2. Record payload.seed if present. Keep the same seed for prompt-only A/B tests.
+3. Inspect image_path and compare it to the prompt text.
+4. Score these Krea2 axes: identity, outfit continuity, pose/action, camera compliance, location coherence, crop/framing, prompt noise, model confusion tokens, seed control, and overall image usefulness.
+5. Identify the smallest concrete mismatch that should be tested next.
+6. Classify the finding:
+- prompt-only: push exactly one edited prompt to $out_channel and preserve payload.seed through the MCP signal when the tool supports it.
 - guide-rule: update $guide_file with the confirmed Krea2 wording rule.
 - generator-fix: edit the responsible generator path, add/adjust focused smoke coverage, run tests, and summarize the change.
-6. Keep a clear link between the image evidence, the prompt wording, and the generator path.
-7. Append the finding to the eval log with: original issue, changed wording/path, expected improvement, test result, guide update, generator update, and next hypothesis.
+7. Keep a clear link between the image evidence, seed, prompt wording, and generator path.
+8. Append the finding to the eval log with: seed, original issue, changed wording/path, expected improvement, test result, guide update, generator update, and next hypothesis.
 Current run:
 - run_id: $run_id
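The updated evaluation steps add seed control as a scoring axis (step 4), treat a prompt-only A/B test as controlled only when both images share the seed (step 2), and put the seed first in each eval-log entry (step 8). A minimal sketch of those three rules, assuming integer scores keyed by axis name and a JSON-lines eval log; `AXES`, `EvalLogEntry`, `is_valid_ab_pair`, `missing_axes`, and the snake_case field names are hypothetical, chosen to mirror the prompt text rather than taken from the generator code.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional

# Krea2 scoring axes, copied from the updated step 4 (order preserved).
AXES = (
    "identity", "outfit continuity", "pose/action", "camera compliance",
    "location coherence", "crop/framing", "prompt noise",
    "model confusion tokens", "seed control", "overall image usefulness",
)


@dataclass
class EvalLogEntry:
    """One eval-log record with the fields listed in step 8 (names assumed)."""
    seed: Optional[int]
    original_issue: str
    changed_wording_or_path: str
    expected_improvement: str
    test_result: str
    guide_update: str
    generator_update: str
    next_hypothesis: str

    def to_json(self) -> str:
        # Assumption: the log is stored as one JSON object per line.
        return json.dumps(asdict(self))


def is_valid_ab_pair(before_seed: Optional[int], after_seed: Optional[int]) -> bool:
    """A prompt-only A/B result counts only if both images share a known seed."""
    return before_seed is not None and before_seed == after_seed


def missing_axes(scores: dict[str, int]) -> list[str]:
    """Return the axes from step 4 that have not been scored yet."""
    return [axis for axis in AXES if axis not in scores]
```

Keeping `is_valid_ab_pair` strict (two missing seeds do not match) follows the Protocol rule that an image without an emitted seed is uncontrolled, so a wording change tested against it cannot be separated from seed variance.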