Document seed-controlled Krea2 evals
+19 -15
@@ -135,14 +135,17 @@ style. Every cycle should turn visual evidence into one of:
 ## Protocol
 
 1. Pull the latest prompt/image from \`$in_channel\`.
-2. Compare the image against the prompt and previous edited prompt.
-3. Identify concrete Krea2 mismatches and likely generator path.
-4. Classify the next step: prompt-only edit, guide rule, or generator patch.
-5. Push only the next test prompt to \`$out_channel\`.
-6. Keep analysis in chat or \`$log_channel\`, not in \`$out_channel\`.
-7. Edit generator code/data only when the issue is systemic.
-8. Update \`$guide_file\` when a wording rule is confirmed.
-9. Run focused smoke tests after generator edits.
+2. Record the emitted seed. If it is missing, mark the image as uncontrolled.
+3. Compare the image against the prompt and previous edited prompt.
+4. Identify concrete Krea2 mismatches and likely generator path.
+5. Classify the next step: prompt-only edit, guide rule, or generator patch.
+6. Push only the next test prompt text to \`$out_channel\`. Preserve the same
+seed through the MCP signal when available; never write the seed into the
+prompt text.
+7. Keep analysis in chat or \`$log_channel\`, not in \`$out_channel\`.
+8. Edit generator code/data only when the issue is systemic.
+9. Update \`$guide_file\` when a wording rule is confirmed.
+10. Run focused smoke tests after generator edits.
 
 ## Cycles
 
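The seed handling in protocol steps 2 and 6 can be illustrated with a minimal Python sketch. It is not taken from the commit: the `SeedRecord` type, the `record_seed` and `build_signal` helpers, and the `"prompt"`/`"seed"` keys of the outgoing signal are all hypothetical, and the real MCP signal may be shaped differently. The point it shows is that the seed travels as its own field next to the prompt and is never concatenated into the prompt text.

```python
from dataclasses import dataclass
from typing import Any, Dict, Mapping, Optional


@dataclass(frozen=True)
class SeedRecord:
    """Hypothetical record of whether an image can be reproduced by seed."""
    seed: Optional[int]
    controlled: bool


def record_seed(payload: Mapping[str, Any]) -> SeedRecord:
    """Step 2: record the emitted seed, or mark the image as uncontrolled.

    Assumes the payload exposes the seed under a "seed" key (assumption).
    """
    seed = payload.get("seed")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(seed, bool) or not isinstance(seed, int):
        return SeedRecord(seed=None, controlled=False)
    return SeedRecord(seed=seed, controlled=True)


def build_signal(prompt_text: str, record: SeedRecord) -> Dict[str, Any]:
    """Step 6: carry the seed beside the prompt, never inside its text.

    The {"prompt": ..., "seed": ...} shape is an assumption for illustration.
    """
    signal: Dict[str, Any] = {"prompt": prompt_text}
    if record.controlled:
        signal["seed"] = record.seed
    return signal


if __name__ == "__main__":
    incoming = {"prompt": "portrait, soft window light", "seed": 1234}
    record = record_seed(incoming)
    print(build_signal("portrait, hard window light", record))
```

When the incoming payload has no seed, `record_seed` returns an uncontrolled record and `build_signal` omits the seed field, so the resulting image can be flagged as uncontrolled in the log rather than compared as if it were an A/B pair.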
@@ -175,15 +178,16 @@ Channels:
 
 Evaluation steps:
 1. Pull the latest payload from $in_channel.
-2. Inspect image_path and compare it to the prompt text.
-3. Score these Krea2 axes: identity, outfit continuity, pose/action, camera compliance, location coherence, crop/framing, prompt noise, model confusion tokens, and overall image usefulness.
-4. Identify the smallest concrete mismatch that should be tested next.
-5. Classify the finding:
-- prompt-only: push exactly one edited prompt to $out_channel and nothing else on that channel.
+2. Record payload.seed if present. Keep the same seed for prompt-only A/B tests.
+3. Inspect image_path and compare it to the prompt text.
+4. Score these Krea2 axes: identity, outfit continuity, pose/action, camera compliance, location coherence, crop/framing, prompt noise, model confusion tokens, seed control, and overall image usefulness.
+5. Identify the smallest concrete mismatch that should be tested next.
+6. Classify the finding:
+- prompt-only: push exactly one edited prompt to $out_channel and preserve payload.seed through the MCP signal when the tool supports it.
 - guide-rule: update $guide_file with the confirmed Krea2 wording rule.
 - generator-fix: edit the responsible generator path, add/adjust focused smoke coverage, run tests, and summarize the change.
-6. Keep a clear link between the image evidence, the prompt wording, and the generator path.
-7. Append the finding to the eval log with: seed, original issue, changed wording/path, expected improvement, test result, guide update, generator update, and next hypothesis.
+7. Keep a clear link between the image evidence, seed, prompt wording, and generator path.
+8. Append the finding to the eval log with: seed, original issue, changed wording/path, expected improvement, test result, guide update, generator update, and next hypothesis.
 
 Current run:
 - run_id: $run_id
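As a rough illustration of evaluation steps 6 and 8, the sketch below models the three finding classes and an eval log entry carrying the fields the new step 8 lists. Everything here is an assumption made for the example: the `Finding` enum, the `EvalLogEntry` dataclass, the JSON-lines serialization, and the `is_controlled_ab_pair` helper do not appear in the commit, and the actual log format is not specified there.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Optional


class Finding(Enum):
    """The three classifications named in the evaluation steps."""
    PROMPT_ONLY = "prompt-only"
    GUIDE_RULE = "guide-rule"
    GENERATOR_FIX = "generator-fix"


@dataclass
class EvalLogEntry:
    """Hypothetical log record; field names mirror the list in step 8."""
    seed: Optional[int]  # None means the image was uncontrolled
    finding: Finding
    original_issue: str
    changed_wording_or_path: str
    expected_improvement: str
    test_result: str
    guide_update: str
    generator_update: str
    next_hypothesis: str

    def to_json_line(self) -> str:
        """Serialize as one JSON line (the log format is an assumption)."""
        data = asdict(self)
        data["finding"] = self.finding.value  # store the plain string label
        return json.dumps(data, sort_keys=True)


def is_controlled_ab_pair(a: EvalLogEntry, b: EvalLogEntry) -> bool:
    """A prompt-only A/B comparison is only meaningful on a shared seed."""
    return a.seed is not None and a.seed == b.seed


if __name__ == "__main__":
    entry = EvalLogEntry(
        seed=1234,
        finding=Finding.PROMPT_ONLY,
        original_issue="camera angle ignored",
        changed_wording_or_path="moved camera phrase to the front",
        expected_improvement="low-angle framing respected",
        test_result="pending",
        guide_update="none",
        generator_update="none",
        next_hypothesis="camera phrases placed late are dropped",
    )
    print(entry.to_json_line())
```

Keeping the seed as a first-class field makes it straightforward to filter the log for pairs where only the prompt wording changed, which is the condition the A/B instruction in step 2 relies on.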