Document seed-controlled Krea2 evals

2026-06-28 22:56:50 +02:00
parent 0328e5ca3a
commit ef3b983712
3 changed files with 60 additions and 26 deletions
+17
@@ -26,6 +26,23 @@ Avoid letting two sections describe incompatible camera or framing intents.
 - Analysis, scoring, and generator notes belong in chat or `sxcp_eval_log`.
 - Keep one experiment variable per cycle when possible.
 - Lock seed, character, location, and camera when testing wording changes.
+- Treat the MCP seed as transport metadata. Preserve it for prompt-only A/B tests
+  and do not write it into the visible prompt text.
+## Seed-Controlled A/B Tests
+Use one fixed seed when deciding whether prompt wording helped Krea2. A single
+image can justify a prompt-only retry when the mismatch is obvious, but a
+generator rule needs either repeated evidence or a generated prompt that is
+structurally wrong before rendering.
+When reviewing an eval payload, log:
+- emitted seed,
+- original generated prompt,
+- edited prompt,
+- image failure or improvement,
+- whether the change should stay prompt-only or become a generator patch.
 ## Camera And Composition
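Review note: the log fields listed under "Seed-Controlled A/B Tests" in the hunk above could be captured in a small record such as the sketch below. The `EvalLogEntry` name, its field names, the two disposition strings, and the JSON-lines serialization are all assumptions made for illustration; the commit itself does not define a log schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional

# Hypothetical disposition labels; the guide only says "prompt-only" or
# "generator patch" in prose.
DISPOSITIONS = ("prompt-only", "generator-patch")


@dataclass
class EvalLogEntry:
    """Hypothetical record mirroring the five log fields in the guide."""

    seed: Optional[int]      # emitted seed; None when the payload had none
    original_prompt: str     # original generated prompt
    edited_prompt: str       # edited prompt sent back for the A/B test
    observation: str         # image failure or improvement
    disposition: str         # stay prompt-only or become a generator patch

    def __post_init__(self) -> None:
        if self.disposition not in DISPOSITIONS:
            raise ValueError(f"unknown disposition: {self.disposition!r}")

    def to_json_line(self) -> str:
        """Serialize the entry as one JSON line for an append-only log."""
        return json.dumps(asdict(self), sort_keys=True)


entry = EvalLogEntry(
    seed=1234,
    original_prompt="woman in a red coat, street at night",
    edited_prompt="woman in a red coat, wet street at night, low angle",
    observation="camera angle ignored in the first render",
    disposition="prompt-only",
)
line = entry.to_json_line()
```

Keeping the seed as its own field, rather than embedding it in either prompt string, matches the rule that the seed is transport metadata and never visible prompt text.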
+24 -11
@@ -1,15 +1,17 @@
 # SxCP Eval Loop
 This loop is for tuning the SxCP generator toward stronger Krea2 images.
-ComfyUI sends a generated prompt and image to Codex, Codex analyzes the result,
-then sends back exactly one edited prompt for the next A/B test. Confirmed
-findings become either generator changes or durable prompt rules in
+ComfyUI sends a generated prompt, image, and seed to Codex, Codex analyzes the
+result, then sends back exactly one edited prompt for the next A/B test.
+Confirmed findings become either generator changes or durable prompt rules in
 [`krea2-prompt-guide.md`](krea2-prompt-guide.md).
 ## Channels
-- `sxcp_eval_in`: ComfyUI to Codex. Contains the prompt text and image path.
-- `sxcp_eval_out`: Codex to ComfyUI. Prompt-only. Do not put analysis here.
+- `sxcp_eval_in`: ComfyUI to Codex. Contains the prompt text, image path, and
+  seed.
+- `sxcp_eval_out`: Codex to ComfyUI. Prompt-only text plus the same seed through
+  the MCP signal when supported. Do not put analysis here.
 - `sxcp_eval_log`: optional analysis/log channel.
 ## Manual Loop
@@ -23,12 +25,14 @@ tools/sxcp_eval_loop.sh 3
 Every three minutes it prints a structured request asking Codex to:
 1. Pull `sxcp_eval_in`.
-2. Inspect the image.
-3. Compare it to the prompt and previous edit.
-4. Push one prompt-only edit to `sxcp_eval_out`.
-5. Classify the finding as prompt-only, prompt-guide rule, or generator fix.
-6. Change generator code/data only when the issue is systemic.
-7. Record the finding and update the Krea2 prompt guide when a rule is confirmed.
+2. Record the emitted seed.
+3. Inspect the image.
+4. Compare it to the prompt and previous edit.
+5. Push one prompt-only edit to `sxcp_eval_out`, preserving the same seed through
+   the MCP signal when available.
+6. Classify the finding as prompt-only, prompt-guide rule, or generator fix.
+7. Change generator code/data only when the issue is systemic.
+8. Record the finding and update the Krea2 prompt guide when a rule is confirmed.
 Runtime logs are written under `.sxcp_eval/` and ignored by git.
@@ -60,8 +64,17 @@ The request is sent on stdin. The command also receives:
 - Crop/framing
 - Prompt noise/repetition
 - Model confusion tokens
+- Seed control/reproducibility
 - Overall Krea2 image usefulness
+## Seed Contract
+The seed is transport metadata, not prompt text. When the graph emits a seed, an
+A/B wording test should reuse that exact seed so the image difference mostly
+comes from wording, not sampling randomness. If a payload has no seed, mark that
+cycle as uncontrolled and avoid turning the result into a durable generator rule
+without another controlled run.
 ## Generator Fix Rule
 Only edit the generator when the image shows a repeatable, systemic prompt
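Review note: the Seed Contract added in the hunk above can be read as two small rules, sketched here in Python. The payload is assumed to be a plain dictionary with `seed`, `image_path`, and a hypothetical `prompt` key, and the function names are invented for this note; the real MCP transport is not shown in this commit.

```python
from typing import Any, Dict


def classify_cycle(payload: Dict[str, Any]) -> str:
    """Label a cycle "controlled" only when the payload carries a seed."""
    # Compare against None so that a legitimate seed of 0 still counts.
    return "controlled" if payload.get("seed") is not None else "uncontrolled"


def build_reply(payload: Dict[str, Any], edited_prompt: str) -> Dict[str, Any]:
    """Build the outgoing message: prompt text plus the unchanged seed.

    The seed travels as a separate field (transport metadata) and is never
    concatenated into the prompt text.
    """
    reply: Dict[str, Any] = {"prompt": edited_prompt}
    seed = payload.get("seed")
    if seed is not None:
        reply["seed"] = seed
    return reply


incoming = {"prompt": "woman in a red coat", "image_path": "out/0001.png", "seed": 0}
reply = build_reply(incoming, "woman in a red coat, low angle")
status_with_seed = classify_cycle(incoming)
status_without_seed = classify_cycle({"prompt": "x", "image_path": "out/0002.png"})
```

Under this reading, an uncontrolled cycle can still produce a prompt-only retry, but its result would not be promoted to a durable generator rule until a controlled run confirms it.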
+19 -15
@@ -135,14 +135,17 @@ style. Every cycle should turn visual evidence into one of:
 ## Protocol
 1. Pull the latest prompt/image from \`$in_channel\`.
-2. Compare the image against the prompt and previous edited prompt.
-3. Identify concrete Krea2 mismatches and likely generator path.
-4. Classify the next step: prompt-only edit, guide rule, or generator patch.
-5. Push only the next test prompt to \`$out_channel\`.
-6. Keep analysis in chat or \`$log_channel\`, not in \`$out_channel\`.
-7. Edit generator code/data only when the issue is systemic.
-8. Update \`$guide_file\` when a wording rule is confirmed.
-9. Run focused smoke tests after generator edits.
+2. Record the emitted seed. If it is missing, mark the image as uncontrolled.
+3. Compare the image against the prompt and previous edited prompt.
+4. Identify concrete Krea2 mismatches and likely generator path.
+5. Classify the next step: prompt-only edit, guide rule, or generator patch.
+6. Push only the next test prompt text to \`$out_channel\`. Preserve the same
+   seed through the MCP signal when available; never write the seed into the
+   prompt text.
+7. Keep analysis in chat or \`$log_channel\`, not in \`$out_channel\`.
+8. Edit generator code/data only when the issue is systemic.
+9. Update \`$guide_file\` when a wording rule is confirmed.
+10. Run focused smoke tests after generator edits.
 ## Cycles
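Review note: step 6 of the protocol in the hunk above says the seed must never be written into the prompt text. One way to guard against that before pushing is the check sketched below. The `prompt_leaks_seed` helper is hypothetical and purely a heuristic: it only looks for the seed's decimal digits as a standalone number, so very small seeds can collide with ordinary counts in a prompt (for example "3 people").

```python
import re


def prompt_leaks_seed(prompt_text: str, seed: int) -> bool:
    """Return True when the seed appears in the prompt as a standalone number.

    Digits on either side are excluded so that seed 1234 does not match
    inside a longer number such as 12345.
    """
    pattern = rf"(?<!\d){re.escape(str(seed))}(?!\d)"
    return re.search(pattern, prompt_text) is not None


leaky = prompt_leaks_seed("portrait, soft light, seed 1234", 1234)
clean = prompt_leaks_seed("portrait, soft light, low angle", 1234)
longer_number = prompt_leaks_seed("aerial view from 12345 feet", 1234)
```

A caller could refuse to push to the out channel when the check returns True, and log the rejected prompt on the analysis channel instead.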
@@ -175,15 +178,16 @@ Channels:
 Evaluation steps:
 1. Pull the latest payload from $in_channel.
-2. Inspect image_path and compare it to the prompt text.
-3. Score these Krea2 axes: identity, outfit continuity, pose/action, camera compliance, location coherence, crop/framing, prompt noise, model confusion tokens, and overall image usefulness.
-4. Identify the smallest concrete mismatch that should be tested next.
-5. Classify the finding:
-   - prompt-only: push exactly one edited prompt to $out_channel and nothing else on that channel.
+2. Record payload.seed if present. Keep the same seed for prompt-only A/B tests.
+3. Inspect image_path and compare it to the prompt text.
+4. Score these Krea2 axes: identity, outfit continuity, pose/action, camera compliance, location coherence, crop/framing, prompt noise, model confusion tokens, seed control, and overall image usefulness.
+5. Identify the smallest concrete mismatch that should be tested next.
+6. Classify the finding:
+   - prompt-only: push exactly one edited prompt to $out_channel and preserve payload.seed through the MCP signal when the tool supports it.
    - guide-rule: update $guide_file with the confirmed Krea2 wording rule.
    - generator-fix: edit the responsible generator path, add/adjust focused smoke coverage, run tests, and summarize the change.
-6. Keep a clear link between the image evidence, the prompt wording, and the generator path.
-7. Append the finding to the eval log with: original issue, changed wording/path, expected improvement, test result, guide update, generator update, and next hypothesis.
+7. Keep a clear link between the image evidence, seed, prompt wording, and generator path.
+8. Append the finding to the eval log with: seed, original issue, changed wording/path, expected improvement, test result, guide update, generator update, and next hypothesis.
 Current run:
 - run_id: $run_id