Document seed-controlled Krea2 evals

2026-06-28 22:56:50 +02:00
parent 0328e5ca3a
commit ef3b983712
3 changed files with 60 additions and 26 deletions
+17
@@ -26,6 +26,23 @@ Avoid letting two sections describe incompatible camera or framing intents.
- Analysis, scoring, and generator notes belong in chat or `sxcp_eval_log`.
- Keep one experiment variable per cycle when possible.
- Lock seed, character, location, and camera when testing wording changes.
- Treat the MCP seed as transport metadata. Preserve it for prompt-only A/B tests
and do not write it into the visible prompt text.
## Seed-Controlled A/B Tests
Use one fixed seed when deciding whether prompt wording helped Krea2. A single
image can justify a prompt-only retry when the mismatch is obvious, but a
generator rule needs either repeated evidence or a generated prompt that is
structurally wrong before rendering.
When reviewing an eval payload, log:
- emitted seed,
- original generated prompt,
- edited prompt,
- image failure or improvement,
- whether the change should stay prompt-only or become a generator patch.
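
The fields listed above could be captured as one structured record per cycle. The following is a minimal sketch, assuming a JSON-lines style log; `EvalLogEntry`, its field names, and the `DISPOSITIONS` labels are hypothetical and not part of the SxCP tooling.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional

# Hypothetical labels for the "prompt-only or generator patch" decision.
DISPOSITIONS = ("prompt-only", "generator-patch")


@dataclass
class EvalLogEntry:
    seed: Optional[int]    # emitted seed; None when the payload carried none
    original_prompt: str   # prompt as produced by the generator
    edited_prompt: str     # prompt sent back for the A/B test
    image_result: str      # observed failure or improvement
    disposition: str       # one of DISPOSITIONS

    def __post_init__(self) -> None:
        # Reject labels outside the two outcomes named in the list above.
        if self.disposition not in DISPOSITIONS:
            raise ValueError(f"unknown disposition: {self.disposition!r}")

    def to_json_line(self) -> str:
        # One JSON object per line keeps the log easy to append to and grep.
        return json.dumps(asdict(self), sort_keys=True)


# Illustrative values only; none of these come from a real eval run.
entry = EvalLogEntry(
    seed=1234,
    original_prompt="woman in a kitchen, wide shot, close-up portrait",
    edited_prompt="woman in a kitchen, wide shot",
    image_result="conflicting framing removed",
    disposition="prompt-only",
)
line = entry.to_json_line()
```

Keeping the seed as its own field, rather than embedding it in either prompt string, mirrors the rule that the seed is transport metadata.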
## Camera And Composition
+24 -11
@@ -1,15 +1,17 @@
# SxCP Eval Loop
This loop is for tuning the SxCP generator toward stronger Krea2 images.
ComfyUI sends a generated prompt and image to Codex, Codex analyzes the result,
then sends back exactly one edited prompt for the next A/B test. Confirmed
findings become either generator changes or durable prompt rules in
ComfyUI sends a generated prompt, image, and seed to Codex, Codex analyzes the
result, then sends back exactly one edited prompt for the next A/B test.
Confirmed findings become either generator changes or durable prompt rules in
[`krea2-prompt-guide.md`](krea2-prompt-guide.md).
## Channels
- `sxcp_eval_in`: ComfyUI to Codex. Contains the prompt text and image path.
- `sxcp_eval_out`: Codex to ComfyUI. Prompt-only. Do not put analysis here.
- `sxcp_eval_in`: ComfyUI to Codex. Contains the prompt text, image path, and
seed.
- `sxcp_eval_out`: Codex to ComfyUI. Prompt-only text plus the same seed through
the MCP signal when supported. Do not put analysis here.
- `sxcp_eval_log`: optional analysis/log channel.
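
The channel split can be illustrated with a small helper that builds the reply for `sxcp_eval_out`. This is a sketch under stated assumptions: the payload is modelled as a plain dict, the `prompt` key and the `build_out_message` function are hypothetical, and the real MCP signal may carry the seed differently.

```python
import re
from typing import Any, Dict, Optional


def build_out_message(in_payload: Dict[str, Any], edited_prompt: str) -> Dict[str, Any]:
    """Return a prompt-only reply that carries the incoming seed as metadata."""
    seed: Optional[int] = in_payload.get("seed")
    if seed is not None:
        # Crude leak check: the seed must not appear as a standalone number
        # in the visible prompt text.
        pattern = rf"(?<!\d){re.escape(str(seed))}(?!\d)"
        if re.search(pattern, edited_prompt):
            raise ValueError("seed must not appear in the prompt text")
    # Only the prompt (and the seed, when present) goes out; no analysis,
    # no image path.
    message: Dict[str, Any] = {"prompt": edited_prompt}
    if seed is not None:
        message["seed"] = seed
    return message


# Illustrative payload; the path and seed are made-up values.
in_payload = {
    "prompt": "woman in a kitchen, wide shot, close-up portrait",
    "image_path": "out/example.png",
    "seed": 987654321,
}
reply = build_out_message(in_payload, "woman in a kitchen, wide shot")
```

Analysis text would go to `sxcp_eval_log` or chat instead, so the reply stays safe to feed straight back into the graph.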
## Manual Loop
@@ -23,12 +25,14 @@ tools/sxcp_eval_loop.sh 3
Every three minutes it prints a structured request asking Codex to:
1. Pull `sxcp_eval_in`.
2. Inspect the image.
3. Compare it to the prompt and previous edit.
4. Push one prompt-only edit to `sxcp_eval_out`.
5. Classify the finding as prompt-only, prompt-guide rule, or generator fix.
6. Change generator code/data only when the issue is systemic.
7. Record the finding and update the Krea2 prompt guide when a rule is confirmed.
2. Record the emitted seed.
3. Inspect the image.
4. Compare it to the prompt and previous edit.
5. Push one prompt-only edit to `sxcp_eval_out`, preserving the same seed through
the MCP signal when available.
6. Classify the finding as prompt-only, prompt-guide rule, or generator fix.
7. Change generator code/data only when the issue is systemic.
8. Record the finding and update the Krea2 prompt guide when a rule is confirmed.
Runtime logs are written under `.sxcp_eval/` and ignored by git.
@@ -60,8 +64,17 @@ The request is sent on stdin. The command also receives:
- Crop/framing
- Prompt noise/repetition
- Model confusion tokens
- Seed control/reproducibility
- Overall Krea2 image usefulness
## Seed Contract
The seed is transport metadata, not prompt text. When the graph emits a seed, an
A/B wording test should reuse that exact seed so the image difference mostly
comes from wording, not sampling randomness. If a payload has no seed, mark that
cycle as uncontrolled and avoid turning the result into a durable generator rule
without another controlled run.
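
One way to apply this contract when comparing two cycles is sketched below. It assumes payloads are plain dicts with an optional `seed` key; `is_controlled` and `same_seed_pair` are hypothetical helper names, not existing SxCP functions.

```python
from typing import Any, Dict


def is_controlled(payload: Dict[str, Any]) -> bool:
    """A cycle counts as controlled only when its payload carries a seed."""
    # Compare against None so that a seed of 0 still counts as present.
    return payload.get("seed") is not None


def same_seed_pair(baseline: Dict[str, Any], retry: Dict[str, Any]) -> bool:
    """True when both cycles are controlled and reuse the exact same seed."""
    return (
        is_controlled(baseline)
        and is_controlled(retry)
        and baseline["seed"] == retry["seed"]
    )


# Illustrative payloads; prompts and seeds are made-up values.
baseline = {"prompt": "wide shot, close-up portrait", "seed": 42}
retry = {"prompt": "wide shot", "seed": 42}
unseeded = {"prompt": "wide shot"}

valid_ab = same_seed_pair(baseline, retry)          # True
uncontrolled_ab = same_seed_pair(baseline, unseeded)  # False
```

A pair that fails this check can still motivate a prompt-only retry, but under the contract above it should not by itself become a durable generator rule.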
## Generator Fix Rule
Only edit the generator when the image shows a repeatable, systemic prompt
+19 -15
@@ -135,14 +135,17 @@ style. Every cycle should turn visual evidence into one of:
## Protocol
1. Pull the latest prompt/image from \`$in_channel\`.
2. Compare the image against the prompt and previous edited prompt.
3. Identify concrete Krea2 mismatches and likely generator path.
4. Classify the next step: prompt-only edit, guide rule, or generator patch.
5. Push only the next test prompt to \`$out_channel\`.
6. Keep analysis in chat or \`$log_channel\`, not in \`$out_channel\`.
7. Edit generator code/data only when the issue is systemic.
8. Update \`$guide_file\` when a wording rule is confirmed.
9. Run focused smoke tests after generator edits.
2. Record the emitted seed. If it is missing, mark the image as uncontrolled.
3. Compare the image against the prompt and previous edited prompt.
4. Identify concrete Krea2 mismatches and likely generator path.
5. Classify the next step: prompt-only edit, guide rule, or generator patch.
6. Push only the next test prompt text to \`$out_channel\`. Preserve the same
seed through the MCP signal when available; never write the seed into the
prompt text.
7. Keep analysis in chat or \`$log_channel\`, not in \`$out_channel\`.
8. Edit generator code/data only when the issue is systemic.
9. Update \`$guide_file\` when a wording rule is confirmed.
10. Run focused smoke tests after generator edits.
## Cycles
@@ -175,15 +178,16 @@ Channels:
Evaluation steps:
1. Pull the latest payload from $in_channel.
2. Inspect image_path and compare it to the prompt text.
3. Score these Krea2 axes: identity, outfit continuity, pose/action, camera compliance, location coherence, crop/framing, prompt noise, model confusion tokens, and overall image usefulness.
4. Identify the smallest concrete mismatch that should be tested next.
5. Classify the finding:
- prompt-only: push exactly one edited prompt to $out_channel and nothing else on that channel.
2. Record payload.seed if present. Keep the same seed for prompt-only A/B tests.
3. Inspect image_path and compare it to the prompt text.
4. Score these Krea2 axes: identity, outfit continuity, pose/action, camera compliance, location coherence, crop/framing, prompt noise, model confusion tokens, seed control, and overall image usefulness.
5. Identify the smallest concrete mismatch that should be tested next.
6. Classify the finding:
- prompt-only: push exactly one edited prompt to $out_channel and preserve payload.seed through the MCP signal when the tool supports it.
- guide-rule: update $guide_file with the confirmed Krea2 wording rule.
- generator-fix: edit the responsible generator path, add/adjust focused smoke coverage, run tests, and summarize the change.
6. Keep a clear link between the image evidence, the prompt wording, and the generator path.
7. Append the finding to the eval log with: original issue, changed wording/path, expected improvement, test result, guide update, generator update, and next hypothesis.
7. Keep a clear link between the image evidence, seed, prompt wording, and generator path.
8. Append the finding to the eval log with: seed, original issue, changed wording/path, expected improvement, test result, guide update, generator update, and next hypothesis.
Current run:
- run_id: $run_id