From ef3b9837129032f10373d349ec2972d142fb17c0 Mon Sep 17 00:00:00 2001
From: Ethanfel
Date: Sun, 28 Jun 2026 22:56:50 +0200
Subject: [PATCH] Document seed-controlled Krea2 evals

---
 docs/krea2-prompt-guide.md | 17 +++++++++++++++++
 docs/sxcp-eval-loop.md     | 35 ++++++++++++++++++++++++-----------
 tools/sxcp_eval_loop.sh    | 34 +++++++++++++++++++---------------
 3 files changed, 60 insertions(+), 26 deletions(-)

diff --git a/docs/krea2-prompt-guide.md b/docs/krea2-prompt-guide.md
index 7b41cb1..54674ca 100644
--- a/docs/krea2-prompt-guide.md
+++ b/docs/krea2-prompt-guide.md
@@ -26,6 +26,23 @@ Avoid letting two sections describe incompatible camera or framing intents.
 - Analysis, scoring, and generator notes belong in chat or `sxcp_eval_log`.
 - Keep one experiment variable per cycle when possible.
 - Lock seed, character, location, and camera when testing wording changes.
+- Treat the MCP seed as transport metadata. Preserve it for prompt-only A/B tests
+  and do not write it into the visible prompt text.
+
+## Seed-Controlled A/B Tests
+
+Use one fixed seed when deciding whether prompt wording helped Krea2. A single
+image can justify a prompt-only retry when the mismatch is obvious, but a
+generator rule needs either repeated evidence or a generated prompt that is
+structurally wrong before rendering.
+
+When reviewing an eval payload, log:
+
+- emitted seed,
+- original generated prompt,
+- edited prompt,
+- image failure or improvement,
+- whether the change should stay prompt-only or become a generator patch.
 
 ## Camera And Composition
 
diff --git a/docs/sxcp-eval-loop.md b/docs/sxcp-eval-loop.md
index 1e4e4b1..a0bfdbc 100644
--- a/docs/sxcp-eval-loop.md
+++ b/docs/sxcp-eval-loop.md
@@ -1,15 +1,17 @@
 # SxCP Eval Loop
 
 This loop is for tuning the SxCP generator toward stronger Krea2 images.
-ComfyUI sends a generated prompt and image to Codex, Codex analyzes the result,
-then sends back exactly one edited prompt for the next A/B test. Confirmed
-findings become either generator changes or durable prompt rules in
+ComfyUI sends a generated prompt, image, and seed to Codex, Codex analyzes the
+result, then sends back exactly one edited prompt for the next A/B test.
+Confirmed findings become either generator changes or durable prompt rules in
 [`krea2-prompt-guide.md`](krea2-prompt-guide.md).
 
 ## Channels
 
-- `sxcp_eval_in`: ComfyUI to Codex. Contains the prompt text and image path.
-- `sxcp_eval_out`: Codex to ComfyUI. Prompt-only. Do not put analysis here.
+- `sxcp_eval_in`: ComfyUI to Codex. Contains the prompt text, image path, and
+  seed.
+- `sxcp_eval_out`: Codex to ComfyUI. Prompt-only text plus the same seed through
+  the MCP signal when supported. Do not put analysis here.
 - `sxcp_eval_log`: optional analysis/log channel.
 
 ## Manual Loop
@@ -23,12 +25,14 @@ tools/sxcp_eval_loop.sh 3
 Every three minutes it prints a structured request asking Codex to:
 
 1. Pull `sxcp_eval_in`.
-2. Inspect the image.
-3. Compare it to the prompt and previous edit.
-4. Push one prompt-only edit to `sxcp_eval_out`.
-5. Classify the finding as prompt-only, prompt-guide rule, or generator fix.
-6. Change generator code/data only when the issue is systemic.
-7. Record the finding and update the Krea2 prompt guide when a rule is confirmed.
+2. Record the emitted seed.
+3. Inspect the image.
+4. Compare it to the prompt and previous edit.
+5. Push one prompt-only edit to `sxcp_eval_out`, preserving the same seed through
+   the MCP signal when available.
+6. Classify the finding as prompt-only, prompt-guide rule, or generator fix.
+7. Change generator code/data only when the issue is systemic.
+8. Record the finding and update the Krea2 prompt guide when a rule is confirmed.
 
 Runtime logs are written under `.sxcp_eval/` and ignored by git.
 
@@ -60,8 +64,17 @@ The request is sent on stdin. The command also receives:
 - Crop/framing
 - Prompt noise/repetition
 - Model confusion tokens
+- Seed control/reproducibility
 - Overall Krea2 image usefulness
 
+## Seed Contract
+
+The seed is transport metadata, not prompt text. When the graph emits a seed, an
+A/B wording test should reuse that exact seed so the image difference mostly
+comes from wording, not sampling randomness. If a payload has no seed, mark that
+cycle as uncontrolled and avoid turning the result into a durable generator rule
+without another controlled run.
+
 ## Generator Fix Rule
 
 Only edit the generator when the image shows a repeatable, systemic prompt
diff --git a/tools/sxcp_eval_loop.sh b/tools/sxcp_eval_loop.sh
index 7e26ebe..316e9db 100755
--- a/tools/sxcp_eval_loop.sh
+++ b/tools/sxcp_eval_loop.sh
@@ -135,14 +135,17 @@ style. Every cycle should turn visual evidence into one of:
 ## Protocol
 
 1. Pull the latest prompt/image from \`$in_channel\`.
-2. Compare the image against the prompt and previous edited prompt.
-3. Identify concrete Krea2 mismatches and likely generator path.
-4. Classify the next step: prompt-only edit, guide rule, or generator patch.
-5. Push only the next test prompt to \`$out_channel\`.
-6. Keep analysis in chat or \`$log_channel\`, not in \`$out_channel\`.
-7. Edit generator code/data only when the issue is systemic.
-8. Update \`$guide_file\` when a wording rule is confirmed.
-9. Run focused smoke tests after generator edits.
+2. Record the emitted seed. If it is missing, mark the image as uncontrolled.
+3. Compare the image against the prompt and previous edited prompt.
+4. Identify concrete Krea2 mismatches and likely generator path.
+5. Classify the next step: prompt-only edit, guide rule, or generator patch.
+6. Push only the next test prompt text to \`$out_channel\`. Preserve the same
+   seed through the MCP signal when available; never write the seed into the
+   prompt text.
+7. Keep analysis in chat or \`$log_channel\`, not in \`$out_channel\`.
+8. Edit generator code/data only when the issue is systemic.
+9. Update \`$guide_file\` when a wording rule is confirmed.
+10. Run focused smoke tests after generator edits.
 
 ## Cycles
 
@@ -175,15 +178,16 @@ Channels:
 
 Evaluation steps:
 1. Pull the latest payload from $in_channel.
-2. Inspect image_path and compare it to the prompt text.
-3. Score these Krea2 axes: identity, outfit continuity, pose/action, camera compliance, location coherence, crop/framing, prompt noise, model confusion tokens, and overall image usefulness.
-4. Identify the smallest concrete mismatch that should be tested next.
-5. Classify the finding:
-   - prompt-only: push exactly one edited prompt to $out_channel and nothing else on that channel.
+2. Record payload.seed if present. Keep the same seed for prompt-only A/B tests.
+3. Inspect image_path and compare it to the prompt text.
+4. Score these Krea2 axes: identity, outfit continuity, pose/action, camera compliance, location coherence, crop/framing, prompt noise, model confusion tokens, seed control, and overall image usefulness.
+5. Identify the smallest concrete mismatch that should be tested next.
+6. Classify the finding:
+   - prompt-only: push exactly one edited prompt to $out_channel and preserve payload.seed through the MCP signal when the tool supports it.
    - guide-rule: update $guide_file with the confirmed Krea2 wording rule.
    - generator-fix: edit the responsible generator path, add/adjust focused smoke coverage, run tests, and summarize the change.
-6. Keep a clear link between the image evidence, the prompt wording, and the generator path.
-7. Append the finding to the eval log with: original issue, changed wording/path, expected improvement, test result, guide update, generator update, and next hypothesis.
+7. Keep a clear link between the image evidence, seed, prompt wording, and generator path.
+8. Append the finding to the eval log with: seed, original issue, changed wording/path, expected improvement, test result, guide update, generator update, and next hypothesis.
 
 Current run:
 - run_id: $run_id