# SxCP Eval Loop

This loop is for tuning the SxCP generator toward stronger Krea2 images. ComfyUI sends a generated prompt, image, and seed to Codex, Codex analyzes the result, then sends back exactly one edited prompt for the next A/B test. Confirmed findings become either generator changes or durable prompt rules in [`krea2-prompt-guide.md`](krea2-prompt-guide.md).

The active A/B testing method is recorded in [`krea2-ab-methodology.md`](krea2-ab-methodology.md); update that memory when the method improves.

## Channels

- `sxcp_eval_in`: ComfyUI to Codex. Contains the prompt text, image path, and seed.
- `sxcp_eval_out`: Codex to ComfyUI. Prompt-only text plus the same seed through the MCP signal when supported. Do not put analysis here.
- `sxcp_eval_log`: optional analysis/log channel.

## MCP Helper Command

Use the checked helper for bridge calls instead of ad hoc Python snippets. The approved command prefix is:

```bash
/media/p5/miniforge3/bin/python tools/sxcp_mcp_client.py
```

Common calls:

```bash
/media/p5/miniforge3/bin/python tools/sxcp_mcp_client.py list-tools
/media/p5/miniforge3/bin/python tools/sxcp_mcp_client.py call-tool comfy_pull --arguments-json '{"channel":"sxcp_eval_in"}'
/media/p5/miniforge3/bin/python tools/sxcp_mcp_client.py call-tool comfy_push --arguments-json '{"channel":"sxcp_eval_out","seed":5656565656,"text":"PROMPT_ONLY_POSITIVE_CONDITIONING"}'
```

## Batch Prompt Helper

For prompt-axis batches, prepare a local JSON file and use the offline helper to render the approved MCP push/pull commands and an image-presence checklist:

```bash
python tools/sxcp_prompt_batch.py validate --batch-json /tmp/sxcp-batch.json
python tools/sxcp_prompt_batch.py print-push-commands --batch-json /tmp/sxcp-batch.json
python tools/sxcp_prompt_batch.py print-result-template --batch-json /tmp/sxcp-batch.json
python tools/sxcp_prompt_batch.py run-batch --batch-json /tmp/sxcp-batch.json --result-json /tmp/sxcp-results.json --previous-turn 80 --run
python tools/sxcp_prompt_batch.py validate-results --batch-json /tmp/sxcp-batch.json --result-json /tmp/sxcp-results.json
python tools/sxcp_prompt_batch.py print-eval-entry-draft --batch-json /tmp/sxcp-batch.json --result-json /tmp/sxcp-results.json --variant-key pov_example_variant --baseline-image /absolute/baseline.png --candidate-id controlled_subject_first
```

Batch files use the fixed sampler seed and one positive prompt per probe:

```json
{
  "seed": 8989898989,
  "channel_out": "sxcp_eval_out",
  "channel_in": "sxcp_eval_in",
  "probes": [
    {
      "id": "controlled_subject_first",
      "prompt_order": "subject_first",
      "text": "SUBJECT_LOOK_FIRST. POSE_HIERARCHY. LOCATION_ANCHORS."
    },
    {
      "id": "rough_geometry_axis",
      "prompt_order": "geometry_only",
      "text": "POSE_AXIS_ONLY_FOR_DISCOVERY."
    }
  ]
}
```

`geometry_only` probes are for rough pose-axis discovery and are not durable subject/look-controlled A/B evidence. The helper rejects `sxcp_eval_negative_out`; keep batch prompts positive-only.

Use `run-batch --run` to push one positive prompt, poll `sxcp_eval_in` until a new turn and absolute PNG image path appear with the fixed sampler seed, write the filled result JSON, then send the next probe. Omit `--run` for a dry-run command preview.

After a live run, run `validate-results`; it requires the result probe ids to match the batch order, each turn to advance in batch order, every image path to be an absolute PNG artifact, and every returned seed to match the fixed sampler seed. Then use `print-eval-entry-draft` to create a valid `krea2-eval-log.json` entry draft. Replace the generated summaries and observation with the real visual comparison before recording it with `tools/krea2_record_eval.py`. By default the draft command rejects `geometry_only` candidates; pass `--allow-geometry-only` only when deliberately recording non-controlled prompt-axis evidence.
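
The four `validate-results` requirements described above can be pictured with a small standalone check. This is only a sketch and not the helper's actual code: the result layout (a list of objects with `id`, `turn`, `image_path`, and `seed`) and the function name `check_results` are assumptions made for illustration.

```python
from pathlib import PurePosixPath


def check_results(batch, results):
    """Return a list of problems; an empty list means every check passed.

    Assumed (hypothetical) result shape: a list of dicts with the keys
    "id", "turn", "image_path", and "seed".
    """
    problems = []

    # Result probe ids must match the batch order exactly.
    probe_ids = [probe["id"] for probe in batch["probes"]]
    result_ids = [result["id"] for result in results]
    if result_ids != probe_ids:
        problems.append("probe ids do not match the batch order")

    previous_turn = None
    for result in results:
        # Each turn must advance relative to the previous probe.
        if previous_turn is not None and result["turn"] <= previous_turn:
            problems.append(f"{result['id']}: turn did not advance")
        previous_turn = result["turn"]

        # Every image path must be an absolute PNG path.
        path = PurePosixPath(result["image_path"])
        if not path.is_absolute() or path.suffix.lower() != ".png":
            problems.append(f"{result['id']}: image path is not an absolute PNG")

        # Every returned seed must equal the fixed sampler seed.
        if result["seed"] != batch["seed"]:
            problems.append(f"{result['id']}: seed differs from the batch seed")

    return problems
```

Returning a list of problems instead of raising on the first one makes it easier to see every broken probe from a single run.
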
## Manual Loop

Start the helper after sending a test prompt:

```bash
tools/sxcp_eval_loop.sh 3
```

Every three minutes it prints a structured request asking Codex to:

1. Pull `sxcp_eval_in`.
2. Record the emitted seed.
3. Inspect the image.
4. Compare it to the prompt and previous edit.
5. Push one prompt-only edit to `sxcp_eval_out`, preserving the same seed through the MCP signal when available.
6. Classify the finding as prompt-only, prompt-guide rule, provisional generator improvement, or proven generator fix.
7. When leaving a category after same-seed progress over baseline, mirror the best generator-safe wording into the responsible generator path as `provisional_generator_patch`.
8. Promote a generator change to proven only when the issue is systemic, repeated, or structurally wrong before rendering.
9. Record the finding and update the Krea2 prompt guide when a rule is confirmed.

Runtime logs are written under `.sxcp_eval/` and ignored by git. Durable fixed-seed findings that justify a guide rule, generator patch, or pose variant promotion are recorded in [`krea2-eval-log.json`](krea2-eval-log.json). Method changes belong in [`krea2-ab-methodology.md`](krea2-ab-methodology.md). Use runtime logs for scratch notes; use the JSON log only for evidence that should remain tied to a catalog variant. Image paths in that log point at external ComfyUI artifacts and may be cleaned; the durable evidence is the fixed sampler seed, optional generator seed, prompt summaries, observation, decision, and commit.
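
For steps 1 and 5 of the loop, the pull and push invocations can be assembled programmatically so the seed recorded in step 2 is carried into the push unchanged. The sketch below only builds argument lists for the approved `tools/sxcp_mcp_client.py` prefix and does not execute anything; the function names are hypothetical and the payload keys mirror the common calls shown earlier in this document.

```python
import json
import shlex

# Approved command prefix from the MCP Helper Command section.
MCP_CLIENT = ["/media/p5/miniforge3/bin/python", "tools/sxcp_mcp_client.py"]


def build_pull_command(channel="sxcp_eval_in"):
    """Argument list for pulling the latest payload from a channel."""
    arguments = json.dumps({"channel": channel}, separators=(",", ":"))
    return MCP_CLIENT + ["call-tool", "comfy_pull", "--arguments-json", arguments]


def build_push_command(prompt, seed, channel="sxcp_eval_out"):
    """Argument list for pushing one prompt-only edit with the same seed."""
    payload = {"channel": channel, "seed": seed, "text": prompt}
    arguments = json.dumps(payload, separators=(",", ":"))
    return MCP_CLIENT + ["call-tool", "comfy_push", "--arguments-json", arguments]


if __name__ == "__main__":
    # Print shell-quoted previews instead of running the commands.
    print(shlex.join(build_pull_command()))
    print(shlex.join(build_push_command("PROMPT_ONLY_POSITIVE_CONDITIONING", 5656565656)))
```

Building a list rather than a shell string keeps the JSON argument intact if the command is later passed to `subprocess.run` without a shell.
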

Record durable findings with the checked helper instead of hand-editing the log:

```bash
python tools/krea2_record_eval.py --print-template --variant-key pov_footjob_frontal_sole_stroke --seed 1234 --generator-seed 5678 > /tmp/krea2-entry.json
python tools/krea2_record_eval.py --entry-json /tmp/krea2-entry.json --dry-run
python tools/krea2_record_eval.py --entry-json /tmp/krea2-entry.json
```

Entry template:

```json
{
  "id": "variant-seed-short-finding",
  "date": "2026-06-29",
  "variant_key": "pov_example_variant",
  "seed": 1234,
  "generator_seed": 5678,
  "source": "sxcp_eval_mcp",
  "result": "accepted",
  "decision": "generator_patch",
  "baseline_prompt_summary": "What the generated prompt did before the edit.",
  "candidate_prompt_summary": "What the edited prompt changed for the same seed.",
  "observation": "What the image comparison proved and why it matters for the generator or guide.",
  "baseline_image": "/absolute/path/to/baseline.png",
  "candidate_image": "/absolute/path/to/candidate.png",
  "commit": "pending"
}
```

To see catalog coverage and the next variants that still need controlled testing, run:

```bash
python tools/krea2_tuning_report.py
```

The report includes atlas references plus prompt cues and avoid cues for the next fixed-seed test candidate. It also shows the latest durable evidence for variants that already have fixed-seed results, including the evidence id, seed, decision, candidate prompt summary, and observation. For each normal next-test candidate, it prints a `krea2_record_eval.py --print-template` command; replace the seed placeholder in that command with the seed from the run you are recording.

## Optional Command Hook

If you have a one-shot Codex command you want to run automatically, set:

```bash
SXCP_EVAL_CODEX_CMD="codex exec" tools/sxcp_eval_loop.sh 3
```

The request is sent on stdin.
The command also receives:

- `SXCP_EVAL_IN_CHANNEL`
- `SXCP_EVAL_OUT_CHANNEL`
- `SXCP_EVAL_LOG_CHANNEL`
- `SXCP_EVAL_GUIDE_FILE`
- `SXCP_EVAL_REQUEST_FILE`
- `SXCP_EVAL_CYCLE_DIR`
- `SXCP_EVAL_CYCLE`

## Evaluation Axes

- Identity consistency
- Outfit continuity
- Pose/action accuracy
- Camera compliance
- Location coherence
- Crop/framing
- Prompt noise/repetition
- Model confusion tokens
- Seed control/reproducibility
- Overall Krea2 image usefulness

## POV Pose Atlas

Use `/media/unraid/davinci/Qwen_edit_lora/POV/dataset_v2` as the local reference atlas for POV pose geometry. The top-level pose folders contain real POV examples, and matching `_control` folders contain solo/control versions. Ignore `bg` and `*_bg` folders for pose rules; they are background plates without people. Treat the pose image folders as the primary source for body geometry; captions are optional and are not present for every folder.

Suggested workflow:

1. Choose one pose family, for example `doggy`, `doggy_alt`, `cowgirl`, or `missionary`.
2. Sample 5-10 real pose images and their control images.
3. Write the repeated geometry as a compact prompt rule.
4. Run one fixed-seed Krea2 prompt using that rule.
5. Repeat on a second seed or character before changing generator defaults.
6. If the prompt itself is structurally contradictory before rendering, patch immediately and add a regression test.

For POV doggy, the atlas shows that visible viewer thighs, lower torso, or pelvis can be correct. Do not treat them as automatic failures.

## Seed Contract

The sampler seed is transport metadata, not prompt text. When the graph emits a sampler seed, an A/B wording test should reuse that exact seed so the image difference mostly comes from wording, not sampling randomness. If the SxCP generator/control seed differs from the sampler seed, record it as `generator_seed` in the eval entry.
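
As a rough illustration of the seed rule just described, the following sketch derives the seed fields of an eval entry from one A/B pair. The function name `ab_seed_fields` is hypothetical; only the `seed` and `generator_seed` keys come from the entry template in this document.

```python
def ab_seed_fields(baseline_seed, candidate_seed, generator_seed=None):
    """Build the seed fields for one A/B comparison.

    Both renders must share the sampler seed, otherwise the comparison
    mixes wording changes with sampling randomness.
    """
    if baseline_seed != candidate_seed:
        raise ValueError("A/B renders must reuse the same sampler seed")

    fields = {"seed": baseline_seed}

    # Record the generator/control seed only when it differs from the sampler seed.
    if generator_seed is not None and generator_seed != baseline_seed:
        fields["generator_seed"] = generator_seed
    return fields
```

For example, `ab_seed_fields(1234, 1234, 5678)` yields both keys, while passing a generator seed equal to the sampler seed yields only `seed`.
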
If a payload has no sampler seed, mark that cycle as uncontrolled and avoid turning the result into a durable generator rule without another controlled run.

## Positive-Only Conditioning

`sxcp_eval_out` is positive conditioning only. Never send negative-conditioning phrases such as `no shaft`, `no hands`, `without clothing`, or `avoid X` inside the positive prompt; distilled Krea2 can reinforce or hallucinate the unwanted object from that wording.

This loop has no active negative-output channel. A same-positive, same-seed probe on seed `424242` compared empty negative conditioning against strong negative text targeting visible prompt attributes, and the rendered image stayed visually unchanged. Do not rely on negative conditioning for Krea2 pose tuning; keep prompt fixes positive-only.

## Generator Fix Rule

Use two levels of generator change:

- `provisional_generator_patch`: apply the best generator-safe wording when leaving a category after fixed-seed progress over baseline. Keep the catalog variant as `candidate`.
- `generator_patch`: promote as a proven/default generator rule when the issue is repeated, systemic, or structurally wrong before rendering.

Examples of proven generator fixes:

- Selfie wording overrides orbit camera.
- Clothing continuity loses the selected softcore outfit.
- POV wording makes the off-camera participant the visual subject.
- Location camera layout inserts foreground anchors in the wrong place.

For one-off model drift inside an active category, send a cleaner prompt to `sxcp_eval_out` and keep collecting evidence. When exiting a category, carry forward same-seed improvements over baseline as provisional generator changes and add the rule or weak case to `docs/krea2-prompt-guide.md`.
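
Returning to the positive-only rule above: before pushing to `sxcp_eval_out`, a prompt can be scanned for the negation wording that the rule warns about. This is a minimal sketch, not part of the checked helpers. The pattern covers only the three forms named in this document (`no X`, `without X`, `avoid X`), and the name `find_negative_phrases` is hypothetical.

```python
import re

# Matches "no <word>", "without <word>", and "avoid <word>" as whole words,
# so words such as "piano" or "snowy" are not flagged.
NEGATION_PATTERN = re.compile(r"\b(?:no|without|avoid)\b\s+\w+", re.IGNORECASE)


def find_negative_phrases(prompt):
    """Return the negative-conditioning phrases found in a positive prompt."""
    return [match.group(0) for match in NEGATION_PATTERN.finditer(prompt)]


def is_positive_only(prompt):
    """True when the prompt contains none of the flagged phrases."""
    return not find_negative_phrases(prompt)
```

A flagged phrase is a cue to rewrite the prompt so it describes what should be visible instead. The short pattern list is deliberately narrow and will miss other negation styles.
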