ComfyUI-Ethanfel-Prompt-Bui…/docs/sxcp-eval-loop.md
2026-06-29 03:55:17 +02:00

SxCP Eval Loop

This loop tunes the SxCP generator toward stronger Krea2 images. ComfyUI sends a generated prompt, image, and seed to Codex; Codex analyzes the result and sends back exactly one edited prompt for the next A/B test. Confirmed findings become either generator changes or durable prompt rules in krea2-prompt-guide.md.

Channels

  • sxcp_eval_in: ComfyUI to Codex. Contains the prompt text, image path, and seed.
  • sxcp_eval_out: Codex to ComfyUI. Prompt-only text plus the same seed through the MCP signal when supported. Do not put analysis here.
  • sxcp_eval_log: optional analysis/log channel.
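
As a rough illustration of the channel contract above, the two payloads could be modeled like this. The class names, field names, and the make_reply helper are hypothetical; the real message format is whatever the MCP signal carries, which this document does not specify.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EvalIn:
    """Hypothetical shape of a sxcp_eval_in message (ComfyUI to Codex)."""
    prompt: str
    image_path: str
    seed: Optional[int]  # None when the graph emitted no seed


@dataclass(frozen=True)
class EvalOut:
    """Hypothetical shape of a sxcp_eval_out message (Codex to ComfyUI)."""
    prompt: str          # prompt text only, no analysis
    seed: Optional[int]  # same seed as the incoming message


def make_reply(incoming: EvalIn, edited_prompt: str) -> EvalOut:
    """Build the reply: one edited prompt, seed carried over unchanged."""
    text = edited_prompt.strip()
    if not text:
        raise ValueError("sxcp_eval_out needs a non-empty prompt")
    return EvalOut(prompt=text, seed=incoming.seed)
```

Analysis text would go to sxcp_eval_log instead, so the reply type deliberately has no field for it.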

Manual Loop

Start the helper after sending a test prompt:

tools/sxcp_eval_loop.sh 3

Every three minutes it prints a structured request asking Codex to:

  1. Pull sxcp_eval_in.
  2. Record the emitted seed.
  3. Inspect the image.
  4. Compare it to the prompt and previous edit.
  5. Push one prompt-only edit to sxcp_eval_out, preserving the same seed through the MCP signal when available.
  6. Classify the finding as prompt-only, prompt-guide rule, or generator fix.
  7. Change generator code/data only when the issue is systemic.
  8. Record the finding and update the Krea2 prompt guide when a rule is confirmed.

Runtime logs are written under .sxcp_eval/ and ignored by git.

Durable fixed-seed findings that justify a guide rule, generator patch, or pose variant promotion are recorded in krea2-eval-log.json. Use runtime logs for scratch notes; use the JSON log only for evidence that should remain tied to a catalog variant. Image paths in that log point at external ComfyUI artifacts and may be cleaned; the durable evidence is the fixed seed, prompt summaries, observation, decision, and commit.
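
A durable entry might be assembled along these lines. This is only a sketch: the key names are assumptions derived from the fields listed above, not the actual schema of krea2-eval-log.json, and make_log_entry is a hypothetical helper.

```python
import json
from typing import Optional


def make_log_entry(
    variant: str,
    seed: int,
    prompt_a_summary: str,
    prompt_b_summary: str,
    observation: str,
    decision: str,
    commit: str,
    image_path: Optional[str] = None,
) -> dict:
    """Collect the durable evidence for one fixed-seed finding."""
    entry = {
        "variant": variant,  # catalog variant the evidence stays tied to
        "seed": seed,        # fixed seed shared by both sides of the A/B test
        "prompt_summaries": {"a": prompt_a_summary, "b": prompt_b_summary},
        "observation": observation,
        "decision": decision,
        "commit": commit,
    }
    if image_path is not None:
        # External ComfyUI artifact; the file may be cleaned later.
        entry["image_path"] = image_path
    return entry


def to_json(entry: dict) -> str:
    """Serialize one entry with stable key order for readable diffs."""
    return json.dumps(entry, indent=2, sort_keys=True)
```

Keeping image_path optional reflects that the image is not the durable part of the record.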

To see catalog coverage and the next variants that still need controlled testing, run:

python tools/krea2_tuning_report.py

The report includes atlas references plus prompt cues and avoid cues for the next fixed-seed test candidate.

Optional Command Hook

If you have a one-shot Codex command you want to run automatically, set:

SXCP_EVAL_CODEX_CMD="codex exec" tools/sxcp_eval_loop.sh 3

The request is sent on stdin. The command also receives:

  • SXCP_EVAL_IN_CHANNEL
  • SXCP_EVAL_OUT_CHANNEL
  • SXCP_EVAL_LOG_CHANNEL
  • SXCP_EVAL_GUIDE_FILE
  • SXCP_EVAL_REQUEST_FILE
  • SXCP_EVAL_CYCLE_DIR
  • SXCP_EVAL_CYCLE
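
A hook command written in Python could pick up the request and these variables roughly as follows. The helper names are hypothetical, and the sketch assumes nothing about the values beyond their being strings when set.

```python
import os
import sys
from typing import Dict, Mapping, Optional

# The variable names listed above.
HOOK_VARS = (
    "SXCP_EVAL_IN_CHANNEL",
    "SXCP_EVAL_OUT_CHANNEL",
    "SXCP_EVAL_LOG_CHANNEL",
    "SXCP_EVAL_GUIDE_FILE",
    "SXCP_EVAL_REQUEST_FILE",
    "SXCP_EVAL_CYCLE_DIR",
    "SXCP_EVAL_CYCLE",
)


def read_hook_context(env: Mapping[str, str]) -> Dict[str, Optional[str]]:
    """Collect the documented variables; unset ones map to None."""
    return {name: env.get(name) for name in HOOK_VARS}


def main() -> int:
    """Read the request from stdin and report what the hook received."""
    request = sys.stdin.read()
    context = read_hook_context(os.environ)
    missing = [name for name, value in context.items() if value is None]
    print(f"request length: {len(request)} characters")
    print(f"unset variables: {', '.join(missing) or 'none'}")
    return 0
```

A script used as SXCP_EVAL_CODEX_CMD would end with a call such as raise SystemExit(main()).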

Evaluation Axes

  • Identity consistency
  • Outfit continuity
  • Pose/action accuracy
  • Camera compliance
  • Location coherence
  • Crop/framing
  • Prompt noise/repetition
  • Model confusion tokens
  • Seed control/reproducibility
  • Overall Krea2 image usefulness

POV Pose Atlas

Use /media/unraid/davinci/Qwen_edit_lora/POV/dataset_v2 as the local reference atlas for POV pose geometry. The top-level pose folders contain real POV examples, and matching _control folders contain solo/control versions. Ignore bg and *_bg folders for pose rules; they are background plates without people. Treat the pose image folders as the primary source for body geometry; captions are optional and are not present for every folder.
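
The folder rules above could be applied with a small helper like the one below. It assumes that a control folder is named after its pose folder with a _control suffix (for example doggy and doggy_control); that naming is an inference from the description, and both function names are hypothetical.

```python
from pathlib import Path
from typing import Dict, Optional


def is_background(name: str) -> bool:
    """bg and *_bg folders are background plates without people."""
    return name == "bg" or name.endswith("_bg")


def pair_pose_folders(atlas_root: Path) -> Dict[str, Optional[Path]]:
    """Map each top-level pose folder name to its control folder, or None."""
    folders = {p.name: p for p in atlas_root.iterdir() if p.is_dir()}
    pairs: Dict[str, Optional[Path]] = {}
    for name in sorted(folders):
        # Skip background plates and the control folders themselves.
        if is_background(name) or name.endswith("_control"):
            continue
        pairs[name] = folders.get(f"{name}_control")
    return pairs
```

Calling pair_pose_folders(Path("/media/unraid/davinci/Qwen_edit_lora/POV/dataset_v2")) would then list the pose families available for sampling and show which ones lack a control set.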

Suggested workflow:

  1. Choose one pose family, for example doggy, doggy_alt, cowgirl, or missionary.
  2. Sample 5-10 real pose images and their control images.
  3. Write the repeated geometry as a compact prompt rule.
  4. Run one fixed-seed Krea2 prompt using that rule.
  5. Repeat on a second seed or character before changing generator defaults.
  6. If the prompt itself is structurally contradictory before rendering, patch immediately and add a regression test.

For POV doggy, the atlas shows that visible viewer thighs, lower torso, or pelvis can be correct. Do not treat them as automatic failures.

Seed Contract

The seed is transport metadata, not prompt text. When the graph emits a seed, an A/B wording test should reuse that exact seed so the image difference mostly comes from wording, not sampling randomness. If a payload has no seed, mark that cycle as uncontrolled and avoid turning the result into a durable generator rule without another controlled run.
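
The contract can be expressed as a simple check. The sketch below is illustrative only: a cycle is represented as a hypothetical (seed_a, seed_b) pair for the two sides of a wording test, with None standing for a payload that carried no seed.

```python
from typing import Iterable, Optional, Tuple

SeedPair = Tuple[Optional[int], Optional[int]]


def is_controlled(seed_a: Optional[int], seed_b: Optional[int]) -> bool:
    """A cycle is controlled only when both sides share one emitted seed."""
    return seed_a is not None and seed_a == seed_b


def may_promote(cycles: Iterable[SeedPair]) -> bool:
    """Allow a durable rule only if at least one cycle was controlled."""
    return any(is_controlled(a, b) for a, b in cycles)
```

With this check, a finding backed only by seedless payloads stays a scratch note until a controlled run confirms it.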

Generator Fix Rule

Only edit the generator when the image shows a repeatable, systemic prompt failure. Examples:

  • Selfie wording overrides orbit camera.
  • Clothing continuity loses the selected softcore outfit.
  • POV wording makes the off-camera participant the visual subject.
  • Location camera layout inserts foreground anchors in the wrong place.

For one-off model drift, send a cleaner prompt to sxcp_eval_out and keep the generator unchanged. For a failure that repeats across runs, update the generator and add the rule to docs/krea2-prompt-guide.md.
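
One way to make the one-off versus repeated distinction concrete is to count the distinct seeds on which a failure was observed. The threshold of two seeds is an assumption, loosely based on the atlas workflow's advice to repeat on a second seed before changing generator defaults; classify_failure is a hypothetical helper.

```python
from typing import Iterable, Optional, Tuple

Observation = Tuple[str, Optional[int]]  # (failure label, seed or None)


def classify_failure(
    observations: Iterable[Observation],
    failure: str,
    min_seeds: int = 2,  # assumed threshold, not a documented rule
) -> str:
    """Return 'generator fix' for a repeated failure, else 'prompt-only'."""
    # Only seeded (controlled) observations count as evidence.
    seeds = {
        seed
        for label, seed in observations
        if label == failure and seed is not None
    }
    return "generator fix" if len(seeds) >= min_seeds else "prompt-only"
```

A failure seen on a single seed, or only in uncontrolled cycles, stays prompt-only under this sketch.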