Add Krea2 evaluation loop

2026-06-28 20:07:31 +02:00
parent 54617e4702
commit 0328e5ca3a
4 changed files with 456 additions and 0 deletions
@@ -2,3 +2,4 @@ __pycache__/
 *.py[cod]
 .pytest_cache/
 .ruff_cache/
+.sxcp_eval/
@@ -0,0 +1,150 @@
+# Krea2 Prompt Guide
+
+This document records prompt rules discovered from actual SxCP generator
+outputs tested in Krea2. It is not a generic prompt cookbook. Add a rule only
+when an A/B image comparison shows that the wording improves or breaks Krea2
+behavior.
+
+## Core Rule
+
+Krea2 responds best when the prompt gives one clear visual hierarchy:
+
+1. subject/cast descriptor,
+2. action or pose,
+3. clothing state,
+4. location,
+5. camera/layout,
+6. expression,
+7. composition/crop,
+8. style.
+
+Avoid letting two sections describe incompatible camera or framing intents.
+
+## Prompt Output Contract
+
+- `sxcp_eval_out` must contain only the prompt being tested.
+- Analysis, scoring, and generator notes belong in chat or `sxcp_eval_log`.
+- Keep one experiment variable per cycle when possible.
+- Lock seed, character, location, and camera when testing wording changes.
+
+## Camera And Composition
+
+### Orbit / Multiangle Camera
+
+When Krea2 receives an orbit or multiangle camera, avoid selfie-specific wording
+unless the intended camera is actually a handheld or mirror selfie.
+
+Works better:
+
+- `lifestyle portrait frame`
+- `creator portrait frame`
+- `outfit-check pose`
+- `wide environmental coworking camera layout`
+- `camera placed several meters away`
+- `full seated body from head to knees`
+- `room depth surrounding the subject`
+
+Conflicting wording:
+
+- `selfie frame`
+- `phone selfie`
+- `holding the phone`
+- `creator-shot phone photo`
+- `handheld camera realism`
+
+Observed result: selfie words pulled a back-right elevated wide shot into an
+arm-length selfie. Removing selfie terms made the image follow the rear-quarter
+view much better.
+
+### Wide Shots
+
+Krea2 tends to keep attractive subjects large in frame. To get a real wide or
+environmental frame, be explicit about distance and visible environment.
+
+Useful phrasing:
+
+- `camera placed several meters away across the desk aisle`
+- `full seated body from head to knees remains visible`
+- `nearby desk edge, laptop corner, repeated desk rows, and tall-window depth clearly readable`
+- `wide environmental room framing`
+
+Avoid relying on `wide shot` alone.
+
+## Location Layout
+
+Location-aware camera text works when it describes the room around the subject
+without stealing the foreground from the subject.
+
+For coworking lounge:
+
+- Keep `warm desks`, `laptop tables`, `glass partition seams`, `repeated desk rows`,
+  `plants`, and `tall windows`.
+- Mention foreground anchors only when the camera should actually see them.
+- In POV, keep location anchors beside or behind the bodies, not in the lower
+  foreground.
+
+## Clothing Continuity
+
+When a softcore outfit is reused in a later branch, name what happens to actual
+outfit pieces instead of using generic fabric language.
+
+Works better:
+
+- `denim shorts are pulled aside or removed below the hips`
+- `button-down shirt tied at the waist and fitted bralette remain visible from the same outfit`
+
+Avoid generic fallback wording:
+
+- `fabric slipping off`
+- `partly exposed`
+- `outfit pushed aside where needed`
+
+Use generic wording only when no source outfit exists.
+
+## POV
+
+In POV prompts, the visible subject should still be established first. The POV
+participant is the camera viewpoint, not a normal visible cast member.
+
+Works better:
+
+- visible subject descriptor first,
+- then POV action,
+- then foreground hands/body/clothing cues.
+
+For POV clothing, describe only visible body/clothing fragments:
+
+- `foreground hands, hips, thighs, or lowered waistband`
+- `foreground hands, forearms, sleeves, or torso edge`
+
+Avoid:
+
+- full third-person `Man A wears...` phrasing for the POV participant,
+- making `the viewer` the first subject before the visible character is
+  established.
+
+## Style
+
+Style should describe rendering, not camera mechanics.
+
+Use style presets to choose between:
+
+- natural photo,
+- creator/social-media photo,
+- documentary/direct-flash photo,
+- cinematic realism,
+- illustration/comic.
+
+If a controlled camera is active, avoid style suffixes that imply a conflicting
+camera such as `phone photo` or `handheld selfie`.
+
+## Guide Update Format
+
+When adding a new rule, include:
+
+- observed prompt,
+- observed image failure,
+- edited prompt wording,
+- image improvement or regression,
+- generator path if known,
+- final rule.
@@ -0,0 +1,77 @@
+# SxCP Eval Loop
+
+This loop is for tuning the SxCP generator toward stronger Krea2 images.
+ComfyUI sends a generated prompt and image to Codex, Codex analyzes the result,
+then sends back exactly one edited prompt for the next A/B test. Confirmed
+findings become either generator changes or durable prompt rules in
+[`krea2-prompt-guide.md`](krea2-prompt-guide.md).
+
+## Channels
+
+- `sxcp_eval_in`: ComfyUI to Codex. Contains the prompt text and image path.
+- `sxcp_eval_out`: Codex to ComfyUI. Prompt-only. Do not put analysis here.
+- `sxcp_eval_log`: optional analysis/log channel.
+
+## Manual Loop
+
+Start the helper after sending a test prompt:
+
+```bash
+tools/sxcp_eval_loop.sh 3
+```
+
+Every three minutes it prints a structured request asking Codex to:
+
+1. Pull `sxcp_eval_in`.
+2. Inspect the image.
+3. Compare it to the prompt and previous edit.
+4. Push one prompt-only edit to `sxcp_eval_out`.
+5. Classify the finding as prompt-only, prompt-guide rule, or generator fix.
+6. Change generator code/data only when the issue is systemic.
+7. Record the finding and update the Krea2 prompt guide when a rule is confirmed.
+
+Runtime logs are written under `.sxcp_eval/` and ignored by git.
+
+## Optional Command Hook
+
+If you have a one-shot Codex command you want to run automatically, set:
+
+```bash
+SXCP_EVAL_CODEX_CMD="codex exec" tools/sxcp_eval_loop.sh 3
+```
+
+The request is sent on stdin. The command also receives:
+
+- `SXCP_EVAL_IN_CHANNEL`
+- `SXCP_EVAL_OUT_CHANNEL`
+- `SXCP_EVAL_LOG_CHANNEL`
+- `SXCP_EVAL_GUIDE_FILE`
+- `SXCP_EVAL_REQUEST_FILE`
+- `SXCP_EVAL_CYCLE_DIR`
+- `SXCP_EVAL_CYCLE`
+
+## Evaluation Axes
+
+- Identity consistency
+- Outfit continuity
+- Pose/action accuracy
+- Camera compliance
+- Location coherence
+- Crop/framing
+- Prompt noise/repetition
+- Model confusion tokens
+- Overall Krea2 image usefulness
+
+## Generator Fix Rule
+
+Only edit the generator when the image shows a repeatable, systemic prompt
+failure. Examples:
+
+- Selfie wording overrides orbit camera.
+- Clothing continuity loses the selected softcore outfit.
+- POV wording makes the off-camera participant the visual subject.
+- Location camera layout inserts foreground anchors in the wrong place.
+
+For one-off model drift, send a cleaner prompt to `sxcp_eval_out` and keep the
+generator unchanged. For repeated prompt behavior, update the generator and add
+the rule to `docs/krea2-prompt-guide.md`.
@@ -0,0 +1,228 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+usage() {
+  cat <<'EOF'
+Usage:
+  tools/sxcp_eval_loop.sh [minutes] [options]
+
+Loop protocol for Krea2 prompt-generator tuning. Start it right after sending a
+prompt to sxcp_eval_out. Every N minutes it writes a structured evaluation
+request, prints it, and optionally pipes it to a command. Each cycle should
+produce either a prompt-only A/B edit, a generator fix, or a prompt-guide rule.
+
+Options:
+  -m, --minutes N       Wait N minutes between evaluation requests.
+  -i, --in CHANNEL      Graph-to-agent channel. Default: sxcp_eval_in.
+  -o, --out CHANNEL     Agent-to-graph prompt-only channel. Default: sxcp_eval_out.
+  -l, --log CHANNEL     Analysis/log channel name. Default: sxcp_eval_log.
+  -g, --guide FILE      Durable Krea2 prompt guide. Default: docs/krea2-prompt-guide.md.
+  -d, --dir DIR         Runtime log directory. Default: .sxcp_eval.
+  --once                Run one wait/check cycle and exit.
+  -h, --help            Show this help.
+
+Optional automation:
+  SXCP_EVAL_CODEX_CMD   If set, the request is piped to this command.
+                        Example: SXCP_EVAL_CODEX_CMD="codex exec"
+
+The command receives the request on stdin and these environment variables:
+  SXCP_EVAL_IN_CHANNEL, SXCP_EVAL_OUT_CHANNEL, SXCP_EVAL_LOG_CHANNEL,
+  SXCP_EVAL_GUIDE_FILE, SXCP_EVAL_REQUEST_FILE, SXCP_EVAL_CYCLE_DIR,
+  SXCP_EVAL_CYCLE.
+EOF
+}
+
+die() {
+  echo "sxcp_eval_loop: $*" >&2
+  exit 1
+}
+
+is_positive_number() {
+  case "${1:-}" in
+    ''|*[!0-9.]*|.*.*|0|0.0|0.00) return 1 ;;
+    *) return 0 ;;
+  esac
+}
+
+minutes="${SXCP_EVAL_MINUTES:-}"
+in_channel="${SXCP_EVAL_IN_CHANNEL:-sxcp_eval_in}"
+out_channel="${SXCP_EVAL_OUT_CHANNEL:-sxcp_eval_out}"
+log_channel="${SXCP_EVAL_LOG_CHANNEL:-sxcp_eval_log}"
+guide_file="${SXCP_EVAL_GUIDE_FILE:-docs/krea2-prompt-guide.md}"
+log_root="${SXCP_EVAL_LOG_DIR:-.sxcp_eval}"
+run_once=0
+
+if [ "${1:-}" != "" ] && [ "${1#-}" = "$1" ]; then
+  minutes="$1"
+  shift
+fi
+
+while [ "$#" -gt 0 ]; do
+  case "$1" in
+    -m|--minutes)
+      [ "$#" -ge 2 ] || die "$1 requires a value"
+      minutes="$2"
+      shift 2
+      ;;
+    -i|--in)
+      [ "$#" -ge 2 ] || die "$1 requires a value"
+      in_channel="$2"
+      shift 2
+      ;;
+    -o|--out)
+      [ "$#" -ge 2 ] || die "$1 requires a value"
+      out_channel="$2"
+      shift 2
+      ;;
+    -l|--log)
+      [ "$#" -ge 2 ] || die "$1 requires a value"
+      log_channel="$2"
+      shift 2
+      ;;
+    -g|--guide)
+      [ "$#" -ge 2 ] || die "$1 requires a value"
+      guide_file="$2"
+      shift 2
+      ;;
+    -d|--dir)
+      [ "$#" -ge 2 ] || die "$1 requires a value"
+      log_root="$2"
+      shift 2
+      ;;
+    --once)
+      run_once=1
+      shift
+      ;;
+    -h|--help)
+      usage
+      exit 0
+      ;;
+    *)
+      die "unknown argument: $1"
+      ;;
+  esac
+done
+
+minutes="${minutes:-5}"
+is_positive_number "$minutes" || die "minutes must be a positive number"
+
+mkdir -p "$log_root"
+run_id="$(date -u +%Y%m%dT%H%M%SZ)"
+run_dir="$log_root/$run_id"
+mkdir -p "$run_dir"
+events_file="$run_dir/events.tsv"
+summary_file="$run_dir/summary.md"
+
+cat > "$summary_file" <<EOF
+# SxCP Eval Loop $run_id
+
+- Interval: ${minutes} minute(s)
+- Input channel: \`$in_channel\`
+- Prompt output channel: \`$out_channel\`
+- Log channel: \`$log_channel\`
+- Krea2 prompt guide: \`$guide_file\`
+
+## Goal
+
+Tune the SxCP generator so its default Krea2 prompts produce the strongest
+possible images for the selected scene, camera, subject, outfit, action, and
+style. Every cycle should turn visual evidence into one of:
+
+- a prompt-only A/B edit,
+- a durable rule for \`$guide_file\`,
+- a generator code/data change with focused test coverage.
+
+## Protocol
+
+1. Pull the latest prompt/image from \`$in_channel\`.
+2. Compare the image against the prompt and previous edited prompt.
+3. Identify concrete Krea2 mismatches and likely generator path.
+4. Classify the next step: prompt-only edit, guide rule, or generator patch.
+5. Push only the next test prompt to \`$out_channel\`.
+6. Keep analysis in chat or \`$log_channel\`, not in \`$out_channel\`.
+7. Edit generator code/data only when the issue is systemic.
+8. Update \`$guide_file\` when a wording rule is confirmed.
+9. Run focused smoke tests after generator edits.
+
+## Cycles
+
+EOF
+
+printf 'cycle\tutc_time\trequest_file\tstatus\n' > "$events_file"
+
+cycle=0
+while :; do
+  cycle=$((cycle + 1))
+  echo "sxcp_eval_loop: cycle $cycle waiting ${minutes} minute(s) before requesting evaluation..."
+  sleep "${minutes}m"
+
+  stamp="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
+  cycle_dir="$run_dir/cycle_$(printf '%03d' "$cycle")"
+  mkdir -p "$cycle_dir"
+  request_file="$cycle_dir/request.md"
+
+  cat > "$request_file" <<EOF
+Please run SxCP eval cycle $cycle now.
+
+Primary goal:
+- Tune the generator for better Krea2 images, not just one isolated image.
+- Maintain/update the durable Krea2 prompt guide at: $guide_file
+
+Channels:
+- Pull latest graph output from: $in_channel
+- Push prompt-only replacement to: $out_channel
+- Put analysis/log text in chat or: $log_channel
+
+Evaluation steps:
+1. Pull the latest payload from $in_channel.
+2. Inspect image_path and compare it to the prompt text.
+3. Score these Krea2 axes: identity, outfit continuity, pose/action, camera compliance, location coherence, crop/framing, prompt noise, model confusion tokens, and overall image usefulness.
+4. Identify the smallest concrete mismatch that should be tested next.
+5. Classify the finding:
+   - prompt-only: push exactly one edited prompt to $out_channel and nothing else on that channel.
+   - guide-rule: update $guide_file with the confirmed Krea2 wording rule.
+   - generator-fix: edit the responsible generator path, add/adjust focused smoke coverage, run tests, and summarize the change.
+6. Keep a clear link between the image evidence, the prompt wording, and the generator path.
+7. Append the finding to the eval log with: original issue, changed wording/path, expected improvement, test result, guide update, generator update, and next hypothesis.
+
+Current run:
+- run_id: $run_id
+- cycle: $cycle
+- generated_at_utc: $stamp
+- request_file: $request_file
+- guide_file: $guide_file
+EOF
+
+  {
+    echo
+    echo "### Cycle $cycle - $stamp"
+    echo
+    echo "- Request: \`$request_file\`"
+    echo "- Status: pending evaluation"
+  } >> "$summary_file"
+  printf '%s\t%s\t%s\t%s\n' "$cycle" "$stamp" "$request_file" "pending" >> "$events_file"
+
+  echo
+  echo "================ SxCP Eval Request ================"
+  cat "$request_file"
+  echo "==================================================="
+  echo
+
+  if [ "${SXCP_EVAL_CODEX_CMD:-}" != "" ]; then
+    echo "sxcp_eval_loop: piping request to SXCP_EVAL_CODEX_CMD"
+    SXCP_EVAL_IN_CHANNEL="$in_channel" \
+    SXCP_EVAL_OUT_CHANNEL="$out_channel" \
+    SXCP_EVAL_LOG_CHANNEL="$log_channel" \
+    SXCP_EVAL_GUIDE_FILE="$guide_file" \
+    SXCP_EVAL_REQUEST_FILE="$request_file" \
+    SXCP_EVAL_CYCLE_DIR="$cycle_dir" \
+    SXCP_EVAL_CYCLE="$cycle" \
+      sh -c "$SXCP_EVAL_CODEX_CMD" < "$request_file"
+  fi
+
+  if [ "$run_once" -eq 1 ]; then
+    break
+  fi
+done
+
+echo "sxcp_eval_loop: log written to $run_dir"