Add Krea2 POV routing and eval tooling

This commit is contained in:
2026-06-30 19:28:10 +02:00
parent 284c6279e6
commit f5ba07e340
29 changed files with 6331 additions and 400 deletions
+117 -13
View File
@@ -5,6 +5,9 @@ ComfyUI sends a generated prompt, image, and seed to Codex, Codex analyzes the
result, then sends back exactly one edited prompt for the next A/B test.
Confirmed findings become either generator changes or durable prompt rules in
[`krea2-prompt-guide.md`](krea2-prompt-guide.md).
The active A/B testing method is recorded in
[`krea2-ab-methodology.md`](krea2-ab-methodology.md); update that memory when
the method improves.
## Channels
@@ -14,6 +17,76 @@ Confirmed findings become either generator changes or durable prompt rules in
the MCP signal when supported. Do not put analysis here.
- `sxcp_eval_log`: optional analysis/log channel.
## MCP Helper Command
Use the checked helper for bridge calls instead of ad hoc Python snippets. The
approved command prefix is:
```bash
/media/p5/miniforge3/bin/python tools/sxcp_mcp_client.py
```
Common calls:
```bash
/media/p5/miniforge3/bin/python tools/sxcp_mcp_client.py list-tools
/media/p5/miniforge3/bin/python tools/sxcp_mcp_client.py call-tool comfy_pull --arguments-json '{"channel":"sxcp_eval_in"}'
/media/p5/miniforge3/bin/python tools/sxcp_mcp_client.py call-tool comfy_push --arguments-json '{"channel":"sxcp_eval_out","seed":5656565656,"text":"PROMPT_ONLY_POSITIVE_CONDITIONING"}'
```
## Batch Prompt Helper
For prompt-axis batches, prepare a local JSON file and use the offline helper to
render the approved MCP push/pull commands and an image-presence checklist:
```bash
python tools/sxcp_prompt_batch.py validate --batch-json /tmp/sxcp-batch.json
python tools/sxcp_prompt_batch.py print-push-commands --batch-json /tmp/sxcp-batch.json
python tools/sxcp_prompt_batch.py print-result-template --batch-json /tmp/sxcp-batch.json
python tools/sxcp_prompt_batch.py run-batch --batch-json /tmp/sxcp-batch.json --result-json /tmp/sxcp-results.json --previous-turn 80 --run
python tools/sxcp_prompt_batch.py validate-results --batch-json /tmp/sxcp-batch.json --result-json /tmp/sxcp-results.json
python tools/sxcp_prompt_batch.py print-eval-entry-draft --batch-json /tmp/sxcp-batch.json --result-json /tmp/sxcp-results.json --variant-key pov_example_variant --baseline-image /absolute/baseline.png --candidate-id controlled_subject_first
```
Batch files use the fixed sampler seed and one positive prompt per probe:
```json
{
"seed": 8989898989,
"channel_out": "sxcp_eval_out",
"channel_in": "sxcp_eval_in",
"probes": [
{
"id": "controlled_subject_first",
"prompt_order": "subject_first",
"text": "SUBJECT_LOOK_FIRST. POSE_HIERARCHY. LOCATION_ANCHORS."
},
{
"id": "rough_geometry_axis",
"prompt_order": "geometry_only",
"text": "POSE_AXIS_ONLY_FOR_DISCOVERY."
}
]
}
```
`geometry_only` probes are for rough pose-axis discovery and are not durable
subject/look-controlled A/B evidence. The helper rejects
`sxcp_eval_negative_out`; keep batch prompts positive-only.
Use `run-batch --run` to push one positive prompt, poll `sxcp_eval_in` until a
new turn and absolute PNG image path appear with the fixed sampler seed, write
the filled result JSON, then send the next probe. Omit `--run` for a dry-run
command preview. After a live run, run `validate-results`; it requires the
result probe ids to match the batch order, each turn to advance in batch order,
every image path to be an absolute PNG artifact, and every returned seed to
match the fixed sampler seed. Then use `print-eval-entry-draft` to create a
valid `krea2-eval-log.json` entry draft. Replace the generated summaries and
observation with the real visual comparison before recording it with
`tools/krea2_record_eval.py`. By default the draft command rejects
`geometry_only` candidates; pass `--allow-geometry-only` only when deliberately
recording non-controlled prompt-axis evidence.
## Manual Loop
Start the helper after sending a test prompt:
@@ -30,23 +103,30 @@ Every three minutes it prints a structured request asking Codex to:
4. Compare it to the prompt and previous edit.
5. Push one prompt-only edit to `sxcp_eval_out`, preserving the same seed through
the MCP signal when available.
6. Classify the finding as prompt-only, prompt-guide rule, or generator fix.
7. Change generator code/data only when the issue is systemic.
8. Record the finding and update the Krea2 prompt guide when a rule is confirmed.
6. Classify the finding as prompt-only, prompt-guide rule, provisional generator
improvement, or proven generator fix.
7. When leaving a category after same-seed progress over baseline, mirror the
best generator-safe wording into the responsible generator path as
`provisional_generator_patch`.
8. Promote a generator change to proven only when the issue is systemic,
repeated, or structurally wrong before rendering.
9. Record the finding and update the Krea2 prompt guide when a rule is confirmed.
Runtime logs are written under `.sxcp_eval/` and ignored by git.
Durable fixed-seed findings that justify a guide rule, generator patch, or pose
variant promotion are recorded in [`krea2-eval-log.json`](krea2-eval-log.json).
Method changes belong in [`krea2-ab-methodology.md`](krea2-ab-methodology.md).
Use runtime logs for scratch notes; use the JSON log only for evidence that
should remain tied to a catalog variant. Image paths in that log point at
external ComfyUI artifacts and may be cleaned; the durable evidence is the fixed
seed, prompt summaries, observation, decision, and commit.
sampler seed, optional generator seed, prompt summaries, observation, decision,
and commit.
Record durable findings with the checked helper instead of hand-editing the log:
```bash
python tools/krea2_record_eval.py --print-template --variant-key pov_footjob_frontal_sole_stroke --seed 1234 > /tmp/krea2-entry.json
python tools/krea2_record_eval.py --print-template --variant-key pov_footjob_frontal_sole_stroke --seed 1234 --generator-seed 5678 > /tmp/krea2-entry.json
python tools/krea2_record_eval.py --entry-json /tmp/krea2-entry.json --dry-run
python tools/krea2_record_eval.py --entry-json /tmp/krea2-entry.json
```
@@ -59,6 +139,7 @@ Entry template:
"date": "2026-06-29",
"variant_key": "pov_example_variant",
"seed": 1234,
"generator_seed": 5678,
"source": "sxcp_eval_mcp",
"result": "accepted",
"decision": "generator_patch",
@@ -141,22 +222,45 @@ pelvis can be correct. Do not treat them as automatic failures.
## Seed Contract
The seed is transport metadata, not prompt text. When the graph emits a seed, an
A/B wording test should reuse that exact seed so the image difference mostly
comes from wording, not sampling randomness. If a payload has no seed, mark that
The sampler seed is transport metadata, not prompt text. When the graph emits a
sampler seed, an A/B wording test should reuse that exact seed so the image
difference mostly comes from wording, not sampling randomness. If the SxCP
generator/control seed differs from the sampler seed, record it as
`generator_seed` in the eval entry. If a payload has no sampler seed, mark that
cycle as uncontrolled and avoid turning the result into a durable generator rule
without another controlled run.
## Positive-Only Conditioning
`sxcp_eval_out` is positive conditioning only. Never send negative-conditioning
phrases such as `no shaft`, `no hands`, `without clothing`, or `avoid X` inside
the positive prompt; distilled Krea2 can reinforce or hallucinate the unwanted
object from that wording.
This loop has no active negative-output channel. A same-positive, same-seed
probe on seed `424242` compared empty negative conditioning against strong
negative text targeting visible prompt attributes, and the rendered image stayed
visually unchanged. Do not rely on negative conditioning for Krea2 pose tuning;
keep prompt fixes positive-only.
## Generator Fix Rule
Only edit the generator when the image shows a repeatable, systemic prompt
failure. Examples:
Use two levels of generator change:
- `provisional_generator_patch`: apply the best generator-safe wording when
leaving a category after fixed-seed progress over baseline. Keep the catalog
variant as `candidate`.
- `generator_patch`: promote as a proven/default generator rule when the issue
is repeated, systemic, or structurally wrong before rendering.
Examples of proven generator fixes:
- Selfie wording overrides orbit camera.
- Clothing continuity loses the selected softcore outfit.
- POV wording makes the off-camera participant the visual subject.
- Location camera layout inserts foreground anchors in the wrong place.
For one-off model drift, send a cleaner prompt to `sxcp_eval_out` and keep the
generator unchanged. For repeated prompt behavior, update the generator and add
the rule to `docs/krea2-prompt-guide.md`.
For one-off model drift inside an active category, send a cleaner prompt to
`sxcp_eval_out` and keep collecting evidence. When exiting a category, carry
forward same-seed improvements over baseline as provisional generator changes
and add the rule or weak case to `docs/krea2-prompt-guide.md`.