Add Krea2 POV routing and eval tooling
+117
-13
@@ -5,6 +5,9 @@ ComfyUI sends a generated prompt, image, and seed to Codex, Codex analyzes the
result, then sends back exactly one edited prompt for the next A/B test.
Confirmed findings become either generator changes or durable prompt rules in
[`krea2-prompt-guide.md`](krea2-prompt-guide.md).
The active A/B testing method is recorded in
[`krea2-ab-methodology.md`](krea2-ab-methodology.md); update that memory when
the method improves.

## Channels

@@ -14,6 +17,76 @@ Confirmed findings become either generator changes or durable prompt rules in
  the MCP signal when supported. Do not put analysis here.
- `sxcp_eval_log`: optional analysis/log channel.

## MCP Helper Command

Use the checked helper for bridge calls instead of ad hoc Python snippets. The
approved command prefix is:

```bash
/media/p5/miniforge3/bin/python tools/sxcp_mcp_client.py
```

Common calls:

```bash
/media/p5/miniforge3/bin/python tools/sxcp_mcp_client.py list-tools
/media/p5/miniforge3/bin/python tools/sxcp_mcp_client.py call-tool comfy_pull --arguments-json '{"channel":"sxcp_eval_in"}'
/media/p5/miniforge3/bin/python tools/sxcp_mcp_client.py call-tool comfy_push --arguments-json '{"channel":"sxcp_eval_out","seed":5656565656,"text":"PROMPT_ONLY_POSITIVE_CONDITIONING"}'
```

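
The `--arguments-json` payload is easy to mis-quote by hand. The following minimal Python sketch assumes only the `call-tool <tool> --arguments-json <JSON>` form shown above. It builds the argv with `json.dumps` and quotes it for a shell with `shlex.join`. `build_pull_argv` and `build_push_argv` are hypothetical names, not functions of `tools/sxcp_mcp_client.py`. Restricting pushes to `sxcp_eval_out` is this sketch's own guard.

```python
import json
import shlex

# Interpreter and client path copied from the approved command prefix.
PYTHON = "/media/p5/miniforge3/bin/python"
CLIENT = "tools/sxcp_mcp_client.py"


def _compact(payload: dict) -> str:
    """Serialize without spaces, matching the hand-written examples above."""
    return json.dumps(payload, separators=(",", ":"))


def build_pull_argv(channel: str = "sxcp_eval_in") -> list[str]:
    """Hypothetical helper: argv for one comfy_pull call."""
    return [PYTHON, CLIENT, "call-tool", "comfy_pull",
            "--arguments-json", _compact({"channel": channel})]


def build_push_argv(text: str, seed: int, channel: str = "sxcp_eval_out") -> list[str]:
    """Hypothetical helper: argv for one comfy_push that reuses the sampler seed."""
    if channel != "sxcp_eval_out":
        # Sketch's own guard: this loop only sends positive conditioning here.
        raise ValueError(f"unexpected output channel: {channel}")
    payload = {"channel": channel, "seed": seed, "text": text}
    return [PYTHON, CLIENT, "call-tool", "comfy_push",
            "--arguments-json", _compact(payload)]


if __name__ == "__main__":
    argv = build_push_argv("PROMPT_ONLY_POSITIVE_CONDITIONING", 5656565656)
    # shlex.join quotes the JSON argument so the line can be pasted into a shell.
    print(shlex.join(argv))
```

For the sample prompt, `shlex.join(argv)` reproduces the `comfy_push` line above. Passing the same list to `subprocess.run(argv, check=True)` skips shell quoting entirely.
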
## Batch Prompt Helper

For prompt-axis batches, prepare a local JSON file and use the offline helper to
render the approved MCP push/pull commands and an image-presence checklist:

```bash
python tools/sxcp_prompt_batch.py validate --batch-json /tmp/sxcp-batch.json
python tools/sxcp_prompt_batch.py print-push-commands --batch-json /tmp/sxcp-batch.json
python tools/sxcp_prompt_batch.py print-result-template --batch-json /tmp/sxcp-batch.json
python tools/sxcp_prompt_batch.py run-batch --batch-json /tmp/sxcp-batch.json --result-json /tmp/sxcp-results.json --previous-turn 80 --run
python tools/sxcp_prompt_batch.py validate-results --batch-json /tmp/sxcp-batch.json --result-json /tmp/sxcp-results.json
python tools/sxcp_prompt_batch.py print-eval-entry-draft --batch-json /tmp/sxcp-batch.json --result-json /tmp/sxcp-results.json --variant-key pov_example_variant --baseline-image /absolute/baseline.png --candidate-id controlled_subject_first
```

Batch files use the fixed sampler seed and one positive prompt per probe:

```json
{
  "seed": 8989898989,
  "channel_out": "sxcp_eval_out",
  "channel_in": "sxcp_eval_in",
  "probes": [
    {
      "id": "controlled_subject_first",
      "prompt_order": "subject_first",
      "text": "SUBJECT_LOOK_FIRST. POSE_HIERARCHY. LOCATION_ANCHORS."
    },
    {
      "id": "rough_geometry_axis",
      "prompt_order": "geometry_only",
      "text": "POSE_AXIS_ONLY_FOR_DISCOVERY."
    }
  ]
}
```

`geometry_only` probes are for rough pose-axis discovery. They are not durable
A/B evidence with subject and look held constant. The helper rejects the
`sxcp_eval_negative_out` channel, so keep batch prompts positive-only.

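
This section does not list the checks run by `validate`, apart from the negative-channel rule. The following minimal sketch assumes the field names from the example batch above (`seed`, `channel_out`, `channel_in`, `probes`, `id`, `prompt_order`, `text`). It adds a duplicate-id check of its own. `load_batch`, `validate_batch` and `geometry_only_ids` are hypothetical illustrations, not the helper's code.

```python
import json

FORBIDDEN_CHANNEL = "sxcp_eval_negative_out"
REQUIRED_PROBE_KEYS = ("id", "prompt_order", "text")


def load_batch(path: str) -> dict:
    """Read a batch file such as /tmp/sxcp-batch.json."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def validate_batch(batch: dict) -> list[str]:
    """Return problems found in a batch; an empty list means it looks usable."""
    errors: list[str] = []
    seed = batch.get("seed")
    # bool is a subclass of int, so rule it out explicitly.
    if not isinstance(seed, int) or isinstance(seed, bool):
        errors.append("seed must be one fixed integer sampler seed")
    for key in ("channel_out", "channel_in"):
        if batch.get(key) == FORBIDDEN_CHANNEL:
            errors.append(f"{key} must not be {FORBIDDEN_CHANNEL}; batches are positive-only")
    probes = batch.get("probes")
    if not isinstance(probes, list) or not probes:
        errors.append("probes must be a non-empty list")
        return errors
    seen: set[str] = set()
    for index, probe in enumerate(probes):
        # Every probe needs an id, an ordering label and one positive prompt.
        missing = [key for key in REQUIRED_PROBE_KEYS if not probe.get(key)]
        if missing:
            errors.append(f"probe {index} is missing {', '.join(missing)}")
            continue
        if probe["id"] in seen:
            errors.append(f"duplicate probe id {probe['id']!r}")
        seen.add(probe["id"])
    return errors


def geometry_only_ids(batch: dict) -> list[str]:
    """Ids of rough pose-axis discovery probes, which are not controlled A/B evidence."""
    return [probe["id"] for probe in batch.get("probes", [])
            if probe.get("prompt_order") == "geometry_only"]
```

Usage would be `validate_batch(load_batch("/tmp/sxcp-batch.json"))`. Run against the example batch above, it returns no errors and flags `rough_geometry_axis` as discovery-only.
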
Use `run-batch --run` to push one positive prompt and poll `sxcp_eval_in` until
a new turn appears with an absolute PNG image path and the fixed sampler seed.
It then writes the filled result JSON and sends the next probe. Omit `--run` for
a dry-run command preview.

After a live run, run `validate-results`. It requires the result probe ids to
match the batch order, each turn to advance in batch order, every image path to
be an absolute PNG artifact, and every returned seed to match the fixed sampler
seed. Then use `print-eval-entry-draft` to create a valid `krea2-eval-log.json`
entry draft. Replace the generated summaries and observation with the real
visual comparison before recording it with `tools/krea2_record_eval.py`. By
default the draft command rejects `geometry_only` candidates; pass
`--allow-geometry-only` only when deliberately recording non-controlled
prompt-axis evidence.

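
The `validate-results` rules above are concrete enough to sketch. This version assumes each result row is a JSON object with hypothetical `id`, `turn`, `image_path` and `seed` keys; the real result template may name them differently. It reads "advance" as strictly increasing turn numbers. It also mirrors the draft command's default refusal of `geometry_only` candidates. None of these functions come from `tools/sxcp_prompt_batch.py`.

```python
from pathlib import PurePosixPath
from typing import Optional


def validate_results(batch: dict, results: list[dict],
                     previous_turn: Optional[int] = None) -> list[str]:
    """Apply the stated validate-results rules; result key names are assumptions."""
    errors: list[str] = []
    expected_ids = [probe["id"] for probe in batch["probes"]]
    actual_ids = [row.get("id") for row in results]
    if actual_ids != expected_ids:
        errors.append(f"probe ids {actual_ids} do not match batch order {expected_ids}")
    last_turn = previous_turn
    for row in results:
        probe_id = row.get("id")
        # Each turn must be an integer greater than the one before it.
        turn = row.get("turn")
        if not isinstance(turn, int) or (last_turn is not None and turn <= last_turn):
            errors.append(f"{probe_id}: turn {turn!r} does not advance past {last_turn!r}")
        else:
            last_turn = turn
        # Images must be absolute PNG artifact paths.
        path = PurePosixPath(str(row.get("image_path", "")))
        if not path.is_absolute() or path.suffix.lower() != ".png":
            errors.append(f"{probe_id}: image path {str(path)!r} is not an absolute PNG")
        # Every returned seed must equal the batch's fixed sampler seed.
        if row.get("seed") != batch["seed"]:
            errors.append(f"{probe_id}: seed {row.get('seed')!r} != fixed seed {batch['seed']}")
    return errors


def check_draft_candidate(batch: dict, candidate_id: str,
                          allow_geometry_only: bool = False) -> None:
    """Refuse geometry_only candidates unless explicitly allowed."""
    orders = {probe["id"]: probe["prompt_order"] for probe in batch["probes"]}
    if candidate_id not in orders:
        raise KeyError(f"unknown candidate id {candidate_id!r}")
    if orders[candidate_id] == "geometry_only" and not allow_geometry_only:
        raise ValueError(f"{candidate_id} is geometry_only; allow it only on purpose")
```

Returning a list of problems rather than stopping at the first one lets a single run report every broken row in a batch.
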
## Manual Loop

Start the helper after sending a test prompt:
@@ -30,23 +103,30 @@ Every three minutes it prints a structured request asking Codex to:
4. Compare it to the prompt and previous edit.
5. Push one prompt-only edit to `sxcp_eval_out`, preserving the same seed through
   the MCP signal when available.
6. Classify the finding as prompt-only, prompt-guide rule, provisional generator
   improvement, or proven generator fix.
7. When leaving a category after same-seed progress over baseline, mirror the
   best generator-safe wording into the responsible generator path as
   `provisional_generator_patch`.
8. Promote a generator change to proven only when the issue is systemic,
   repeated, or structurally wrong before rendering.
9. Record the finding and update the Krea2 prompt guide when a rule is confirmed.

Runtime logs are written under `.sxcp_eval/` and ignored by git.

Durable fixed-seed findings that justify a guide rule, generator patch, or pose
variant promotion are recorded in [`krea2-eval-log.json`](krea2-eval-log.json).
Method changes belong in [`krea2-ab-methodology.md`](krea2-ab-methodology.md).
Use runtime logs for scratch notes; use the JSON log only for evidence that
should remain tied to a catalog variant. Image paths in that log point at
external ComfyUI artifacts and may be cleaned; the durable evidence is the fixed
sampler seed, optional generator seed, prompt summaries, observation, decision,
and commit.

Record durable findings with the checked helper instead of hand-editing the log:

```bash
# Print an entry template (sampler seed only).
python tools/krea2_record_eval.py --print-template --variant-key pov_footjob_frontal_sole_stroke --seed 1234 > /tmp/krea2-entry.json
# Or, when the SxCP generator seed differs from the sampler seed:
python tools/krea2_record_eval.py --print-template --variant-key pov_footjob_frontal_sole_stroke --seed 1234 --generator-seed 5678 > /tmp/krea2-entry.json
# Dry run first, then record the edited entry.
python tools/krea2_record_eval.py --entry-json /tmp/krea2-entry.json --dry-run
python tools/krea2_record_eval.py --entry-json /tmp/krea2-entry.json
```
@@ -59,6 +139,7 @@ Entry template:
  "date": "2026-06-29",
  "variant_key": "pov_example_variant",
  "seed": 1234,
  "generator_seed": 5678,
  "source": "sxcp_eval_mcp",
  "result": "accepted",
  "decision": "generator_patch",
@@ -141,22 +222,45 @@ pelvis can be correct. Do not treat them as automatic failures.

## Seed Contract

The sampler seed is transport metadata, not prompt text. When the graph emits a
sampler seed, an A/B wording test should reuse that exact seed so that any
difference between the two images comes mostly from wording rather than sampling
randomness. If the SxCP generator/control seed differs from the sampler seed,
record it as `generator_seed` in the eval entry. If a payload has no sampler
seed, mark that cycle as uncontrolled and do not turn the result into a durable
generator rule without another controlled run.

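
The seed contract can be expressed as a small sketch. It assumes the `seed` and `generator_seed` key names from the entry template. `seed_fields` and `same_seed_pair` are hypothetical helpers, not part of `tools/krea2_record_eval.py`. A missing sampler seed marks the cycle as uncontrolled. `generator_seed` is emitted only when it differs from the sampler seed.

```python
from typing import Optional, Tuple


def seed_fields(sampler_seed: Optional[int],
                generator_seed: Optional[int]) -> Tuple[bool, dict]:
    """Return (controlled, seed fields for an eval entry) under the seed contract."""
    if sampler_seed is None:
        # No sampler seed: uncontrolled cycle, so emit no durable seed fields.
        return False, {}
    fields = {"seed": sampler_seed}
    if generator_seed is not None and generator_seed != sampler_seed:
        # The SxCP generator/control seed is recorded only when it differs.
        fields["generator_seed"] = generator_seed
    return True, fields


def same_seed_pair(baseline_seed: Optional[int], candidate_seed: Optional[int]) -> bool:
    """A wording A/B pair is controlled only when both renders share one sampler seed."""
    return baseline_seed is not None and baseline_seed == candidate_seed
```

With the template's values, `seed_fields(1234, 5678)` yields `(True, {"seed": 1234, "generator_seed": 5678})`.
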
## Positive-Only Conditioning

`sxcp_eval_out` carries positive conditioning only. Never put negative-conditioning
phrases such as `no shaft`, `no hands`, `without clothing`, or `avoid X` inside
the positive prompt; distilled Krea2 can reinforce or hallucinate the unwanted
object from that wording.

This loop has no active negative-output channel. In a probe on seed `424242`
with the same positive prompt, empty negative conditioning was compared against
strong negative text targeting visible prompt attributes, and the rendered image
stayed visually unchanged. Do not rely on negative conditioning for Krea2 pose
tuning; keep prompt fixes positive-only.

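
A crude lint can catch negation wording before a prompt reaches `sxcp_eval_out`. This is an assumption, not part of the existing tooling. The trigger words extend the examples above (`no`, `without`, `avoid`) with `not` and `never`. A regex will miss some phrasings and flag some harmless ones, so review each hit and rewrite it positively.

```python
import re

# Assumed trigger words: the examples above plus "not" and "never".
NEGATION = re.compile(r"\b(?:no|without|avoid|not|never)\s+\w+", re.IGNORECASE)


def negative_phrases(prompt: str) -> list[str]:
    """Return each negation phrase (trigger word plus the next word) in a prompt."""
    return [match.group(0) for match in NEGATION.finditer(prompt)]


def assert_positive_only(prompt: str) -> str:
    """Raise before pushing if the positive prompt contains negation wording."""
    found = negative_phrases(prompt)
    if found:
        raise ValueError(f"rewrite positively before pushing: {found}")
    return prompt
```

For example, `negative_phrases("Kitchen scene, no hands visible, avoid clutter")` returns `["no hands", "avoid clutter"]`.
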
## Generator Fix Rule

Use two levels of generator change:

- `provisional_generator_patch`: apply the best generator-safe wording when
  leaving a category after fixed-seed progress over baseline. Keep the catalog
  variant as `candidate`.
- `generator_patch`: promote the change to a proven/default generator rule when
  the issue is repeated, systemic, or structurally wrong before rendering.

Examples of proven generator fixes:

- Selfie wording overrides the orbit camera.
- Clothing continuity loses the selected softcore outfit.
- POV wording makes the off-camera participant the visual subject.
- Location camera layout inserts foreground anchors in the wrong place.

For one-off model drift inside an active category, send a cleaner prompt to
`sxcp_eval_out` and keep collecting evidence. When exiting a category, carry
forward same-seed improvements over baseline as provisional generator changes
and add the rule or weak case to `docs/krea2-prompt-guide.md`.

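
The two generator-change levels reduce to a small decision, sketched below. The sketch assumes a reviewer supplies the boolean inputs by hand. `prompt_only` is a hypothetical label for "keep the generator unchanged"; the other two values reuse the names above. It also folds in the seed contract: without same-seed evidence, nothing reaches the generator.

```python
from enum import Enum


class Decision(Enum):
    PROMPT_ONLY = "prompt_only"  # hypothetical label: keep the generator unchanged
    PROVISIONAL = "provisional_generator_patch"
    PROVEN = "generator_patch"


def generator_decision(same_seed: bool, beats_baseline: bool, leaving_category: bool,
                       repeated_or_systemic: bool, wrong_before_render: bool) -> Decision:
    """Choose a generator-change level from the rules in this section."""
    if not same_seed:
        # Uncontrolled evidence never changes the generator; rerun with a fixed seed.
        return Decision.PROMPT_ONLY
    if repeated_or_systemic or wrong_before_render:
        return Decision.PROVEN
    if leaving_category and beats_baseline:
        # Carry the best wording forward; the catalog variant stays `candidate`.
        return Decision.PROVISIONAL
    # One-off drift inside an active category: send a cleaner prompt instead.
    return Decision.PROMPT_ONLY
```
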