# Prompt Architecture Improvement Plan

This is a working research note for organizing the prompt builder around the
routing map in `docs/prompt-pool-routing-map.md`.

## Current Branch Additions

The current branch adds two major surfaces:

- `SxCP Krea2 Resolution Selector` in `__init__.py`, with README notes.
- Expanded hardcore interaction/manual/action pools in
  `categories/sexual_poses.json`,
  `categories/expression_composition_pools.json`, `prompt_builder.py`, and
  `krea_formatter.py`.

The map audit currently sees:

- 15 sexual pose subcategories.
- 94 sexual pose item templates.
- 23 expression pools.
- 24 composition pools.
- A new Krea2 resolution node with width/height/API aspect outputs.

## Architectural Finding

The project has a good functional map, but ownership is still mixed inside large
files:

- `prompt_builder.py` owns selection, character resolution, role graph logic,
  camera adaptation, pair assembly, and some final string cleanup.
- `krea_formatter.py` owns metadata parsing, cast naturalization, sexual action
  rewriting, POV rewriting, clothing cleanup, camera preservation, fallback
  parsing, and final prose assembly.
- `sdxl_formatter.py` owns tag assembly and style/quality presets.
- `caption_naturalizer.py` owns training-caption prose.
- Category JSON files own scalable pool content, but Python still owns several
  compatibility and role-graph decisions.

The biggest maintainability risk is not the number of pools. The risk is that
selection, semantic rewriting, and final text hygiene are too interleaved. When a
prompt has wrong text, it is easy to patch the wrong layer.

## First Refactor Boundary

Generic text hygiene now has one home:

- `prompt_hygiene.py`

It should only handle route-agnostic cleanup:

- whitespace and punctuation normalization;
- empty field-label removal;
- repeated trigger prefix cleanup;
- duplicate comma-list item removal;
- adjacent duplicate sentence cleanup;
- simple dangling connector cleanup.
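
Here is a minimal sketch of those six passes, assuming plain regex rules are enough. The function names and the `FIELD_LABELS` inventory are hypothetical and are not the current contents of `prompt_hygiene.py`. None of the rules look at what a phrase means. They only look at whitespace, punctuation, and exact repeats. Comma-list dedupe is opt-in because in prose a comma separates clauses, not tags.

```python
import re

# Hypothetical label inventory; the real list would live in prompt_hygiene.py.
FIELD_LABELS = ("Scene", "Expression", "Composition", "Camera")


def normalize_space_punct(text: str) -> str:
    text = re.sub(r"\s+", " ", text)                   # collapse whitespace runs
    text = re.sub(r"\s+([,.;:!?])", r"\1", text)       # no space before punctuation
    text = re.sub(r"([,;])[,;]+", r"\1", text)         # ",," or ",;" -> first mark
    text = re.sub(r"[,;]+\.", ".", text)               # ",." -> "."
    text = re.sub(r"([,;])(?=[^\s,;])", r"\1 ", text)  # one space after , and ;
    return text.strip(" ,;")


def remove_empty_labels(text: str, labels=FIELD_LABELS) -> str:
    names = "|".join(re.escape(label) for label in labels)
    # "Scene: ," or a trailing "Camera:" carries no value: drop label and separator.
    return re.sub(rf"\b(?:{names}):\s*(?:[,.;]\s*|$)", "", text)


def dedupe_trigger_prefix(text: str, trigger: str) -> str:
    t = re.escape(trigger)
    # Two or more leading "trigger," prefixes collapse to one.
    return re.sub(rf"^(?:{t}\s*,\s*){{2,}}", lambda _m: f"{trigger}, ", text)


def dedupe_comma_items(text: str) -> str:
    seen, kept = set(), []
    for item in (part.strip() for part in text.split(",")):
        if item and item.casefold() not in seen:
            seen.add(item.casefold())
            kept.append(item)
    return ", ".join(kept)


def dedupe_adjacent_sentences(text: str) -> str:
    out = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if not out or sentence.casefold() != out[-1].casefold():
            out.append(sentence)
    return " ".join(out)


def trim_dangling_connectors(text: str) -> str:
    # "sits with." -> "sits."; only connectors directly before , ; . or end of text.
    return re.sub(r"\s+(?:and|with|while|or)\s*(?=[.,;]|$)", "", text)


def clean_text(text: str, trigger: str = "", comma_list: bool = False) -> str:
    text = remove_empty_labels(text)
    text = trim_dangling_connectors(text)
    text = normalize_space_punct(text)
    if trigger:
        text = dedupe_trigger_prefix(text, trigger)
    return dedupe_comma_items(text) if comma_list else dedupe_adjacent_sentences(text)
```

For example, `clean_text("sxcp, sxcp, a woman smiles with .  She waves. She waves. Camera:", trigger="sxcp")` returns `"sxcp, a woman smiles. She waves."`.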

It must not make semantic decisions such as sexual action positioning, POV
geometry, clothing state, or model-specific tag weighting. Those stay in the
route-specific owner.

Current integration points:

- `prompt_builder.build_prompt`
- `prompt_builder.build_insta_of_pair`
- `krea_formatter.format_krea2_prompt`
- `sdxl_formatter.format_sdxl_prompt`
- `caption_naturalizer.naturalize_caption`

## Target Organization

### Generation Layer

Owner: `prompt_builder.py` plus `categories/*.json`.

Keep here:

- category/subcategory/item selection;
- seed axis routing;
- character slot/profile resolution;
- scene/expression/composition pool selection;
- role graph creation from structured category axes;
- metadata row construction.

Move or isolate later:

- role graph generation for hardcore interaction categories into a dedicated
  module, for example `hardcore_role_graphs.py`;
- camera-scene adapters into `scene_camera_adapters.py`;
- category-library loading and inheritance helpers into `category_library.py`.
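
If inheritance moves into `category_library.py`, the core helper could be as small as the following. This sketch hypothetically assumes that a subcategory names its parent in an `extends` key and that a shallow key-by-key merge is the intended semantics. The real JSON may use different keys or need a deeper merge.

```python
import json
from pathlib import Path
from typing import Dict, Optional, Set


def load_categories(directory: str) -> Dict[str, dict]:
    """Load every categories/*.json file into {file_stem: parsed_json}."""
    return {
        path.stem: json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(Path(directory).glob("*.json"))
    }


def resolve_subcategory(subcats: Dict[str, dict], name: str,
                        _seen: Optional[Set[str]] = None) -> dict:
    """Merge a subcategory over its parent chain via a hypothetical `extends` key.

    Child keys override parent keys; the `extends` key itself is not copied,
    and the input dicts are never mutated.
    """
    seen = set() if _seen is None else _seen
    if name in seen:
        raise ValueError(f"inheritance cycle at {name!r}")
    seen.add(name)
    entry = subcats[name]
    merged = resolve_subcategory(subcats, entry["extends"], seen) if "extends" in entry else {}
    merged.update({key: value for key, value in entry.items() if key != "extends"})
    return merged
```

Keeping cycle detection in the loader means a bad `extends` chain fails at load time instead of surfacing later as an odd prompt.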

### Pair / Adapter Layer

Owner today: `build_insta_of_pair`.

Keep here:

- soft/hard row creation;
- continuity policy;
- softcore cast policy;
- pair-level camera routing;
- pair metadata shape.

Improve later:

- make a single pair metadata sanitizer that normalizes `softcore_row`,
  `hardcore_row`, pair prompts, negatives, captions, and camera fields;
- split pair assembly into small functions by phase:
  `build_soft_row`, `build_hard_row`, `resolve_pair_camera`,
  `resolve_pair_clothing`, `assemble_pair_metadata`.
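
A sketch of the single sanitizer, assuming rows are plain dicts. `softcore_row` and `hardcore_row` come from this note. The `TEXT_FIELDS` names are hypothetical, and `_tidy` only collapses whitespace where the real helper would call into `prompt_hygiene.py`. Returning a copy keeps the sanitizer safe to call at any phase boundary.

```python
from typing import Any, Dict

# Hypothetical text-bearing keys; the real row schema may name these differently.
TEXT_FIELDS = ("prompt", "negative", "caption", "scene_text", "camera", "clothing")


def _tidy(value: Any) -> Any:
    # Stand-in for prompt_hygiene calls: collapse whitespace in strings only.
    return " ".join(value.split()) if isinstance(value, str) else value


def sanitize_row(row: Dict[str, Any]) -> Dict[str, Any]:
    return {key: _tidy(value) if key in TEXT_FIELDS else value for key, value in row.items()}


def sanitize_pair_metadata(pair: Dict[str, Any]) -> Dict[str, Any]:
    """Return a sanitized copy; nested soft/hard rows get the same treatment."""
    out = sanitize_row(pair)  # top-level pair prompts, negatives, captions
    for key in ("softcore_row", "hardcore_row"):
        if isinstance(out.get(key), dict):
            out[key] = sanitize_row(out[key])
    return out
```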

### Krea2 Formatter Path

Owner: `krea_formatter.py`.

Keep here:

- Krea prose style;
- cast prose;
- hardcore action sentence rewriting;
- POV sentence rewriting;
- clothing naturalization;
- camera-scene preservation;
- fallback text parsing.

Improve later:

- split semantic blocks into modules:
  `krea_cast.py`, `krea_actions.py`, `krea_pov.py`, `krea_clothing.py`;
- add route-level smoke fixtures for representative metadata rows;
- make `_hardcore_action_sentence` dispatch by action family instead of long
  conditional chains.

### SDXL Formatter Path

Owner: `sdxl_formatter.py`.

Keep here:

- trigger behavior;
- style and quality presets;
- tag ordering;
- weighted explicit tags;
- negative-prompt assembly.

Improve later:

- move presets into data dictionaries or JSON so adding styles does not require
  editing formatter logic;
- add formatter profiles for Pony, SDXL photo, and flat vector;
- make fallback cleanup use the shared field-label inventory.
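
A sketch of data-driven presets, assuming a preset is just an ordered list of style tags plus quality tags. The preset names, tag values, and tag order are hypothetical. Adding a style, or a profile such as Pony, SDXL photo, or flat vector, becomes a new data entry rather than a formatter edit.

```python
import json
from typing import Iterable, List

# Hypothetical preset data; in practice this could be loaded from a presets JSON file.
PRESETS = json.loads("""
{
  "photo":  {"style": ["photo", "natural light"], "quality": ["high detail"]},
  "vector": {"style": ["flat vector", "clean lines"], "quality": []}
}
""")


def apply_preset(tags: Iterable[str], preset_name: str, trigger: str = "") -> str:
    preset = PRESETS[preset_name]
    # Assumed order: trigger, content tags, style tags, quality tags.
    ordered: List[str] = ([trigger] if trigger else []) + list(tags)
    ordered += preset["style"] + preset["quality"]
    seen, out = set(), []
    for tag in ordered:
        if tag.casefold() not in seen:  # first occurrence wins
            seen.add(tag.casefold())
            out.append(tag)
    return ", ".join(out)
```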

### Naturalizer Path

Owner: `caption_naturalizer.py`.

Keep here:

- natural sentence caption assembly;
- training-caption trigger behavior;
- style-tail policy.

Improve later:

- share more metadata readers with Krea without sharing Krea prose;
- add a `caption_profile` option for concise/dense LoRA caption styles.
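
The proposed `caption_profile` option could be a small table of knobs. The `concise`/`dense` names come from this note. `max_sentences`, `include_style_tail`, and the `trigger, ` prefix format are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class CaptionProfile:
    max_sentences: int
    include_style_tail: bool


# Hypothetical knob values for the proposed `caption_profile` option.
CAPTION_PROFILES = {
    "concise": CaptionProfile(max_sentences=2, include_style_tail=False),
    "dense": CaptionProfile(max_sentences=6, include_style_tail=True),
}


def shape_caption(trigger: str, sentences: Sequence[str], style_tail: str = "",
                  caption_profile: str = "concise") -> str:
    profile = CAPTION_PROFILES[caption_profile]
    body = list(sentences[: profile.max_sentences])
    if profile.include_style_tail and style_tail:
        body.append(style_tail)
    # The trigger is prepended exactly once, whatever the profile.
    return f"{trigger}, " + " ".join(body)
```

Because profiles only trim or extend the sentence list, the selected content is unchanged, which matches the Naturalizer invariant below.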

### Category JSON Path

Owner: `categories/*.json`.

Keep here:

- scalable prompt pool content;
- named scene/expression/composition pools;
- item templates and axes;
- direct category-specific wording.

Improve later:

- introduce optional `family` and `action_type` fields on item templates so
  Python filters do less keyword guessing;
- add `formatter_hint` fields only where needed, not globally;
- add a JSON audit that checks every referenced expression/composition/scene pool
  exists.
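
A possible shape for that audit, assuming items reference pools through string values under `expression_pool`, `composition_pool`, or `scene_pool` keys (hypothetical names). It walks any JSON structure and reports the path of each dangling reference, so a failure points at the exact template.

```python
from typing import Any, Dict, Iterator, List, Set, Tuple

# Hypothetical reference keys mapped to the pool kind they point at.
REF_KEYS = {"expression_pool": "expression", "composition_pool": "composition", "scene_pool": "scene"}


def iter_pool_refs(node: Any, path: str = "$") -> Iterator[Tuple[str, str, str]]:
    """Yield (json_path, pool_kind, pool_name) for every pool reference."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key in REF_KEYS and isinstance(value, str):
                yield child, REF_KEYS[key], value
            else:
                yield from iter_pool_refs(value, child)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from iter_pool_refs(value, f"{path}[{index}]")


def missing_pool_refs(category: Any, defined: Dict[str, Set[str]]) -> List[Tuple[str, str, str]]:
    """`defined` maps pool kind -> names that exist, e.g. from the pool JSON files."""
    return [(p, kind, name) for p, kind, name in iter_pool_refs(category)
            if name not in defined.get(kind, set())]
```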

### Node / UI Path

Owner: `__init__.py`, `loop_nodes.py`, `web/*.js`.

Keep here:

- ComfyUI node input/output declarations;
- widget behavior;
- button actions;
- dynamic input slots.

Improve later:

- split large node classes into files by family;
- keep node display names, return names, and docs in sync through the audit
  helper;
- add small endpoint tests for profile/accumulator/index-switch routes.

## Path-Specific Improvements

### Prompt Builder

Near-term:

- Final row hygiene through `prompt_hygiene.py` is already in place.
- Add a metadata smoke checker for representative rows to
  `tools/prompt_smoke.py`.
- Normalize every row with one function before JSON serialization.

Medium-term:

- Extract category loading and role graph logic.
- Convert keyword-heavy interaction filtering to template metadata.

### Insta/OF Pair

Near-term:

- Normalize pair metadata with one helper.
- Confirm pair prompts, captions, and soft/hard rows carry the same sanitized
  scene/camera/clothing fields.
- Keep same-room pair continuity synchronized in both assembled prompt text and
  `hardcore_row.scene_text`; `tools/prompt_smoke.py` covers this drift case.
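
A sketch of that drift check, assuming whitespace-insensitive substring matching is strict enough. `same_room` and `hardcore_prompt` are hypothetical key names. Only `hardcore_row.scene_text` comes from this note.

```python
from typing import Optional


def continuity_drift(pair: dict) -> Optional[str]:
    """Report drift when a same-room pair's hard prompt lost hardcore_row.scene_text.

    `same_room` and `hardcore_prompt` are hypothetical key names for this sketch.
    """
    if not pair.get("same_room"):
        return None
    scene = " ".join(pair.get("hardcore_row", {}).get("scene_text", "").split())
    prompt = " ".join(pair.get("hardcore_prompt", "").split())
    if scene and scene.casefold() not in prompt.casefold():
        return f"hardcore prompt is missing scene_text {scene!r}"
    return None
```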

Medium-term:

- Make pair camera and clothing phases explicit subfunctions.
- Add smoke fixtures for same-cast, POV man, explicit nude, and different-camera
  modes.

### Krea2

Near-term:

- Final prose hygiene through `prompt_hygiene.py` is already in place.
- Add smoke coverage through `tools/prompt_smoke.py` for metadata-driven Krea2
  formatting across built-in rows, hardcore rows, same-cast pairs, and POV
  pairs.
- Camera-scene preservation is covered in `tools/prompt_smoke.py` for single
  rows, split soft/hard pair cameras, and POV camera-scene routing. Extend that
  coverage next to close foreplay and POV penetration.

Medium-term:

- Dispatch action rewriting by action family.
- Split Krea semantic helpers into smaller modules.

### SDXL

Near-term:

- Final tag hygiene through `prompt_hygiene.py` is already in place.
- Add smoke tests for trigger preservation and duplicate tag removal through
  `tools/prompt_smoke.py`.
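
The two SDXL checks could return a list of problems rather than raising, so one smoke run reports everything at once. This sketch assumes the trigger is expected as the first tag and that tags compare case-insensitively. A weighted tag such as `(smile:1.2)` would count as distinct from `smile` here.

```python
from typing import List


def sdxl_smoke_problems(prompt: str, trigger: str) -> List[str]:
    tags = [tag.strip() for tag in prompt.split(",") if tag.strip()]
    problems = []
    # Assumption: the trigger must lead the tag list.
    if not tags or tags[0] != trigger:
        problems.append("trigger is not the first tag")
    lowered = [tag.casefold() for tag in tags]
    dupes = sorted({tag for tag in lowered if lowered.count(tag) > 1})
    if dupes:
        problems.append("duplicate tags: " + ", ".join(dupes))
    return problems
```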

Medium-term:

- Make style/quality presets data-driven.

### Naturalizer

Near-term:

- Final prose hygiene through `prompt_hygiene.py` is already in place.
- Verify through `tools/prompt_smoke.py` that training captions keep the
  trigger exactly once.
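
Counting the trigger as a whole token avoids false matches inside longer words. This is a minimal sketch. Treating `-` as part of a token is an assumption.

```python
import re


def trigger_count(caption: str, trigger: str) -> int:
    """Count whole-token, case-insensitive occurrences of `trigger`."""
    # Letters, digits, "_" or "-" on either side mean the match is part of a longer token.
    pattern = rf"(?<![\w-]){re.escape(trigger)}(?![\w-])"
    return len(re.findall(pattern, caption, flags=re.IGNORECASE))
```

A smoke fixture would then assert `trigger_count(caption, trigger) == 1` for each training caption.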

Medium-term:

- Add caption profiles for training and browsing use cases.

### Camera / Scene

Near-term:

- Keep Qwen/orbit as the camera source.
- Keep scene-camera adapters scoped by location family.
- Use the memory note in
  `/home/ethanfel/.codex/memories/scene-camera-system.md` when editing POV.

Medium-term:

- Move coworking adapter into a scene-camera adapter module.
- Build new adapters one location family at a time.
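
A sketch of such a module, assuming an adapter only refines camera text that Qwen/orbit already produced and never replaces it. The registry, the `coworking` rule, and the `(camera, scene) -> camera` signature are all hypothetical. Unknown location families pass through untouched, which lets adapters land one family at a time.

```python
from typing import Callable, Dict, Optional

CameraAdapter = Callable[[str, str], str]  # (camera_text, scene_text) -> camera_text
_ADAPTERS: Dict[str, CameraAdapter] = {}


def adapter(family: str) -> Callable[[CameraAdapter], CameraAdapter]:
    """Register a camera adapter for one location family."""
    def register(fn: CameraAdapter) -> CameraAdapter:
        _ADAPTERS[family] = fn
        return fn
    return register


@adapter("coworking")
def _coworking(camera: str, scene: str) -> str:
    # Hypothetical rule: anchor the shot to the desk when the scene mentions one.
    if "desk" in scene and "desk" not in camera:
        return f"{camera}, framed across the desk"
    return camera


def adapt_camera(camera: str, scene: str, location_family: Optional[str]) -> str:
    fn = _ADAPTERS.get(location_family or "")
    # Families without an adapter keep the source camera text unchanged.
    return fn(camera, scene) if fn else camera
```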

## Invariants To Preserve

- Metadata is the preferred formatter input.
- Prompt Builder should output structured rows even when the raw prompt text is
  rough.
- Krea should fix prose and semantic action readability, not category selection.
- SDXL should produce tag-style output and preserve model triggers as requested.
- Naturalizer should output training-friendly captions without changing the
  selected content.
- Generic cleanup belongs in `prompt_hygiene.py`; semantic cleanup belongs in
  the owning route.

## Recommended Next Passes

1. Expand `tools/prompt_smoke.py` with close foreplay, POV penetration, and
   location-theme fixtures.
2. Split Krea action/POV/clothing helpers into separate modules.
3. Add category JSON pool reference validation to `tools/prompt_map_audit.py`.
4. Extract scene-camera adapters from `prompt_builder.py`.
5. Split `__init__.py` node classes by family after behavior is covered by smoke
   checks.