# Prompt Architecture Improvement Plan

This is a working research note for organizing the prompt builder around the routing map in `docs/prompt-pool-routing-map.md`.

## Current Branch Additions

The current branch adds two major surfaces:

- `SxCP Krea2 Resolution Selector` in `__init__.py`, with README notes.
- Expanded hardcore interaction/manual/action pools in `categories/sexual_poses.json`, `categories/expression_composition_pools.json`, `prompt_builder.py`, and `krea_formatter.py`.

The map audit currently reports:

- 15 sexual pose subcategories.
- 94 sexual pose item templates.
- 23 expression pools.
- 24 composition pools.
- A new Krea2 resolution node with width, height, and API aspect outputs.

## Architectural Finding

The project has a good functional map, but ownership is still mixed inside large files:

- `prompt_builder.py` owns selection, character resolution, role graph logic, camera adaptation, pair assembly, and some final string cleanup.
- `krea_formatter.py` owns metadata parsing, cast naturalization, sexual action rewriting, POV rewriting, clothing cleanup, camera preservation, fallback parsing, and final prose assembly.
- `sdxl_formatter.py` owns tag assembly and style/quality presets.
- `caption_naturalizer.py` owns training-caption prose.
- Category JSON files own scalable pool content, but Python still owns several compatibility and role-graph decisions.

The biggest maintainability risk is not the number of pools but the interleaving of selection, semantic rewriting, and final text hygiene. When a prompt contains wrong text, it is easy to patch the wrong layer.

## First Refactor Boundary

Generic text hygiene now has one home: `prompt_hygiene.py`.

It should handle only route-agnostic cleanup:

- whitespace and punctuation normalization;
- empty field-label removal;
- repeated trigger prefix cleanup;
- duplicate comma-list item removal;
- adjacent duplicate sentence cleanup;
- simple dangling connector cleanup.
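
The route-agnostic passes listed above can be sketched as small pure string functions. This is a minimal illustration, not the actual contents of `prompt_hygiene.py`: the function names, the assumption that an empty field label is a comma-list item consisting only of `Label:`, and the trigger `mytrigger` are all hypothetical.

```python
import re

# Assumption: an "empty field label" is a comma-list item that is only a label, e.g. "Camera:".
_EMPTY_LABEL = re.compile(r"[A-Za-z][A-Za-z _-]*:\s*")


def normalize_whitespace(text: str) -> str:
    """Collapse whitespace runs and remove spaces before punctuation."""
    text = re.sub(r"\s+", " ", text).strip()
    return re.sub(r"\s+([,.;:!?])", r"\1", text)


def clean_comma_list(text: str) -> str:
    """Drop empty items, label-only items, and case-insensitive duplicates (first wins)."""
    seen: set[str] = set()
    kept: list[str] = []
    for item in (part.strip() for part in text.split(",")):
        if not item or _EMPTY_LABEL.fullmatch(item):
            continue
        key = item.lower()
        if key not in seen:
            seen.add(key)
            kept.append(item)
    return ", ".join(kept)


def collapse_trigger_prefix(text: str, trigger: str) -> str:
    """Reduce a trigger prepended several times ("t, t, ...") to a single prefix."""
    repeated = re.compile(rf"^(?:{re.escape(trigger)}\s*,\s*)+", re.IGNORECASE)
    match = repeated.match(text)
    if match is None:
        return text
    return f"{trigger}, {text[match.end():]}"


def drop_adjacent_duplicate_sentences(text: str) -> str:
    """Remove a sentence that repeats the one immediately before it."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    kept: list[str] = []
    for sentence in sentences:
        if kept and sentence.lower() == kept[-1].lower():
            continue
        kept.append(sentence)
    return " ".join(kept)
```

None of these functions inspects what a tag or sentence means; that is what keeps them safe to call from every route. `clean_comma_list` suits tag strings (SDXL), while `drop_adjacent_duplicate_sentences` suits prose (Krea2, Naturalizer).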

It must not make semantic decisions such as sexual action positioning, POV geometry, clothing state, or model-specific tag weighting. Those stay with the route-specific owner.

Current integration points:

- `prompt_builder.build_prompt`
- `prompt_builder.build_insta_of_pair`
- `krea_formatter.format_krea2_prompt`
- `sdxl_formatter.format_sdxl_prompt`
- `caption_naturalizer.naturalize_caption`

## Target Organization

### Generation Layer

Owner: `prompt_builder.py` plus `categories/*.json`.

Keep here:

- category/subcategory/item selection;
- seed axis routing;
- character slot/profile resolution;
- scene/expression/composition pool selection;
- role graph creation from structured category axes;
- metadata row construction.

Move or isolate later:

- role graph generation for hardcore interaction categories, into a dedicated module such as `hardcore_role_graphs.py`;
- camera-scene adapters, into `scene_camera_adapters.py`;
- category-library loading and inheritance helpers, into `category_library.py`.

### Pair / Adapter Layer

Owner today: `build_insta_of_pair`.

Keep here:

- soft/hard row creation;
- continuity policy;
- softcore cast policy;
- pair-level camera routing;
- pair metadata shape.

Improve later:

- add a single pair metadata sanitizer that normalizes `softcore_row`, `hardcore_row`, pair prompts, negatives, captions, and camera fields;
- split pair assembly into small functions by phase: `build_soft_row`, `build_hard_row`, `resolve_pair_camera`, `resolve_pair_clothing`, `assemble_pair_metadata`.

### Krea2 Formatter Path

Owner: `krea_formatter.py`.

Keep here:

- Krea prose style;
- cast prose;
- hardcore action sentence rewriting;
- POV sentence rewriting;
- clothing naturalization;
- camera-scene preservation;
- fallback text parsing.

Improve later:

- split semantic blocks into modules: `krea_cast.py`, `krea_actions.py`, `krea_pov.py`, `krea_clothing.py`;
- add route-level smoke fixtures for representative metadata rows;
- make `_hardcore_action_sentence` dispatch by action family instead of long conditional chains.

### SDXL Formatter Path

Owner: `sdxl_formatter.py`.

Keep here:

- trigger behavior;
- style and quality presets;
- tag ordering;
- weighted explicit tags;
- negative-prompt assembly.

Improve later:

- move presets into data dictionaries or JSON so that adding a style does not require editing formatter logic;
- add formatter profiles for Pony, SDXL photo, and flat vector;
- make fallback cleanup use the shared field-label inventory.

### Naturalizer Path

Owner: `caption_naturalizer.py`.

Keep here:

- natural-sentence caption assembly;
- training-caption trigger behavior;
- style-tail policy.

Improve later:

- share more metadata readers with Krea without sharing Krea prose;
- add a `caption_profile` option for concise and dense LoRA caption styles.

### Category JSON Path

Owner: `categories/*.json`.

Keep here:

- scalable prompt pool content;
- named scene/expression/composition pools;
- item templates and axes;
- direct category-specific wording.

Improve later:

- introduce optional `family` and `action_type` fields on item templates so Python filters do less keyword guessing;
- add `formatter_hint` fields only where needed, not globally;
- add a JSON audit that checks that every referenced expression, composition, and scene pool exists.

### Node / UI Path

Owner: `__init__.py`, `loop_nodes.py`, `web/*.js`.

Keep here:

- ComfyUI node input/output declarations;
- widget behavior;
- button actions;
- dynamic input slots.

Improve later:

- split large node classes into files by family;
- keep node display names, return names, and docs in sync through the audit helper;
- add small endpoint tests for profile, accumulator, and index-switch routes.

## Path-Specific Improvements

### Prompt Builder

Near-term:

- Final row hygiene: already done through `prompt_hygiene.py`.
- Add a metadata smoke checker for representative rows in `tools/prompt_smoke.py`.
- Normalize every row with one function before JSON serialization.

Medium-term:

- Extract category loading and role graph logic.
- Convert keyword-heavy interaction filtering to template metadata.

### Insta/OF Pair

Near-term:

- Normalize pair metadata with one helper.
- Confirm that pair prompts, captions, and soft/hard rows carry the same sanitized scene, camera, and clothing fields.
- Keep same-room pair continuity synchronized in both the assembled prompt text and `hardcore_row.scene_text`; `tools/prompt_smoke.py` covers this drift case.

Medium-term:

- Make the pair camera and clothing phases explicit subfunctions.
- Add smoke fixtures for same-cast, POV man, explicit nude, and different-camera modes.

### Krea2

Near-term:

- Final prose hygiene: already done through `prompt_hygiene.py`.
- Add smoke coverage in `tools/prompt_smoke.py` for metadata-driven Krea2 formatting across built-in rows, hardcore rows, same-cast pairs, and POV pairs.
- Cover camera-scene preservation in `tools/prompt_smoke.py` for single rows, split soft/hard pair cameras, and POV camera-scene routing. Next, extend it to close foreplay and POV penetration.

Medium-term:

- Dispatch action rewriting by action family.
- Split Krea semantic helpers into smaller modules.

### SDXL

Near-term:

- Final tag hygiene: already done through `prompt_hygiene.py`.
- Add smoke tests for trigger preservation and duplicate tag removal in `tools/prompt_smoke.py`.

Medium-term:

- Make style/quality presets data-driven.

### Naturalizer

Near-term:

- Final prose hygiene: already done through `prompt_hygiene.py`.
- Verify through `tools/prompt_smoke.py` that training captions contain the trigger exactly once.

Medium-term:

- Add caption profiles for training and browsing use cases.
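
The SDXL and Naturalizer smoke checks above reduce to small assertions over formatter output. A minimal sketch, assuming the SDXL formatter returns a comma-separated tag string and the naturalizer returns a caption string; `count_trigger`, `duplicate_tags`, and `smoke_check` are hypothetical helpers, not current contents of `tools/prompt_smoke.py`, and `mytrigger` is a placeholder trigger.

```python
import re
from collections import Counter
from typing import List


def count_trigger(text: str, trigger: str) -> int:
    """Count whole-word, case-insensitive occurrences of the trigger."""
    return len(re.findall(rf"(?<!\w){re.escape(trigger)}(?!\w)", text, re.IGNORECASE))


def duplicate_tags(tag_string: str) -> List[str]:
    """Return tags that appear more than once (case-insensitive), in first-seen order."""
    tags = [t.strip().lower() for t in tag_string.split(",") if t.strip()]
    counts = Counter(tags)
    return [tag for tag in dict.fromkeys(tags) if counts[tag] > 1]


def smoke_check(output: str, trigger: str, is_tag_string: bool) -> List[str]:
    """Collect failure messages instead of raising, so one run reports every problem."""
    failures: List[str] = []
    n = count_trigger(output, trigger)
    if n != 1:
        failures.append(f"trigger {trigger!r} appears {n} times, expected 1")
    if is_tag_string:
        dupes = duplicate_tags(output)
        if dupes:
            failures.append(f"duplicate tags: {', '.join(dupes)}")
    return failures
```

Returning a list of messages rather than asserting keeps a fixture run going after the first failure, which helps when one metadata row breaks several routes at once.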

### Camera / Scene

Near-term:

- Keep Qwen/orbit as the camera source.
- Keep scene-camera adapters scoped by location family.
- Consult the memory note at `/home/ethanfel/.codex/memories/scene-camera-system.md` when editing POV.

Medium-term:

- Move the coworking adapter into a scene-camera adapter module.
- Build new adapters one location family at a time.

## Invariants To Preserve

- Metadata is the preferred formatter input.
- Prompt Builder should output structured rows even if the raw prompt text is rough.
- Krea should fix prose and semantic action readability, not category selection.
- SDXL should produce tag-style output and preserve model triggers as requested.
- Naturalizer should output training-friendly captions without changing the selected content.
- Generic cleanup belongs in `prompt_hygiene.py`; semantic cleanup belongs in the owning route.

## Recommended Next Passes

1. Expand `tools/prompt_smoke.py` with close foreplay, POV penetration, and location-theme fixtures.
2. Split Krea action/POV/clothing helpers into separate modules.
3. Add category JSON pool reference validation to `tools/prompt_map_audit.py`.
4. Extract scene-camera adapters from `prompt_builder.py`.
5. Split `__init__.py` node classes by family after behavior is covered by smoke checks.
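
Pass 3 (pool reference validation) could start from a check like the following. It is a sketch under loud assumptions: the real category JSON layout is not described here, so the reference field names in `REFERENCE_FIELDS` (`expression_pool`, `composition_pool`, `scene_pool`) are placeholders, and the caller is assumed to build the `defined` map of pool names per kind from `categories/expression_composition_pools.json` and the scene pool source.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, Iterator, List, Set, Tuple

# Placeholder field names: which item-template key points at which pool kind.
REFERENCE_FIELDS = {
    "expression_pool": "expression",
    "composition_pool": "composition",
    "scene_pool": "scene",
}


def iter_templates(node: object) -> Iterator[dict]:
    """Yield every dict in a nested JSON structure (templates may sit at any depth)."""
    if isinstance(node, dict):
        yield node
        for value in node.values():
            yield from iter_templates(value)
    elif isinstance(node, list):
        for value in node:
            yield from iter_templates(value)


def missing_pool_refs(documents: Iterable[object], defined: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Return sorted (pool kind, pool name) pairs referenced by a template but never defined."""
    missing = set()
    for doc in documents:
        for template in iter_templates(doc):
            for field, kind in REFERENCE_FIELDS.items():
                name = template.get(field)
                if isinstance(name, str) and name not in defined.get(kind, set()):
                    missing.add((kind, name))
    return sorted(missing)


def load_category_files(directory: Path) -> List[object]:
    """Parse every *.json file in the directory, e.g. Path("categories")."""
    return [json.loads(p.read_text(encoding="utf-8")) for p in sorted(directory.glob("*.json"))]
```

Walking every nested dict avoids hard-coding where templates live, at the cost of also inspecting non-template dicts; that is harmless as long as the reference field names are specific to templates.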