Extract formatter input parsing policy

2026-06-27 01:22:07 +02:00
parent b54b8b9421
commit 4c45d96472
7 changed files with 239 additions and 159 deletions
@@ -62,6 +62,23 @@ route-specific owner. It also preserves ordinary words such as `composition`
 inside normal sentences; empty field-label cleanup is limited to standalone
 labels.

+Formatter input/fallback parsing now has one home:
+
+- `formatter_input.py`
+
+It owns route-neutral parsing shared by Krea2, SDXL, and natural-caption
+routes:
+
+- whitespace and punctuation normalization before formatter parsing;
+- JSON row detection from `metadata_json` or source text;
+- trigger-prefix stripping with route-specific trigger candidate lists;
+- `Avoid:` positive/negative splitting for fallback text;
+- prompt field extraction such as `Setting:` or `Composition:`;
+- row-value fallback from metadata fields to labeled prompt text.
+
+It must not make formatter-style decisions. Krea prose, SDXL tags, and training
+caption sentence shape stay in their formatter modules.
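
As an illustration only, the route-neutral responsibilities listed above could be sketched roughly as follows. This is not the contents of `formatter_input.py`: every function name, signature, and regular expression here is a hypothetical stand-in, and the real module may split or name these steps differently.

```python
import json
import re

# Hypothetical sketch; names and behavior are assumptions, not the module's API.

def normalize(text: str) -> str:
    """Collapse whitespace runs and remove spaces before punctuation."""
    text = re.sub(r"\s+", " ", text).strip()
    return re.sub(r"\s+([,.;:])", r"\1", text)

def detect_json_row(metadata_json, source_text):
    """Return the first candidate that parses as a JSON object, else None."""
    for candidate in (metadata_json, source_text):
        if not candidate or not candidate.strip().startswith("{"):
            continue
        try:
            row = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(row, dict):
            return row
    return None

def strip_trigger(text: str, triggers) -> str:
    """Strip a leading trigger; the caller passes its route's candidate list."""
    for trigger in sorted(triggers, key=len, reverse=True):
        if text.lower().startswith(trigger.lower()):
            return text[len(trigger):].lstrip(" ,.:;-")
    return text

def split_avoid(text: str):
    """Split fallback text into (positive, negative) at the first 'Avoid:'."""
    positive, _, negative = text.partition("Avoid:")
    return positive.strip(), negative.strip()

def extract_field(text: str, label: str) -> str:
    """Return the value after 'Label:' up to the next 'Word:' label or the end."""
    pattern = rf"\b{re.escape(label)}:\s*(.*?)(?=\s+[A-Z][A-Za-z]*:|$)"
    match = re.search(pattern, text, flags=re.S)
    return match.group(1).strip() if match else ""

def row_value(row, key: str, prompt_text: str, label: str) -> str:
    """Prefer the metadata field; fall back to labeled prompt text."""
    value = (row or {}).get(key)
    if isinstance(value, str) and value.strip():
        return value.strip()
    return extract_field(prompt_text, label)
```

Nothing in this sketch decides how the output is phrased, which matches the boundary stated above: it only returns parsed pieces, and each formatter module shapes them into Krea prose, SDXL tags, or caption sentences.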
+
 Shared hardcore phrase cleanup now has one home:

 - `hardcore_text_cleanup.py`
@@ -242,6 +259,9 @@ Already isolated:
 - `krea_pov_actions.py` owns POV hardcore action sentence rewriting,
  first-person body geometry, and selected-position-axis priority before loose
  context fallback.
+- `formatter_input.py` owns shared metadata/source JSON detection, trigger
+  stripping, prompt-field extraction, `Avoid:` splitting, and row-value
+  fallback for Krea, SDXL, and caption routes.

 Improve later:

@@ -262,6 +282,7 @@ Keep here:
 - negative-prompt assembly.
 - metadata-family tag hints from `action_family`, `position_family`, and
  `position_keys`.
+- shared formatter input parsing from `formatter_input.py`.

 Improve later:

@@ -280,6 +301,7 @@ Keep here:
 - training-caption trigger behavior;
 - style-tail policy.
 - metadata-family action labels from `action_family` and `position_family`.
+- shared formatter input parsing from `formatter_input.py`.

 Improve later: