ComfyUI-Prompt-Calibrator

8 Commits 1 Branch 0 Tags

Author	SHA1	Message	Date
Ethanfel	06992506d7	Drop named-position axis in favor of grounded geometry (the 30B still mis-names positions). Even the 30B misidentifies named sex positions (doggy/cowgirl) from images, so position_name is removed. The pose cluster is now purely observable geometry: body_orientation, enriched with facing direction (who faces whom), plus limb_arrangement, contact_points and pose. The agent composes any named label from these more reliable primitives. 23 default axes. Docs and examples updated. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-26 23:49:23 +02:00
Ethanfel	e4dfaac63b	Correct the 4B's 'partial' bias on identical values; harden the verdict rule; note model-capability limits. The 4B over-uses 'partial': it applies it both to identical ref/gen values and to clear opposites. It also misidentifies fine-grained content (e.g. it names a position 'doggy' or 'cowgirl' when it is neither). Deterministic fix: force verdict=match when the normalized ref == gen. The prompt is hardened so it does not default to 'partial' (opposites are a mismatch). Docs: the 4B is only reliable for coarse attributes; use the 30B for fine-grained recognition, and prefer grounded geometry axes over named-position labels. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-26 23:43:34 +02:00
Ethanfel	69c1d6deb4	describe emits one canonical reference; compare can anchor on it. Describe mode now produces a single coherent, internally consistent canonical scene description (a paragraph plus a per-axis spec, written to canonical_reference in the report). Compare gains an optional reference_description input: when set, it anchors on that fixed text and shows only the generated image (no swap), so the reference side never drifts or contradicts itself across iterations; only the generated image is re-described each turn. agent_bridge gains --ref-desc / --ref-desc-file (reads the describe report's canonical_reference). Docs and example workflow updated. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-26 23:22:57 +02:00
Ethanfel	53f1f9b9b4	Switch compare to discrete verdicts + granular pose axes + per-axis definitions. The 4B's 0-1 scores were unreliable (identical ref/gen pairs scored ~0.6), so the judge now returns a verdict (match/partial/mismatch) per axis; overall_score and a new mismatch_count are computed from the verdicts on our side (reliable and monotonic). Expanded the action/pose cluster into position_name, body_orientation, limb_arrangement, penetration, contact_points and genital_visibility (plus breast_size) so explicit poses carry detail. Each axis now ships with a one-line definition in the prompt so that gender_mix/subject_count stop absorbing positional text. 24 axes total. Example workflows use the node default (axes=''). Docs realigned; the stop condition is now mismatch_count==0. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-26 23:15:51 +02:00
Ethanfel	c7ef756a71	Add describe (first-pass) mode to the judge node. New mode on QwenVLImageJudge: 'describe' looks at the reference alone and returns a prompt-ready caption plus a per-axis target spec to seed the very first prompt (the generator has nothing to reproduce yet); 'compare' is the existing ref-vs-gen scoring. generated_image is now optional (required only for compare); shared generation is refactored into _generate_from_messages; the third output is renamed diff_analysis -> analysis (mode-agnostic). agent_bridge gains --mode (describe needs no receptor/prompt); added workflow_describe_api.json. Docs updated with the first-pass bootstrap step. Fixed error-return arity to a 5-tuple. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-26 23:04:09 +02:00
Ethanfel	959ec70065	Redesign judge output for calibration: per-axis {score, ref, gen}, drop local fix suggestions. The local VLM now only observes and scores; correction is left to the stronger external agent. Each axis reports the target value (ref), the current value (gen) and their closeness (score): the target, current value and distance an agent needs to calibrate. Expanded to ~20 granular axes (identity/body/wardrobe/action/affect/camera/render) so the action cluster stays discriminative for explicit content. swap_eval now inverts ref/gen of the swapped pass; the diff summary sorts worst-first; default max_new_tokens is 1024. Docs aligned. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-26 22:52:40 +02:00
Ethanfel	aa3983d94a	Fix: handle missing processor chat template. ComfyUI-converted checkpoints ship the template as chat_template.jinja (not on the processor), so apply_chat_template raised 'this processor does not have a chat template'. Backfill processor.chat_template from chat_template.jinja/.json or the tokenizer at load time, and fall back to a hand-built Qwen-VL ChatML prompt if none exists. Also keep *.jinja in the auto-download patterns. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-26 22:36:39 +02:00
Ethanfel	95198a15b5	Initial commit: VLM-as-judge prompt calibration loop. Qwen3-VL image-similarity judge node, external-prompt receptor node, agent_bridge CLI, example SDXL workflow, and methodology/agent-loop/calibration-policy docs. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-26 22:15:56 +02:00
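The verdict handling described in commits 53f1f9b9b4, e4dfaac63b and 959ec70065 can be summarized roughly as follows: per-axis match/partial/mismatch verdicts, a forced match when the normalized ref and gen values are identical, overall_score and mismatch_count computed from the verdicts, a worst-first ordering, and a stop condition of mismatch_count==0. The sketch below is not the repository's code. The normalization rule, the verdict weights, the `AxisResult` and `finalize` names, and the example axis names are all hypothetical, and it assumes mismatch_count counts only 'mismatch' verdicts.

```python
import re
from dataclasses import dataclass

# Hypothetical weights: the log says overall_score is computed from verdicts
# but does not state the mapping, so these values are an assumption.
VERDICT_WEIGHTS = {"match": 1.0, "partial": 0.5, "mismatch": 0.0}


def normalize(value: str) -> str:
    """Lowercase, trim and collapse internal whitespace (assumed normalization)."""
    return re.sub(r"\s+", " ", value.strip().lower())


@dataclass
class AxisResult:
    axis: str
    ref: str
    gen: str
    verdict: str  # "match" | "partial" | "mismatch", as returned by the VLM


def finalize(results: list[AxisResult]) -> dict:
    """Apply the identical-value override, then aggregate verdicts deterministically."""
    for r in results:
        # Deterministic fix: identical normalized values are always a match,
        # whatever verdict the model returned.
        if normalize(r.ref) == normalize(r.gen):
            r.verdict = "match"
    # Assumption: only explicit "mismatch" verdicts are counted.
    mismatch_count = sum(r.verdict == "mismatch" for r in results)
    overall_score = (
        sum(VERDICT_WEIGHTS[r.verdict] for r in results) / len(results)
        if results
        else 1.0
    )
    return {
        "overall_score": overall_score,
        "mismatch_count": mismatch_count,
        # Stop condition: no axis left in mismatch.
        "done": mismatch_count == 0,
        # Worst-first ordering (stable sort keeps input order among ties).
        "worst_first": [
            r.axis for r in sorted(results, key=lambda r: VERDICT_WEIGHTS[r.verdict])
        ],
    }


if __name__ == "__main__":
    report = finalize([
        # Identical after normalization, so "partial" is overridden to "match".
        AxisResult("lighting", "Soft daylight", "soft  daylight", "partial"),
        AxisResult("camera_angle", "low angle", "eye level", "mismatch"),
        AxisResult("background", "studio", "street", "partial"),
    ])
    print(report["mismatch_count"], report["overall_score"])  # prints "1 0.5"
```

Computing the aggregate on the caller's side, rather than asking the model for a number, is what keeps the score monotonic in the verdicts: fixing one axis can only raise overall_score or lower mismatch_count.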