ComfyUI-Prompt-Calibrator

Author	SHA1	Message	Date
Ethanfel	22fd24b29e	Re-enable reasoning for accurate verdicts (no-think rubber-stamped 'match') Disabling thinking made reasoning models mark everything 'match' even when ref/gen clearly differ. Added an enable_thinking toggle (default ON) threaded through the generation path; the prompt now allows reasoning then asks for the result, and verdict_rule explicitly warns against lazy 'match'. _parse_json now scans for the JSON object AFTER the reasoning prose (last balanced object with 'axes'), and the markdown fallback already reads reasoned per-axis output. Default max_new_tokens 2048->3072 so verdicts don't get cut off. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-27 10:56:47 +02:00
Ethanfel	fee136e98c	Expose long-text fields as input sockets (forceInput) axes, reference_description, system_prompt, user_prompt now render as INPUT SOCKETS (forceInput) so they can be wired from other nodes — e.g. describe's canonical output -> compare's reference_description, or a text node -> chat prompts. Small config (report_dir, run_tag, model_path, ...) stays as typeable fields. Unconnected sockets fall back to sensible defaults; the agent/bridge can still set them by value via the API. Dropped the now-socket fields from the example workflows; bumped their max_new_tokens to 2048. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-27 10:49:37 +02:00
Ethanfel	d389d6daff	Trim dead inputs: drop fp16 precision and prompt_used fp16 offers nothing over bf16 for these models (removed from the quant dropdown; loader still tolerant if passed). prompt_used was metadata-only — removed from the node inputs, report payload/markdown, the bridge, and the example workflows. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-27 10:03:06 +02:00
Ethanfel	887dfc0bbb	Add analysis profiles with distance/proximity-aware axes A discrete verdict collapses magnitude and a generic axis can hide what you're calibrating (a blowjob where the head is 20cm away still reads sexual_act=oral -> MATCH). New 'profile' input selects an act-specialized axis set (general / oral / penetration / handjob / solo) whose act-critical axes capture distance explicitly (mouth_genital_distance: touching/<5cm/10-20cm/>20cm, oral_depth, insertion_depth, stroke_position, ...). axes now overrides the profile when set. agent_bridge gains --profile; workflows + docs updated. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-27 00:48:46 +02:00
Ethanfel	69c1d6deb4	describe emits one canonical reference; compare can anchor on it Describe mode now produces a single coherent, internally-consistent canonical scene description (paragraph + per-axis spec, written to canonical_reference in the report). Compare gains an optional reference_description input: when set, it anchors on that fixed text and shows only the generated image (no swap) — so the reference side never drifts or self-contradicts across iterations; only the generated image is re-described each turn. agent_bridge gains --ref-desc / --ref-desc-file (reads the describe report's canonical_reference). Docs + example workflow updated. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-26 23:22:57 +02:00
Ethanfel	53f1f9b9b4	Switch compare to discrete verdicts + granular pose axes + per-axis definitions The 4B's 0-1 scores were unreliable (identical ref/gen scored ~0.6), so the judge now returns verdict match/partial/mismatch per axis; overall_score and a new mismatch_count are computed from verdicts on our side (reliable, monotonic). Expanded the action/pose cluster into position_name, body_orientation, limb_arrangement, penetration, contact_points, genital_visibility (+ breast_size) so explicit poses carry detail. Each axis now ships a one-line definition in the prompt so gender_mix/subject_count stop absorbing positional text. 24 axes total. Example workflows use the node default (axes=''). Docs realigned; stop condition is now mismatch_count==0. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-26 23:15:51 +02:00
Ethanfel	95198a15b5	Initial commit: VLM-as-judge prompt calibration loop Qwen3-VL image-similarity judge node, external-prompt receptor node, agent_bridge CLI, example SDXL workflow, and methodology/agent-loop/calibration-policy docs. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-26 22:15:56 +02:00
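
Commit 53f1f9b9b4 says overall_score and mismatch_count are computed from the per-axis verdicts rather than taken from the model. A minimal sketch of how that aggregation could look follows. The function name, the 1.0/0.5/0.0 weights, and the choice to count every non-"match" verdict toward mismatch_count are assumptions for illustration; the log does not state them.

```python
from typing import Dict, Tuple

# Assumed weights; the commit log does not say how 'partial' is scored.
VERDICT_WEIGHTS = {"match": 1.0, "partial": 0.5, "mismatch": 0.0}


def summarize_verdicts(verdicts: Dict[str, str]) -> Tuple[float, int]:
    """Return (overall_score, mismatch_count) for a mapping of axis -> verdict.

    Hypothetical helper: overall_score is the mean verdict weight, and
    mismatch_count counts every axis whose verdict is not 'match'
    (an assumption, chosen so that mismatch_count == 0 means all axes match).
    """
    if not verdicts:
        raise ValueError("at least one axis verdict is required")

    total = 0.0
    mismatch_count = 0
    for verdict in verdicts.values():
        normalized = verdict.strip().lower()
        # Unknown verdict strings are scored as 0.0, like a mismatch.
        total += VERDICT_WEIGHTS.get(normalized, 0.0)
        if normalized != "match":
            mismatch_count += 1

    return total / len(verdicts), mismatch_count


score, mismatches = summarize_verdicts(
    {"subject_count": "match", "position_name": "partial",
     "body_orientation": "mismatch", "gender_mix": "match"}
)
print(score, mismatches)
```

Deriving both numbers from discrete verdicts keeps the score monotonic: changing any axis from mismatch to partial or match can only raise overall_score and never raises mismatch_count.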

7 Commits
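
Commit 22fd24b29e describes _parse_json as scanning for the JSON object that follows the reasoning prose (the last balanced object with 'axes'). The sketch below shows one way to do that with the standard library. It is not the repository's implementation: the function name find_last_axes_object is hypothetical, and it only inspects top-level objects, not ones nested inside another object.

```python
import json
from typing import Any, Dict, Optional


def find_last_axes_object(text: str) -> Optional[Dict[str, Any]]:
    """Return the last top-level JSON object in `text` that has an 'axes' key.

    Hypothetical helper: tries to decode a JSON value at every '{' and keeps
    the most recent dict containing 'axes', so reasoning prose (and earlier
    draft objects) before the final answer are ignored.
    """
    decoder = json.JSONDecoder()
    found = None
    pos = text.find("{")
    while pos != -1:
        try:
            obj, end = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            # This brace does not start valid JSON; move on to the next one.
            pos = text.find("{", pos + 1)
            continue
        if isinstance(obj, dict) and "axes" in obj:
            found = obj
        # Resume after the decoded object so its nested braces are skipped.
        pos = text.find("{", end)
    return found


reply = (
    'First guess: {"axes": {"pose": "match"}}. Wait, the {limbs} differ. '
    'Final: {"axes": {"pose": "mismatch"}, "summary": "arms raised vs lowered"}'
)
print(find_last_axes_object(reply))
```

Taking the last qualifying object rather than the first means a draft verdict emitted mid-reasoning does not override the model's final answer.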