20 Commits

Author SHA1 Message Date
Ethanfel 8b567cb531 chat mode: json_output toggle to return clean extracted JSON
For JSON-producing system prompts (e.g. LTX prompt-relay), json_output=true pulls
the JSON object out of the reply (strips reasoning/prose/code-fences via _parse_json,
which handles nested schemas and reasoning-then-JSON) and returns it re-serialized;
falls back to raw text if none parses. agent_bridge gains --json-output.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-07-02 02:09:36 +02:00
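
The json_output behaviour described in 8b567cb531 can be sketched roughly as follows. This is not the repository's `_parse_json`; `extract_json` is a hypothetical helper that assumes the wanted object is the last one in the reply that parses. It walks the text with the standard-library decoder, so prose, code fences, and nested schemas need no special handling.

```python
import json


def extract_json(reply: str) -> str:
    """Return the last parseable JSON object in `reply`, re-serialized.

    Falls back to the raw reply when no object parses.
    """
    decoder = json.JSONDecoder()
    found = None
    pos = 0
    while True:
        start = reply.find("{", pos)
        if start == -1:
            break
        try:
            # raw_decode parses one JSON value starting at `start` and
            # reports where it ended, so nested objects are consumed whole.
            obj, end = decoder.raw_decode(reply, start)
        except json.JSONDecodeError:
            # A stray brace in prose: skip it and keep scanning.
            pos = start + 1
            continue
        if isinstance(obj, dict):
            found = obj
        pos = end
    return json.dumps(found) if found is not None else reply
```

Keeping the last object rather than the first is a design assumption: reasoning models tend to put the final answer after their working notes.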
Ethanfel f7ea559690 Speed: auto flash-attention/SDPA + document perf levers
transformers .generate() is the slow path; reasoning token volume and swap_eval
(2 passes) are the multipliers. Now requests attn_implementation flash_attention_2
-> sdpa -> default automatically (free speedup, flash-attn optional). README gains
a Performance section: swap_eval off (biggest free win), flash-attn, smaller model/
fewer axes, avoid nf4 for speed, and vLLM/SGLang as the real production-speed path.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-27 11:18:11 +02:00
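
A minimal sketch of the flash_attention_2 -> sdpa -> default fallback from f7ea559690. `load_with_attn_fallback` and `ATTN_CANDIDATES` are hypothetical names, and the set of exceptions caught is an assumption about how a loader fails when a backend is unavailable. The loader is passed in as a callable so the sketch does not depend on any particular model class.

```python
from typing import Any, Callable, Optional, Tuple

# Preference order named in the commit; None means "let the library decide".
ATTN_CANDIDATES: Tuple[Optional[str], ...] = ("flash_attention_2", "sdpa", None)


def load_with_attn_fallback(
    loader: Callable[..., Any], model_path: str, **kwargs: Any
) -> Tuple[Any, Optional[str]]:
    """Try each attention backend in order; return (model, chosen_backend)."""
    last_error: Optional[Exception] = None
    for attn in ATTN_CANDIDATES:
        extra = {"attn_implementation": attn} if attn else {}
        try:
            return loader(model_path, **kwargs, **extra), attn
        except (ImportError, ValueError, RuntimeError) as exc:
            # Assumed failure modes: flash-attn not installed, or the
            # backend is not supported for this architecture.
            last_error = exc
    raise RuntimeError("no attention implementation could be loaded") from last_error
```

With transformers installed, `loader` would typically be a `from_pretrained` method, which accepts `attn_implementation` as a keyword argument.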
Ethanfel 22fd24b29e Re-enable reasoning for accurate verdicts (no-think rubber-stamped 'match')
Disabling thinking made reasoning models mark everything 'match' even when ref/gen
clearly differ. Added an enable_thinking toggle (default ON) threaded through the
generation path; the prompt now allows reasoning then asks for the result, and
verdict_rule explicitly warns against lazy 'match'. _parse_json now scans for the
JSON object AFTER the reasoning prose (last balanced object with 'axes'), and the
markdown fallback already reads reasoned per-axis output. Default max_new_tokens
2048->3072 so verdicts don't get cut off.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-27 10:56:47 +02:00
Ethanfel fee136e98c Expose long-text fields as input sockets (forceInput)
axes, reference_description, system_prompt, user_prompt now render as INPUT
SOCKETS (forceInput) so they can be wired from other nodes — e.g. describe's
canonical output -> compare's reference_description, or a text node -> chat
prompts. Small config (report_dir, run_tag, model_path, ...) stays as typeable
fields. Unconnected sockets fall back to sensible defaults; the agent/bridge can
still set them by value via the API. Dropped the now-socket fields from the
example workflows; bumped their max_new_tokens to 2048.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-27 10:49:37 +02:00
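
For readers unfamiliar with the socket/widget split in fee136e98c, the sketch below shows the ComfyUI convention it relies on: a `STRING` input with `"forceInput": True` renders as a connection socket, while one without it renders as a typeable field. `ExampleChatInputs`, its default prompts, and the choice to place the sockets under `optional` are assumptions for illustration, not the node's actual definition.

```python
class ExampleChatInputs:
    """Hypothetical node: long text arrives via sockets, small config via widgets."""

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "image": ("IMAGE",),
                # Small config stays a typeable field.
                "run_tag": ("STRING", {"default": ""}),
                "max_new_tokens": ("INT", {"default": 2048, "min": 64, "max": 8192}),
            },
            "optional": {
                # Long text is rendered as an input socket.
                "system_prompt": ("STRING", {"forceInput": True}),
                "user_prompt": ("STRING", {"forceInput": True}),
            },
        }

    RETURN_TYPES = ("STRING",)
    FUNCTION = "run"
    CATEGORY = "example"

    def run(self, image, run_tag, max_new_tokens, system_prompt=None, user_prompt=None):
        # Unconnected optional sockets are simply not passed, so the
        # keyword defaults apply and we substitute fallback text.
        system_prompt = system_prompt or "You are a helpful assistant."
        user_prompt = user_prompt or "Describe the image."
        return (f"{system_prompt}\n{user_prompt}",)
```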
Ethanfel 0e9e99b8b2 Handle reasoning models (Qwen3.5/3.6): no-think + JSON-only + prose fallback
Qwen3.5/3.6 are reasoning models — they 'think out loud' in markdown and never
reach the JSON, then get cut off at the token limit -> '(no parseable judgement)'.
Fixes: apply_chat_template(enable_thinking=False) + strip <think> blocks; hardened
'output ONLY JSON, do not think out loud' instruction; default max_new_tokens
1024->2048 (max 8192); and a markdown fallback parser (_parse_markdown_verdicts /
_parse_axes) that extracts per-axis {verdict,ref,gen} from the prose the model
reliably emits. describe falls back to using the raw text as the caption.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-27 10:25:16 +02:00
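
The `<think>`-stripping step in 0e9e99b8b2 might look something like this. `strip_think` is a hypothetical helper; handling of unclosed and pre-opened blocks is an assumption about what truncated or templated output can look like, not something the commit states.

```python
import re

_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_think(text: str) -> str:
    """Remove <think> reasoning blocks from a model reply."""
    # Well-formed blocks: drop them wherever they occur.
    text = _THINK_BLOCK.sub("", text)
    # Generation cut off mid-reasoning: an opening tag with no close.
    open_idx = text.find("<think>")
    if open_idx != -1:
        text = text[:open_idx]
    # Template pre-opened the block: only the closing tag is in the reply.
    close_idx = text.rfind("</think>")
    if close_idx != -1:
        text = text[close_idx + len("</think>"):]
    return text.strip()
```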
Ethanfel f5be04a5cb Fix: render text inputs as editable fields, not input sockets
Widget-type inputs in the optional section render as connection sockets (not
editable boxes) in some ComfyUI frontends. Moved all widgets (report_dir, run_tag,
reference_description, system_prompt, user_prompt, keep_loaded, auto_download) into
required; only generated_image (a real node-to-node wire) stays optional. Same fix
for the receptor's source_file.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-27 10:18:12 +02:00
Ethanfel d389d6daff Trim dead inputs: drop fp16 precision and prompt_used
fp16 offers nothing over bf16 for these models (removed from the quant dropdown;
the loader still tolerates it if passed). prompt_used was metadata-only, so it is removed
from the node inputs, report payload/markdown, the bridge, and the example workflows.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-27 10:03:06 +02:00
Ethanfel 271aa8ae42 Add chat mode: use the node as a general VLM, not just a judge
New mode='chat' with system_prompt + user_prompt inputs runs your own prompt over
the image(s) and returns raw text in 'analysis' — reusing the same model dropdown,
quant, auto-download and backend. Makes it a one-node abliterated VLM for captioning,
tagging, Q&A, prompt-from-image, etc. agent_bridge gains --mode chat /
--system-prompt / --user-prompt (no receptor needed). Writes a chat report
(latest.json) for the agent. Docs updated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-27 09:55:36 +02:00
Ethanfel 5cff883914 Add Qwen3.5/3.6 abliterated (safetensors) + split model/quant selectors
No GGUF needed: huihui ships Qwen3.5-9B, Qwen3.6-27B, Qwen3.6-35B-A3B as multimodal
SAFETENSORS (abliterated), loadable via transformers AutoModelForMultimodalLM. Added
them to the model dropdown. _resolve_vl_classes now tries AutoModelForMultimodalLM
(3.5/3.6) and AutoModelForImageTextToText (Qwen3-VL) in name-based order, with
load fallback across candidates. model_select is now the model NAME only; precision
is the separate quant dropdown applied to it (repo_by_precision routes e.g. the local
fp8 dir). Added aliases 3.5-9b/3.6-27b/3.6-35b and a VRAM-by-quant table in the README.
Requires a recent transformers release for 3.5/3.6.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-27 09:50:13 +02:00
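
The name-based ordering with load fallback from 5cff883914 could be sketched as below. This is not the repository's `_resolve_vl_classes`; `candidate_class_names` and `load_first_working` are hypothetical, the substring matching is an assumption, and the two class names are taken from the commit message (whether each exists depends on the installed transformers version). The namespace is passed in so a missing class is skipped rather than raising.

```python
from typing import Any, List

MULTIMODAL_LM = "AutoModelForMultimodalLM"        # named in the commit for 3.5/3.6
IMAGE_TEXT_TO_TEXT = "AutoModelForImageTextToText"  # named in the commit for Qwen3-VL


def candidate_class_names(model_name: str) -> List[str]:
    """Order the auto-classes by which one the model name suggests."""
    name = model_name.lower()
    if "qwen3.5" in name or "qwen3.6" in name:
        return [MULTIMODAL_LM, IMAGE_TEXT_TO_TEXT]
    return [IMAGE_TEXT_TO_TEXT, MULTIMODAL_LM]


def load_first_working(model_path: str, model_name: str, namespace: Any) -> Any:
    """Try each candidate class found on `namespace` until one loads."""
    errors = []
    for cls_name in candidate_class_names(model_name):
        cls = getattr(namespace, cls_name, None)
        if cls is None:
            errors.append(f"{cls_name}: not available")
            continue
        try:
            return cls.from_pretrained(model_path)
        except Exception as exc:  # fall through to the next candidate
            errors.append(f"{cls_name}: {exc}")
    raise RuntimeError("; ".join(errors))
```

In real use `namespace` would be the imported `transformers` module.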
Ethanfel e29df0b319 Keep node transformers-only: drop GGUF presets from dropdown
Per decision, this node stays transformers/safetensors. Removed the HauhauCS
GGUF presets from model_select; the manual .gguf guard now points to a dedicated
GGUF node (1038lab/ComfyUI-QwenVL, KLL535 Simple-Qwen3-VL-gguf). Dropdown lists
the huihui 4B/8B/30B-A3B judges with VRAM hints. Docs note GGUF models run in a
separate node and feed their text into the loop.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-27 09:26:32 +02:00
Ethanfel 34adb946a4 Add model dropdown (presets w/ VRAM) + manual override; list HauhauCS GGUF models
New model_select dropdown with suggested VRAM in each label: huihui Qwen3-VL
4B(local)/8B/30B-A3B (transformers, auto-download) and HauhauCS Qwen3.5-9B /
Qwen3.6-35B-A3B Uncensored Aggressive (GGUF). model_path is now the manual
override (empty = use dropdown). agent_bridge gains --model-select/--model-path.
The HauhauCS models are GGUF-only (no safetensors) so the transformers backend
can't load them yet — selecting one returns a clear 'GGUF backend pending'
message until the backend is added.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-27 09:24:28 +02:00
Ethanfel 887dfc0bbb Add analysis profiles with distance/proximity-aware axes
A discrete verdict collapses magnitude and a generic axis can hide what you're
calibrating (a blowjob where the head is 20cm away still reads sexual_act=oral ->
MATCH). New 'profile' input selects an act-specialized axis set (general / oral /
penetration / handjob / solo) whose act-critical axes capture distance explicitly
(mouth_genital_distance: touching/<5cm/10-20cm/>20cm, oral_depth, insertion_depth,
stroke_position, ...). axes now overrides the profile when set. agent_bridge gains
--profile; workflows + docs updated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-27 00:48:46 +02:00
Ethanfel 06992506d7 Drop named-position axis for grounded geometry (30B still mis-names positions)
Even the 30B mis-identifies named sex positions (doggy/cowgirl) from images, so
position_name is removed. The pose cluster is now purely observable geometry:
body_orientation enriched with facing direction (who faces whom), plus
limb_arrangement / contact_points / pose. The agent composes any named label from
these reliable primitives. 23 default axes. Docs/examples updated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-26 23:49:23 +02:00
Ethanfel e4dfaac63b Correct 4B 'partial' bias on identical values; harden verdict rule; note model-capability limits
The 4B over-uses 'partial' (mislabels identical ref/gen and clear opposites) and
also mis-identifies fine-grained content (e.g. names a position 'doggy'/'cowgirl'
when it is neither). Deterministic fix: force verdict=match when normalized
ref==gen. Prompt hardened to not default to 'partial' (opposites=mismatch). Docs:
the 4B is only reliable for coarse attributes — use the 30B for fine-grained
recognition; prefer grounded geometry axes over named-position labels.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-26 23:43:34 +02:00
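
The deterministic fix in e4dfaac63b (force verdict=match when normalized ref==gen) can be illustrated with a short sketch. The normalization rules here (lowercase, strip punctuation, collapse whitespace) are an assumption; the commit does not say what "normalized" means. `normalize` and `enforce_identity_match` are hypothetical names.

```python
import re


def normalize(value: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    value = re.sub(r"[^\w\s]", " ", value.lower())
    return " ".join(value.split())


def enforce_identity_match(axis: dict) -> dict:
    """Return a copy of `axis` with verdict 'match' when ref and gen agree."""
    ref, gen = axis.get("ref"), axis.get("gen")
    if isinstance(ref, str) and isinstance(gen, str):
        norm_ref = normalize(ref)
        # Require a non-empty value so two blank fields are not a "match".
        if norm_ref and norm_ref == normalize(gen):
            return {**axis, "verdict": "match"}
    return axis
```

The override only ever upgrades to 'match'; it never downgrades a verdict, so the model's judgement still stands whenever the two values differ.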
Ethanfel 69c1d6deb4 describe emits one canonical reference; compare can anchor on it
Describe mode now produces a single coherent, internally-consistent canonical
scene description (paragraph + per-axis spec, written to canonical_reference in
the report). Compare gains an optional reference_description input: when set, it
anchors on that fixed text and shows only the generated image (no swap) — so the
reference side never drifts or self-contradicts across iterations; only the
generated image is re-described each turn. agent_bridge gains --ref-desc /
--ref-desc-file (reads the describe report's canonical_reference). Docs + example
workflow updated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-26 23:22:57 +02:00
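
A rough sketch of how --ref-desc / --ref-desc-file from 69c1d6deb4 might resolve to a single reference description. `load_reference_description` is hypothetical, and giving the inline value precedence over the file is an assumption; only the `canonical_reference` key comes from the commit.

```python
import json
from pathlib import Path
from typing import Any, Optional


def load_reference_description(
    ref_desc: Optional[str] = None, ref_desc_file: Optional[str] = None
) -> Optional[Any]:
    """Resolve the fixed reference text that compare should anchor on."""
    if ref_desc:
        return ref_desc
    if ref_desc_file:
        # The describe report is JSON; the canonical description lives
        # under the 'canonical_reference' key.
        report = json.loads(Path(ref_desc_file).read_text(encoding="utf-8"))
        return report.get("canonical_reference")
    return None
```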
Ethanfel 53f1f9b9b4 Switch compare to discrete verdicts + granular pose axes + per-axis definitions
The 4B's 0-1 scores were unreliable (identical ref/gen scored ~0.6), so the
judge now returns verdict match/partial/mismatch per axis; overall_score and a
new mismatch_count are computed from verdicts on our side (reliable, monotonic).
Expanded the action/pose cluster into position_name, body_orientation,
limb_arrangement, penetration, contact_points, genital_visibility (+ breast_size)
so explicit poses carry detail. Each axis now ships a one-line definition in the
prompt so gender_mix/subject_count stop absorbing positional text. 24 axes total.
Example workflows use the node default (axes=''). Docs realigned; stop condition
is now mismatch_count==0.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-26 23:15:51 +02:00
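
Computing overall_score and mismatch_count from verdicts, as 53f1f9b9b4 describes, might look like the sketch below. The point values (match=1, partial=0.5, mismatch=0) and the treatment of unknown verdicts as mismatches are assumptions; `summarize` is a hypothetical name.

```python
VERDICT_POINTS = {"match": 1.0, "partial": 0.5, "mismatch": 0.0}  # assumed weights


def summarize(axes: dict) -> dict:
    """Derive the aggregate numbers from per-axis verdicts."""
    verdicts = []
    for axis in axes.values():
        verdict = axis.get("verdict")
        # Anything unrecognized counts as a mismatch (conservative choice).
        verdicts.append(verdict if verdict in VERDICT_POINTS else "mismatch")
    points = [VERDICT_POINTS[v] for v in verdicts]
    overall = sum(points) / len(points) if points else 0.0
    mismatch_count = verdicts.count("mismatch")
    return {
        "overall_score": round(overall, 3),
        "mismatch_count": mismatch_count,
        # Stop condition from the commit: no mismatching axes remain.
        "done": bool(verdicts) and mismatch_count == 0,
    }
```

Because the score is a fixed function of the verdicts, fixing one axis can only raise it, which is the monotonic behaviour the commit message refers to.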
Ethanfel c7ef756a71 Add describe (first-pass) mode to the judge node
New mode on QwenVLImageJudge: 'describe' looks at the reference alone and returns
a prompt-ready caption + per-axis target spec to seed the very first prompt (the
generator has nothing to reproduce yet). 'compare' is the existing ref-vs-gen
scoring. generated_image is now optional (required only for compare); shared
generation refactored into _generate_from_messages; third output renamed
diff_analysis -> analysis (mode-agnostic). agent_bridge gains --mode (describe
needs no receptor/prompt); added workflow_describe_api.json. Docs updated with the
first-pass bootstrap step. Fixed error-return arity to 5-tuple.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-26 23:04:09 +02:00
Ethanfel 959ec70065 Redesign judge output for calibration: per-axis {score, ref, gen}, drop local fix suggestions
The local VLM now only observes and scores; correction is left to the stronger
external agent. Each axis reports the target value (ref), the current value (gen)
and the closeness (score) — the target/current/distance an agent needs to
calibrate. Expanded to ~20 granular axes (identity/body/wardrobe/action/affect/
camera/render) so the action cluster stays discriminative for explicit content.
swap_eval now inverts ref/gen of the swapped pass; diff summary sorts worst-first;
default max_new_tokens 1024. Docs aligned.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-26 22:52:40 +02:00
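
Two details of 959ec70065 lend themselves to a short sketch: inverting ref/gen for the swapped pass, and sorting the diff summary worst-first. `unswap` and `worst_first` are hypothetical helpers, and the per-axis dictionary shape follows the {score, ref, gen} layout named in the commit title.

```python
def unswap(axes: dict) -> dict:
    """Undo the image swap: what the model called 'ref' was really 'gen'."""
    return {
        name: {**axis, "ref": axis.get("gen"), "gen": axis.get("ref")}
        for name, axis in axes.items()
    }


def worst_first(axes: dict) -> list:
    """Return (name, axis) pairs ordered from lowest score to highest."""
    return sorted(axes.items(), key=lambda item: item[1].get("score", 0.0))
```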
Ethanfel aa3983d94a Fix: handle missing processor chat template
ComfyUI-converted checkpoints ship the template as chat_template.jinja
(not on the processor), so apply_chat_template raised 'this processor does
not have a chat template'. Backfill processor.chat_template from
chat_template.jinja/.json or the tokenizer at load time, and fall back to a
hand-built Qwen-VL ChatML prompt if none exists. Also keep *.jinja in the
auto-download patterns.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-26 22:36:39 +02:00
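
The load-time backfill in aa3983d94a could be approximated as follows. `backfill_chat_template` is a hypothetical helper; the lookup order (.jinja, then .json, then tokenizer) follows the commit's wording, and the `{"chat_template": ...}` layout of the JSON file is an assumption. The hand-built ChatML fallback is left out; the function just reports whether a template was found.

```python
import json
from pathlib import Path
from typing import Any


def backfill_chat_template(processor: Any, model_dir: str) -> bool:
    """Ensure processor.chat_template is set; return False if none was found."""
    if getattr(processor, "chat_template", None):
        return True
    root = Path(model_dir)
    jinja = root / "chat_template.jinja"
    if jinja.is_file():
        processor.chat_template = jinja.read_text(encoding="utf-8")
        return True
    as_json = root / "chat_template.json"
    if as_json.is_file():
        data = json.loads(as_json.read_text(encoding="utf-8"))
        template = data.get("chat_template") if isinstance(data, dict) else None
        if template:
            processor.chat_template = template
            return True
    # Last resort: reuse the tokenizer's template if it has one.
    tokenizer = getattr(processor, "tokenizer", None)
    template = getattr(tokenizer, "chat_template", None)
    if template:
        processor.chat_template = template
        return True
    return False
```

A caller would switch to its own hand-built prompt when this returns False.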
Ethanfel 95198a15b5 Initial commit: VLM-as-judge prompt calibration loop
Qwen3-VL image-similarity judge node, external-prompt receptor node,
agent_bridge CLI, example SDXL workflow, and methodology/agent-loop/
calibration-policy docs.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-26 22:15:56 +02:00