fp16 offers nothing over bf16 for these models (removed from the quant dropdown;
loader still tolerant if passed). prompt_used was metadata-only — removed from the
node inputs, report payload/markdown, the bridge, and the example workflows.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
New mode='chat' with system_prompt + user_prompt inputs runs your own prompt over
the image(s) and returns raw text in 'analysis' — reusing the same model dropdown,
quant, auto-download and backend. Makes it a one-node abliterated VLM for captioning,
tagging, Q&A, prompt-from-image, etc. agent_bridge gains --mode chat /
--system-prompt / --user-prompt (no receptor needed). Writes a chat report
(latest.json) for the agent. Docs updated.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
No GGUF needed: huihui ships Qwen3.5-9B, Qwen3.6-27B, Qwen3.6-35B-A3B as multimodal
SAFETENSORS (abliterated), loadable via transformers AutoModelForMultimodalLM. Added
them to the model dropdown. _resolve_vl_classes now tries AutoModelForMultimodalLM
(3.5/3.6) and AutoModelForImageTextToText (Qwen3-VL) in name-based order, with
load fallback across candidates. model_select is now the model NAME only; precision
is the separate quant dropdown applied to it (repo_by_precision routes e.g. the local
fp8 dir). Aliases 3.5-9b/3.6-27b/3.6-35b. VRAM-by-quant table in README. Needs a
recent transformers for 3.5/3.6.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Per decision, this node stays transformers/safetensors. Removed the HauhauCS
GGUF presets from model_select; the manual .gguf guard now points to a dedicated
GGUF node (1038lab/ComfyUI-QwenVL, KLL535 Simple-Qwen3-VL-gguf). Dropdown lists
the huihui 4B/8B/30B-A3B judges with VRAM hints. Docs note GGUF models run in a
separate node and feed their text into the loop.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A discrete verdict collapses magnitude and a generic axis can hide what you're
calibrating (a blowjob where the head is 20cm away still reads sexual_act=oral ->
MATCH). New 'profile' input selects an act-specialized axis set (general / oral /
penetration / handjob / solo) whose act-critical axes capture distance explicitly
(mouth_genital_distance: touching/<5cm/10-20cm/>20cm, oral_depth, insertion_depth,
stroke_position, ...). axes now overrides the profile when set. agent_bridge gains
--profile; workflows + docs updated.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Describe mode now produces a single coherent, internally-consistent canonical
scene description (paragraph + per-axis spec, written to canonical_reference in
the report). Compare gains an optional reference_description input: when set, it
anchors on that fixed text and shows only the generated image (no swap) — so the
reference side never drifts or self-contradicts across iterations; only the
generated image is re-described each turn. agent_bridge gains --ref-desc /
--ref-desc-file (reads the describe report's canonical_reference). Docs + example
workflow updated.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The 4B's 0-1 scores were unreliable (identical ref/gen scored ~0.6), so the
judge now returns verdict match/partial/mismatch per axis; overall_score and a
new mismatch_count are computed from verdicts on our side (reliable, monotonic).
Expanded the action/pose cluster into position_name, body_orientation,
limb_arrangement, penetration, contact_points, genital_visibility (+ breast_size)
so explicit poses carry detail. Each axis now ships a one-line definition in the
prompt so gender_mix/subject_count stop absorbing positional text. 24 axes total.
Example workflows use the node default (axes=''). Docs realigned; stop condition
is now mismatch_count==0.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
New mode on QwenVLImageJudge: 'describe' looks at the reference alone and returns
a prompt-ready caption + per-axis target spec to seed the very first prompt (the
generator has nothing to reproduce yet). 'compare' is the existing ref-vs-gen
scoring. generated_image is now optional (required only for compare); shared
generation refactored into _generate_from_messages; third output renamed
diff_analysis -> analysis (mode-agnostic). agent_bridge gains --mode (describe
needs no receptor/prompt); added workflow_describe_api.json. Docs updated with the
first-pass bootstrap step. Fixed error-return arity to 5-tuple.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The local VLM now only observes and scores; correction is left to the stronger
external agent. Each axis reports the target value (ref), the current value (gen)
and the closeness (score) — the target/current/distance an agent needs to
calibrate. Expanded to ~20 granular axes (identity/body/wardrobe/action/affect/
camera/render) so the action cluster stays discriminative for explicit content.
swap_eval now inverts ref/gen of the swapped pass; diff summary sorts worst-first;
default max_new_tokens 1024. Docs aligned.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>