New mode on QwenVLImageJudge: 'describe' looks at the reference alone and returns
a prompt-ready caption + per-axis target spec to seed the very first prompt (the
generator has nothing to reproduce yet). 'compare' is the existing ref-vs-gen
scoring. generated_image is now optional (required only for compare); shared
generation refactored into _generate_from_messages; third output renamed
diff_analysis -> analysis (mode-agnostic). agent_bridge gains --mode (describe
needs no receptor/prompt); added workflow_describe_api.json. Docs updated with the
first-pass bootstrap step. Fixed error-return arity to 5-tuple.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The local VLM now only observes and scores; correction is left to the stronger
external agent. Each axis reports the target value (ref), the current value (gen)
and the closeness (score) — the target/current/distance an agent needs to
calibrate. Expanded to ~20 granular axes (identity/body/wardrobe/action/affect/
camera/render) so the action cluster stays discriminative for explicit content.
swap_eval now inverts ref/gen of the swapped pass; diff summary sorts worst-first;
default max_new_tokens 1024. Docs aligned.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
ComfyUI-converted checkpoints ship the template as chat_template.jinja
(not on the processor), so apply_chat_template raised 'this processor does
not have a chat template'. Backfill processor.chat_template from
chat_template.jinja/.json or the tokenizer at load time, and fall back to a
hand-built Qwen-VL ChatML prompt if none exists. Also keep *.jinja in the
auto-download patterns.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>