ComfyUI-Prompt-Calibratror
A fully local prompt calibration loop for ComfyUI. A vision-language model (Qwen3-VL) judges how close a generated image is to a reference image and returns a structured score + per-axis difference analysis, which is used to calibrate the prompt-generation method (ComfyUI-Prompt-Builder) until the generated image matches the reference.
Full design rationale, controller options, and VLM-as-judge variance mitigations are in docs/METHODOLOGY.md. The controller is an external CLI agent that drives ComfyUI via its HTTP API — see docs/AGENT_LOOP.md.
Nodes & tools
| Component | What it is |
|---|---|
| Qwen3-VL Image Judge (Calibrator) | scores generated vs reference, writes analysis to disk for the agent |
| SxCP External Prompt (Receptor) | stable injection point; the agent sets prompt/negative/seed here per queue |
| agent_bridge.py | one CLI call = one iteration (inject → POST /prompt → wait → print analysis JSON) |
The "VLM node": Qwen3-VL Image Judge (Calibrator)
The core node (nodes/qwen_judge.py). It reuses the standard transformers Qwen3-VL
inference plumbing (the same approach as ComfyUI-QwenVL-MultiImage, the recommended
reuse base) but forces strict JSON output so an automated loop can act on it.
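As an illustration of what "strict JSON" buys the loop, here is a minimal sketch of how raw model text could be reduced to a validated per-axis dict. The function names `parse_judge_json` and `validate_axes` are hypothetical and not taken from nodes/qwen_judge.py; the only facts assumed from this README are the three verdict labels (match/partial/mismatch).

```python
import json

# Verdict labels documented in the Outputs table of this README.
ALLOWED_VERDICTS = {"match", "partial", "mismatch"}


def parse_judge_json(raw: str) -> dict:
    """Return the first decodable JSON object embedded in raw model output.

    Hypothetical helper: it scans for '{' and tries to decode from there,
    so leading chatter or trailing text around the object is tolerated.
    """
    decoder = json.JSONDecoder()
    start = raw.find("{")
    while start != -1:
        try:
            obj, _end = decoder.raw_decode(raw, start)
            return obj
        except json.JSONDecodeError:
            # Not a valid object at this brace; try the next one.
            start = raw.find("{", start + 1)
    raise ValueError("no JSON object found in model output")


def validate_axes(axes: dict) -> dict:
    """Reject any axis entry whose verdict is not one of the known labels."""
    for axis, entry in axes.items():
        verdict = entry.get("verdict") if isinstance(entry, dict) else None
        if verdict not in ALLOWED_VERDICTS:
            raise ValueError(f"axis {axis!r}: unexpected verdict {verdict!r}")
    return axes
```

Failing loudly on an unknown verdict is a design choice of this sketch: a controller that silently accepts malformed output would steer on noise.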
Inputs
| name | type | default | notes |
|---|---|---|---|
| reference_image | IMAGE | none | the target |
| mode | compare / describe | compare | describe = first pass over the reference only → caption + target spec (seeds the prompt). compare = score ref vs generated |
| profile | general / oral / penetration / handjob / solo | general | analysis profile: act-specialized axis set; the act-critical axes are distance/proximity-aware (e.g. mouth_genital_distance) so magnitude isn't hidden behind a coarse label |
| generated_image | IMAGE (optional) | none | the candidate to score (required for compare, ignored for describe) |
| model_select | dropdown (model name) | 4B local | which judge (transformers/safetensors, auto-downloaded): Qwen3-VL 4B/8B/30B-A3B, Qwen3.5-9B, Qwen3.6-27B/35B-A3B (newer, natively multimodal). Param size shown in the label |
| precision | bf16 / fp16 / fp8 / nf4 | bf16 | the quantization, applied to the selected model (VRAM table below) |
| model_path | STRING | "" (empty) | manual override of the dropdown: local dir, HF repo id, or alias (8b/30b-a3b/3.5-9b/3.6-27b/3.6-35b). Empty = use model_select |
| axes | STRING | "" (empty) | override the profile's axis set with a custom comma/newline list; empty = use profile |
| max_new_tokens | INT | 1024 | |
| temperature | FLOAT | 0.0 | 0 = greedy/repeatable |
| swap_eval | BOOL | true | run twice with images swapped, average → cuts position bias |
| keep_loaded | BOOL | true | cache weights across loop iterations |
| auto_download | BOOL | true | if model_path is a repo id/alias and not local, fetch it from HF into models/prompt_generator/ |
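The axes override accepts a comma- or newline-separated list and falls back to the profile when empty. A minimal sketch of that rule follows; `parse_axes` is a hypothetical name, the axis names in the usage line are made up for illustration, and dropping duplicates is an assumption of this sketch rather than documented behavior.

```python
import re


def parse_axes(axes: str, profile_axes: list[str]) -> list[str]:
    """Resolve the effective axis list for one judge run.

    axes         -- the raw STRING input (comma and/or newline separated)
    profile_axes -- the axis set of the selected profile (fallback)
    """
    # Split on commas and newlines, trim whitespace, drop empty fragments.
    names = [part.strip() for part in re.split(r"[,\n]", axes)]
    names = [name for name in names if name]
    if not names:
        # Empty input means "use the profile".
        return list(profile_axes)
    # Assumption: duplicates are dropped, first occurrence wins.
    return list(dict.fromkeys(names))
```

For example, `parse_axes("pose, lighting\ncamera_angle", profile_axes)` would yield the three custom names and ignore the profile entirely.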
Auto-download: set model_path to 30b-a3b (alias) or any org/name repo id and leave
auto_download on — the node snapshot-downloads it on first run (into ComfyUI's
models/prompt_generator/<name>) and reuses the local copy afterward. Local paths and the
default skip download entirely.
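The resolution order described above (local path first, then a cached copy, then a snapshot download) can be sketched as follows. This is an illustration under assumptions, not the node's code: `resolve_model_dir`, `hf_download`, and the `ALIASES` entry are hypothetical, and the repo id shown is a placeholder rather than a real repository. The only library call used is `huggingface_hub.snapshot_download(repo_id=..., local_dir=...)`.

```python
from pathlib import Path
from typing import Callable

# Placeholder alias table; the real alias -> repo id mapping lives in the node.
ALIASES = {"30b-a3b": "example-org/example-30b-a3b"}


def hf_download(repo_id: str, target: Path) -> None:
    """Fetch a full repo snapshot into target (imported lazily)."""
    from huggingface_hub import snapshot_download

    snapshot_download(repo_id=repo_id, local_dir=str(target))


def resolve_model_dir(
    model_path: str,
    models_root: str,
    auto_download: bool = True,
    download: Callable[[str, Path], None] = hf_download,
) -> Path:
    """Return a local directory holding the weights for model_path."""
    local = Path(model_path)
    if local.is_dir():
        # Local paths skip download entirely.
        return local
    repo_id = ALIASES.get(model_path.lower(), model_path)
    # Cache under <models_root>/<name>, where <name> is the repo's last segment.
    target = Path(models_root) / repo_id.split("/")[-1]
    if target.is_dir() and any(target.iterdir()):
        # A previous run already fetched it; reuse the local copy.
        return target
    if not auto_download:
        raise FileNotFoundError(f"{repo_id} is not local and auto_download is off")
    download(repo_id, target)
    return target
```

Passing the downloader in as a parameter keeps the path logic testable without network access.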
Outputs
| name | type | use |
|---|---|---|
| overall_score | FLOAT 0..1 | compare: mean verdict (computed here, not by the model). describe: 1.0 placeholder |
| axis_scores_json | STRING (JSON) | compare: per-axis {verdict, ref, gen} (verdict = match/partial/mismatch). describe: {axis: value} |
| analysis | STRING | compare: header (overall, N mismatches) + axes worst-first (VERDICT ref:[…] gen:[…]). describe: the caption |
| raw | STRING | raw model output (both passes if swap_eval) |
| report_path | STRING | path to the written calib_<tag>.json (carries mismatch_count) |
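Because overall_score is computed by the node rather than reported by the model, it can be reproduced from axis_scores_json. The sketch below shows one plausible way to do that. The numeric mapping (match = 1.0, partial = 0.5, mismatch = 0.0) and the per-axis averaging across swap_eval passes are assumptions of this example; the README only states that the score is a mean verdict.

```python
import json

# Assumed numeric value for each verdict label.
VERDICT_VALUE = {"match": 1.0, "partial": 0.5, "mismatch": 0.0}


def axis_values(axis_scores_json: str) -> dict[str, float]:
    """Map each axis in one pass's JSON to its numeric verdict value."""
    axes = json.loads(axis_scores_json)
    return {name: VERDICT_VALUE[entry["verdict"]] for name, entry in axes.items()}


def overall_score(*passes_json: str) -> tuple[float, dict[str, float]]:
    """Average verdicts per axis across passes, then average across axes.

    Pass one JSON string for a single evaluation, or two for swap_eval.
    Assumes every pass reports the same set of axes.
    """
    per_pass = [axis_values(p) for p in passes_json]
    per_axis = {
        name: sum(values[name] for values in per_pass) / len(per_pass)
        for name in per_pass[0]
    }
    return sum(per_axis.values()) / len(per_axis), per_axis
```

Keeping the arithmetic outside the model means the score stays consistent with the verdicts even when the model's own sense of "overall" drifts.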
Install
    cd /media/p5/Comfyui/custom_nodes
    ln -s /media/p5/ComfyUI-Prompt-Calibratror .   # or git clone
    /media/p5/Comfyui/venv/bin/pip install -r /media/p5/ComfyUI-Prompt-Calibratror/requirements.txt
The node defaults to the huihui-ai Qwen3-VL-4B-Instruct abliterated weights already
converted at /media/p5/qwen3vl_4b_abliterated_comfy_convert/ so it runs out of the box
(the abliterated/uncensored variant won't refuse to analyze adult imagery, which would
otherwise break the loop).
Pick a model in model_select and a quant in precision. All are abliterated,
multimodal safetensors (transformers), auto-downloaded. The newer Qwen3.5/3.6 are
natively multimodal (need a recent transformers — they load via AutoModelForMultimodalLM).
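The loader has to pick between two transformers auto classes depending on the model family and fall back if the first one fails. The following is a rough sketch of that idea, not the body of `_resolve_vl_classes`: the helper names and the substring test on the model name are assumptions. Classes are looked up with `getattr` so that an older transformers without `AutoModelForMultimodalLM` degrades to the next candidate instead of raising at import time.

```python
import importlib


def candidate_class_names(model_name: str) -> list[str]:
    """Order the auto classes to try, based on the model name."""
    name = model_name.lower()
    order = ["AutoModelForMultimodalLM", "AutoModelForImageTextToText"]
    # Assumption: Qwen3.5/3.6 prefer the multimodal class, Qwen3-VL the other.
    if "qwen3.5" in name or "qwen3.6" in name:
        return order
    return order[::-1]


def load_with_fallback(model_dir: str, model_name: str, **kwargs):
    """Try each candidate class in order; return the first model that loads."""
    transformers = importlib.import_module("transformers")
    errors = []
    for cls_name in candidate_class_names(model_name):
        cls = getattr(transformers, cls_name, None)
        if cls is None:
            errors.append(f"{cls_name}: not available in this transformers version")
            continue
        try:
            return cls.from_pretrained(model_dir, **kwargs)
        except Exception as exc:  # keep going; report every failure at the end
            errors.append(f"{cls_name}: {exc}")
    raise RuntimeError("no candidate class could load the model:\n" + "\n".join(errors))
```

Collecting every failure before raising makes a "needs a recent transformers" situation visible in a single error message.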
Approximate VRAM in GB by quant on the RTX 5090 32 GB (✅ fits / ⚠ tight / ❌ does not fit):
| model | bf16 | fp8 | nf4 | note |
|---|---|---|---|---|
| Qwen3-VL-4B (local) | ✅ ~9 | ✅ ~5 | ✅ ~3 | fast, weak |
| Qwen3-VL-8B | ✅ ~17 | ✅ ~9 | ✅ ~6 | solid, fast |
| Qwen3.5-9B | ✅ ~20 | ✅ ~10 | ✅ ~7 | newer, fast — recommended |
| Qwen3-VL-30B-A3B (MoE) | ❌ ~62 | ⚠ ~31 | ✅ ~18 | nf4 slow |
| Qwen3.6-27B (dense) | ❌ ~56 | ⚠ ~28 | ✅ ~16 | nf4 slow, strong |
| Qwen3.6-35B-A3B (MoE) | ❌ ~70 | ❌ | ✅ ~20 | nf4 slow, top quality |
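As a sanity check on the table, the weight footprint alone can be estimated as parameter count times bytes per parameter. The sketch below does only that arithmetic; `weight_gb` is a hypothetical helper, and the bytes-per-parameter values are the nominal storage sizes (nf4 is 4 bits, ignoring its quantization constants). The table's figures sit somewhat above these estimates, presumably because of the vision tower, activations, and cache.

```python
# Nominal storage cost per parameter, in bytes.
BYTES_PER_PARAM = {"bf16": 2.0, "fp16": 2.0, "fp8": 1.0, "nf4": 0.5}


def weight_gb(params_billion: float, precision: str) -> float:
    """Weights-only footprint in GB (10^9 bytes) for a given precision."""
    # (params_billion * 1e9 params) * bytes / 1e9 bytes-per-GB
    return params_billion * BYTES_PER_PARAM[precision]


def fits(params_billion: float, precision: str, vram_gb: float = 32.0) -> bool:
    """True if the weights alone fit; runtime overhead is NOT included."""
    return weight_gb(params_billion, precision) <= vram_gb
```

For instance, 8B parameters at bf16 come to 16 GB of weights against the table's ~17, and 35B at nf4 comes to 17.5 GB against ~20.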
nf4 (bitsandbytes) fits the big models but is slow because of dequantization overhead;
that overhead is the bottleneck, not the model itself. fp8 is fast, but only when a real
fp8 checkpoint exists (the local 4B has one; precision=fp8 on a bf16-only repo won't
quantize). For speed + recency, Qwen3.5-9B at bf16 is the sweet spot. See
docs/METHODOLOGY.md.
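To make the precision rules concrete, here is a hedged sketch of how a precision choice could be turned into a load plan. The shape of `repo_by_precision` (model name → precision → repo or directory) is an assumption, as are the function names and the decision to raise when fp8 is requested without an fp8 checkpoint. `BitsAndBytesConfig` and its `load_in_4bit`, `bnb_4bit_quant_type`, and `bnb_4bit_compute_dtype` arguments are real transformers API.

```python
def plan_load(model_name: str, precision: str, repo_by_precision: dict) -> dict:
    """Decide which repo to load and how, without importing torch."""
    repos = repo_by_precision[model_name]
    if precision == "fp8":
        # fp8 is only usable when a real fp8 checkpoint exists.
        if "fp8" not in repos:
            raise ValueError(f"{model_name}: no fp8 checkpoint available")
        return {"repo": repos["fp8"], "dtype": "auto", "quant": None}
    if precision == "nf4":
        # nf4 quantizes the bf16 weights on load via bitsandbytes.
        return {"repo": repos["bf16"], "dtype": "bfloat16", "quant": "nf4"}
    if precision in ("bf16", "fp16"):
        dtype = {"bf16": "bfloat16", "fp16": "float16"}[precision]
        return {"repo": repos["bf16"], "dtype": dtype, "quant": None}
    raise ValueError(f"unknown precision {precision!r}")


def build_quant_config(plan: dict):
    """Return a BitsAndBytesConfig for nf4 plans, else None (lazy imports)."""
    if plan["quant"] != "nf4":
        return None
    import torch
    from transformers import BitsAndBytesConfig

    return BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
```

The returned config would be passed as `quantization_config=` to `from_pretrained`; splitting the plan from the heavy imports keeps the routing logic cheap to test.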
Loop sketch
    Prompt-Builder (SxCP) ──prompt──▶ T2I (SDXL/Flux/Krea2) ──image──▶ Qwen3-VL Image Judge
            ▲                                                                   │
            └──────── knob overrides ◀── Controller ◀── overall_score + diff ───┘
Use the Prompt-Builder For-Loop Start/End + Accumulator nodes to drive iterations and
route overall_score into the stop condition. Controller options (greedy hill-climb →
black-box optimizer → LLM-in-the-loop) are in the methodology doc.
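The simplest controller option, a greedy hill-climb, can be sketched in a few lines. Everything here is illustrative: knobs are modeled as floats in [0, 1], `score` stands in for "render with these knobs and read overall_score from the judge", and the step size, iteration cap, and stop threshold are arbitrary defaults rather than values from the methodology doc.

```python
import random
from typing import Callable


def hill_climb(
    knobs: dict[str, float],
    score: Callable[[dict[str, float]], float],
    step: float = 0.1,
    max_iters: int = 50,
    target: float = 0.95,
    seed: int = 0,
) -> tuple[dict[str, float], float]:
    """Greedy hill-climb: nudge one knob at a time, keep only improvements."""
    rng = random.Random(seed)  # seeded so a run is repeatable
    best = dict(knobs)
    best_score = score(best)
    for _ in range(max_iters):
        if best_score >= target:
            break  # stop condition reached
        name = rng.choice(sorted(best))
        delta = rng.choice((-step, step))
        candidate = dict(best)
        # Clamp to the assumed [0, 1] knob range.
        candidate[name] = min(1.0, max(0.0, candidate[name] + delta))
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score
```

Since each `score` call costs one full generate-and-judge round trip, the iteration cap matters more than it would in a purely numeric search.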
End-to-end loop
1. Run ComfyUI with --listen, install this node pack, and put your reference at ComfyUI/input/reference.png.
2. First pass (describe): the judge looks at the reference alone and emits one canonical
   scene description (coherent paragraph + per-axis target spec) to seed the prompt and
   anchor the loop:

       python agent_bridge.py --mode describe --workflow workflow/workflow_describe_api.json \
         --run-tag seed --analysis-dir /media/p5/Comfyui/output/calibrator

3. Compare loop: load workflow/workflow_api.json (SDXL waiIllustriousSDXL_v160 example;
   swap the checkpoint for Flux/Krea as needed) and iterate, following
   docs/CALIBRATION_POLICY.md. Pass --ref-desc-file so compare anchors on the canonical
   reference (the ref side stays fixed; only the generated image is re-read each turn):

       python agent_bridge.py --workflow workflow/workflow_api.json \
         --prompt "<description from step 2, then calibrated>" \
         --ref-desc-file /media/p5/Comfyui/output/calibrator/calib_seed.json \
         --run-tag iter001 --analysis-dir /media/p5/Comfyui/output/calibrator

   stdout = the analysis JSON ({verdict, ref, gen} per axis) → agent steers toward ref → next iteration.
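One iteration of the bridge (inject → POST /prompt → wait) can be sketched with the standard library alone. This is not the source of agent_bridge.py: the function names, the receptor node id, and the exact input keys (prompt, negative, seed) are assumptions. The `/prompt` and `/history/<prompt_id>` routes and the API-format workflow shape (node id → class_type + inputs) are ComfyUI's own.

```python
import copy
import json
import time
import urllib.request


def inject(workflow: dict, receptor_id: str, prompt: str,
           negative: str = "", seed: int = 0) -> dict:
    """Return a copy of the API-format workflow with the receptor inputs set."""
    wf = copy.deepcopy(workflow)  # never mutate the template on disk
    wf[receptor_id]["inputs"].update(
        {"prompt": prompt, "negative": negative, "seed": seed}
    )
    return wf


def queue_prompt(base_url: str, workflow: dict) -> str:
    """POST the workflow to ComfyUI and return the assigned prompt_id."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/prompt", data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["prompt_id"]


def wait_for(base_url: str, prompt_id: str,
             poll_s: float = 1.0, timeout_s: float = 600.0) -> dict:
    """Poll /history until the prompt appears there (i.e. it has finished)."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        with urllib.request.urlopen(f"{base_url}/history/{prompt_id}") as resp:
            history = json.load(resp)
        if prompt_id in history:
            return history[prompt_id]
        time.sleep(poll_s)
    raise TimeoutError(f"prompt {prompt_id} did not finish in {timeout_s}s")
```

After `wait_for` returns, the bridge would read the calib_<tag>.json written by the judge from the analysis directory and print it to stdout for the agent.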
Status
- Methodology + node selection (docs/METHODOLOGY.md)
- Qwen3-VL Image Judge node: describe (first pass) + compare (scoring), swap-eval, file report
- Agent-driven architecture (docs/AGENT_LOOP.md): Receptor node + agent_bridge.py (--mode)
- Example workflows: workflow_describe_api.json (first pass) + workflow_api.json (compare loop)
- Agent calibration policy (docs/CALIBRATION_POLICY.md)
- Optional: structured-config receptor (carry Prompt-Builder knobs instead of a flat string)