ComfyUI-Prompt-Calibratror
A fully local prompt calibration loop for ComfyUI. A vision-language model (Qwen3-VL) judges how close a generated image is to a reference image and returns a structured score + per-axis difference analysis, which is used to calibrate the prompt-generation method (ComfyUI-Prompt-Builder) until the generated image matches the reference.
Full design rationale, controller options, and VLM-as-judge variance mitigations are in docs/METHODOLOGY.md. The controller is an external CLI agent that drives ComfyUI via its HTTP API — see docs/AGENT_LOOP.md.
Nodes & tools
| Component | What it is |
|---|---|
| Qwen3-VL Image Judge (Calibrator) | scores the generated image against the reference and writes the analysis to disk for the agent |
| SxCP External Prompt (Receptor) | stable injection point; the agent sets prompt/negative/seed here before each queue |
| `agent_bridge.py` | one CLI call = one iteration (inject → POST /prompt → wait → print analysis JSON) |
The "VLM node": Qwen3-VL Image Judge (Calibrator)
The core node (`nodes/qwen_judge.py`). It reuses the standard transformers Qwen3-VL
inference plumbing (the same approach as ComfyUI-QwenVL-MultiImage, the recommended
reuse base) but forces strict JSON output so that an automated loop can act on it.
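Strict JSON only pays off if the consumer rejects malformed verdicts instead of guessing. The sketch below shows one way a caller could extract and validate the judge's reply. It assumes a hypothetical top-level shape `{"overall_score": …, "axes": {name: {score, ref, gen}}}`; the function name `parse_judge_output` and that schema are illustrative, not the node's actual code.

```python
import json
from typing import Any, Dict, List, Tuple


def parse_judge_output(text: str, axes: List[str]) -> Tuple[float, Dict[str, Dict[str, Any]]]:
    """Extract and validate the judge's JSON verdict.

    Hypothetical schema assumed here:
    {"overall_score": float, "axes": {name: {"score": float, "ref": str, "gen": str}}}
    """
    # Slice from the first "{" to the last "}" so chatter or code fences
    # around the object do not break json.loads.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object in model output")
    data = json.loads(text[start:end + 1])

    def check(value: Any, where: str) -> float:
        score = float(value)
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{where} out of range: {score}")
        return score

    overall = check(data["overall_score"], "overall_score")
    result: Dict[str, Dict[str, Any]] = {}
    for name in axes:
        entry = data.get("axes", {}).get(name)
        if entry is None:
            raise ValueError(f"missing axis: {name}")
        result[name] = {
            "score": check(entry["score"], name),
            "ref": str(entry.get("ref", "")),
            "gen": str(entry.get("gen", "")),
        }
    return overall, result
```

Raising on a missing axis or an out-of-range score lets the loop re-query the judge rather than act on a partial verdict.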
Inputs
| name | type | default | notes |
|---|---|---|---|
| reference_image | IMAGE | n/a | the target |
| generated_image | IMAGE | n/a | the candidate to score |
| model_path | STRING | `/media/p5/qwen3vl_4b_abliterated_comfy_convert/hf_bf16` | local dir, HF repo id (`org/name`), or alias (`30b-a3b` / `8b` / `4b`) |
| precision | `bf16` / `fp16` / `fp8` / `nf4` | `bf16` | `nf4` = 4-bit (lets the 30B judge run on 32 GB); use `fp8` with the `hf_fp8` copy of the weights |
| axes | STRING | ~20 axes grouped as identity, body, wardrobe, action, affect, camera, render | scored axes; granular for explicit content. Edit to taste |
| max_new_tokens | INT | 512 | |
| temperature | FLOAT | 0.0 | 0 = greedy/repeatable |
| swap_eval | BOOL | true | run twice with the images swapped and average the results; cuts position bias |
| keep_loaded | BOOL | true | cache weights across loop iterations |
| auto_download | BOOL | true | if model_path is a repo id/alias and not local, fetch it from HF into `models/prompt_generator/` |
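A minimal sketch of how the two `swap_eval` passes could be merged, assuming the per-axis `{score, ref, gen}` shape from the Outputs table. In the swapped pass the model saw the generated image in the reference slot, so its `ref` and `gen` texts describe the opposite images and must be flipped back before merging. The function name `merge_swap_passes` and the tie-break (keep the first pass's wording, fall back to the flipped second pass) are illustrative choices, not the node's code.

```python
from typing import Any, Dict

Axis = Dict[str, Any]  # {"score": float, "ref": str, "gen": str}


def merge_swap_passes(normal: Dict[str, Axis], swapped: Dict[str, Axis]) -> Dict[str, Axis]:
    """Average per-axis scores of the normal and swapped passes.

    Axes present only in the swapped pass are dropped; axes missing from it
    keep the normal pass's values unchanged.
    """
    merged: Dict[str, Axis] = {}
    for axis, a in normal.items():
        b = swapped.get(axis)
        if b is None:
            merged[axis] = dict(a)
            continue
        # Undo the swap: pass B's "gen" text describes the real reference.
        b_ref, b_gen = b.get("gen", ""), b.get("ref", "")
        merged[axis] = {
            "score": (float(a["score"]) + float(b["score"])) / 2,
            "ref": a.get("ref") or b_ref,
            "gen": a.get("gen") or b_gen,
        }
    return merged
```

Averaging a symmetric pair like this cancels a constant preference for whichever image comes first; it does not remove ordinary sampling noise, which is what `temperature = 0` addresses.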
Auto-download: set model_path to `30b-a3b` (alias) or any `org/name` repo id and leave
auto_download on. On first run the node snapshot-downloads the model into ComfyUI's
`models/prompt_generator/<name>` and reuses that local copy afterward. Local paths,
including the default, skip the download entirely.
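The resolution order described above can be sketched as a small pure function. Assumptions, all hypothetical: the helper name `resolve_model_path`, the alias table (this README only names the `30b-a3b` repo; the `8b`/`4b` repo ids are left for you to fill in), and `<name>` being the part of the repo id after the slash.

```python
from pathlib import Path
from typing import Dict, Optional, Tuple

# Hypothetical alias table: only the 30b-a3b repo id appears in this README;
# add the repo ids your install uses for the 8b / 4b aliases.
ALIASES: Dict[str, str] = {
    "30b-a3b": "huihui-ai/Huihui-Qwen3-VL-30B-A3B-Instruct-abliterated",
}


def resolve_model_path(model_path: str, models_dir: str,
                       auto_download: bool = True) -> Tuple[str, Optional[str], Path]:
    """Return (action, repo_id, local_dir), action in {"local", "cached", "download"}."""
    local = Path(model_path).expanduser()
    if local.is_dir():  # local paths (including the default) never download
        return "local", None, local
    repo_id = ALIASES.get(model_path, model_path)
    parts = repo_id.split("/")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"not a local dir, alias, or org/name repo id: {model_path!r}")
    target = Path(models_dir) / "prompt_generator" / parts[1]
    if target.is_dir() and any(target.iterdir()):  # fetched on an earlier run
        return "cached", repo_id, target
    if not auto_download:
        raise FileNotFoundError(f"{target} is missing and auto_download is off")
    return "download", repo_id, target
```

On `"download"`, a caller could fetch the weights with `huggingface_hub.snapshot_download(repo_id=repo_id, local_dir=str(target))` and then load from `target`.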
Outputs
| name | type | use |
|---|---|---|
| overall_score | FLOAT 0..1 | loop stop condition / objective |
| axis_scores_json | STRING (JSON) | per-axis `{score, ref, gen}`: target vs current, for the agent |
| diff_analysis | STRING | readable summary, worst axes first (`score ref:[…] gen:[…]`) |
| raw | STRING | raw model output (both passes if swap_eval) |
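To show how `diff_analysis` relates to `axis_scores_json`, here is a minimal worst-first renderer. It assumes `axis_scores_json` is an object keyed by axis name; the helper name `diff_summary`, the `worst_k` option, and the exact line format are illustrative rather than the node's own output.

```python
import json
from typing import List


def diff_summary(axis_scores_json: str, worst_k: int = 0) -> str:
    """Render per-axis {score, ref, gen} as text, lowest (worst) score first.

    worst_k > 0 keeps only the k worst axes, e.g. to focus the agent's next edit.
    """
    axes = json.loads(axis_scores_json)
    ordered = sorted(axes.items(), key=lambda kv: kv[1]["score"])
    if worst_k > 0:
        ordered = ordered[:worst_k]
    lines: List[str] = [
        f"{name}: {a['score']:.2f} ref:[{a['ref']}] gen:[{a['gen']}]"
        for name, a in ordered
    ]
    return "\n".join(lines)
```

Sorting worst-first puts the largest target/current gap at the top, which is the axis an agent would normally correct first.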
Install
cd /media/p5/Comfyui/custom_nodes
ln -s /media/p5/ComfyUI-Prompt-Calibratror . # or git clone
/media/p5/Comfyui/venv/bin/pip install -r /media/p5/ComfyUI-Prompt-Calibratror/requirements.txt
The node defaults to the huihui-ai Qwen3-VL-4B-Instruct abliterated weights already
converted at /media/p5/qwen3vl_4b_abliterated_comfy_convert/ so it runs out of the box
(the abliterated/uncensored variant won't refuse to analyze adult imagery, which would
otherwise break the loop).
Recommended upgrade (latest Qwen VL, uncensored, fits in 32 GB):
huihui-ai/Huihui-Qwen3-VL-30B-A3B-Instruct-abliterated, an MoE model (3B active
parameters, so inference is fast); run it at precision=nf4 (~18 GB). The node auto-detects
the MoE class. An easier middle ground is the 8B abliterated model at bf16 (~17 GB, no
quantization needed).
Qwen3.5-VL abliterated isn't out yet (Qwen3.5 abliterated builds are text-only so far);
Gemma-3-27B-it abliterated (4-bit) is a viable non-Qwen alternative. See
docs/METHODOLOGY.md.
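A quick back-of-envelope check of the memory figures above: weight memory is roughly parameters × bits per parameter / 8. The parameter counts below are nominal values read off the model names, not exact counts, and the helper `weight_gb` is illustrative. For the MoE model every expert must stay resident, so memory follows the 30B total while speed follows the 3B active.

```python
def weight_gb(n_params: float, bits_per_param: float) -> float:
    """Weights-only footprint in decimal GB.

    Excludes activations, KV cache, quantization scales, the vision encoder's
    working buffers and framework overhead, so real usage is somewhat higher.
    """
    return n_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    # Nominal parameter counts taken from the model names.
    for label, n_params, bits in [
        ("30B-A3B MoE @ nf4", 30e9, 4),
        ("8B @ bf16", 8e9, 16),
        ("4B @ bf16", 4e9, 16),
    ]:
        print(f"{label}: ~{weight_gb(n_params, bits):.0f} GB of weights")
```

This gives about 15 GB, 16 GB, and 8 GB of raw weights respectively; the ~18 GB and ~17 GB quoted above plausibly include the overhead the function leaves out.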
Loop sketch
Prompt-Builder (SxCP) ──prompt──▶ T2I (SDXL/Flux/Krea2) ──image──▶ Qwen3-VL Image Judge
       ▲                                                                    │
       └──────── knob overrides ◀── Controller ◀── overall_score + diff ────┘
Use the Prompt-Builder For-Loop Start/End + Accumulator nodes to drive iterations and
route overall_score into the stop condition. Controller options (greedy hill-climb →
black-box optimizer → LLM-in-the-loop) are in the methodology doc.
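The simplest of those controllers, a greedy hill-climb, can be sketched as follows. Assumptions: prompt knobs are modeled as a flat dict of string choices, `score` stands in for one render + judge round trip returning `overall_score`, and `hill_climb`, `target`, and `max_sweeps` are hypothetical names and defaults, not Prompt-Builder parameters.

```python
from typing import Callable, Dict, List, Tuple

Knobs = Dict[str, str]


def hill_climb(initial: Knobs, options: Dict[str, List[str]],
               score: Callable[[Knobs], float],
               target: float = 0.9, max_sweeps: int = 10) -> Tuple[Knobs, float]:
    """Greedy coordinate hill-climb: try each alternative value of each knob
    and keep any single change that raises overall_score."""
    best = dict(initial)
    best_score = score(best)  # one render + judge call
    for _ in range(max_sweeps):
        if best_score >= target:  # stop condition on overall_score
            break
        improved = False
        for knob, values in options.items():
            for value in values:
                if value == best.get(knob):
                    continue
                candidate = {**best, knob: value}
                s = score(candidate)  # one render + judge call per candidate
                if s > best_score:
                    best, best_score, improved = candidate, s, True
        if not improved:  # local optimum: no single-knob change helps
            break
    return best, best_score
```

Each candidate costs a full render and judge pass, and a noisy judge can make the strict `s > best_score` test accept noise; this is why the methodology moves on to black-box optimizers and LLM-in-the-loop control, and why deterministic judging (temperature 0, swap_eval) matters here.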
End-to-end loop
- Run ComfyUI with `--listen`, install this node pack, and put your reference at `ComfyUI/input/reference.png`.
- Load `workflow/workflow_api.json` (an SDXL `waiIllustriousSDXL_v160` example; swap the checkpoint for Flux/Krea as needed).
- Drive it from your agent following `docs/CALIBRATION_POLICY.md`: stdout = the analysis JSON → agent calibrates → next iteration. One iteration:

    python agent_bridge.py --workflow workflow/workflow_api.json \
      --prompt "1 woman, red lingerie, bedroom, full body, warm light" \
      --run-tag iter001 --analysis-dir /media/p5/Comfyui/output/calibrator
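For readers wiring up their own agent, the inject → POST /prompt → wait steps map onto ComfyUI's standard HTTP endpoints (`POST /prompt`, which returns a `prompt_id`, and `GET /history/<prompt_id>`, which is empty until the run finishes). This is a sketch, not `agent_bridge.py` itself. The receptor's `class_type` string (`SxCPExternalPrompt`) and its input names `prompt`/`negative`/`seed` are assumptions; check them against your exported `workflow_api.json`.

```python
import copy
import json
import time
import urllib.request
from typing import Any, Dict, Optional

# Hypothetical class_type of the "SxCP External Prompt (Receptor)" node.
RECEPTOR_CLASS = "SxCPExternalPrompt"

Workflow = Dict[str, Dict[str, Any]]


def inject(workflow: Workflow, prompt: str, negative: str = "",
           seed: Optional[int] = None, receptor_class: str = RECEPTOR_CLASS) -> Workflow:
    """Return a copy of an API-format workflow with the receptor's inputs set."""
    wf = copy.deepcopy(workflow)
    hits = [nid for nid, node in wf.items() if node.get("class_type") == receptor_class]
    if len(hits) != 1:
        raise ValueError(f"expected exactly one {receptor_class} node, found {len(hits)}")
    inputs = wf[hits[0]].setdefault("inputs", {})
    inputs["prompt"] = prompt  # input names assumed from "prompt/negative/seed"
    inputs["negative"] = negative
    if seed is not None:
        inputs["seed"] = seed
    return wf


def queue_and_wait(wf: Workflow, host: str = "http://127.0.0.1:8188",
                   poll_s: float = 1.0, timeout_s: float = 600.0) -> Dict[str, Any]:
    """POST the workflow to /prompt, then poll /history/<id> until it appears."""
    req = urllib.request.Request(
        f"{host}/prompt",
        data=json.dumps({"prompt": wf}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        prompt_id = json.load(resp)["prompt_id"]
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        with urllib.request.urlopen(f"{host}/history/{prompt_id}") as resp:
            history = json.load(resp)
        if prompt_id in history:  # entry appears once execution finished (or failed)
            return history[prompt_id]
        time.sleep(poll_s)
    raise TimeoutError(f"prompt {prompt_id} did not finish within {timeout_s}s")
```

Once the history entry appears, the agent can check its status and read the judge's report from the analysis directory before deciding the next prompt.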
Status
- Methodology + node selection (`docs/METHODOLOGY.md`)
- Qwen3-VL Image Judge node (structured JSON scoring, swap-eval, model caching, file report)
- Agent-driven architecture (`docs/AGENT_LOOP.md`): Receptor node + `agent_bridge.py`
- Example end-to-end workflow (`workflow/workflow_api.json`)
- Agent calibration policy (`docs/CALIBRATION_POLICY.md`)
- Optional: structured-config receptor (carry Prompt-Builder knobs instead of a flat string)