Ethanfel c7ef756a71 Add describe (first-pass) mode to the judge node
New mode on QwenVLImageJudge: 'describe' looks at the reference alone and returns
a prompt-ready caption + per-axis target spec to seed the very first prompt (the
generator has nothing to reproduce yet). 'compare' is the existing ref-vs-gen
scoring. generated_image is now optional (required only for compare); shared
generation refactored into _generate_from_messages; third output renamed
diff_analysis -> analysis (mode-agnostic). agent_bridge gains --mode (describe
needs no receptor/prompt); added workflow_describe_api.json. Docs updated with the
first-pass bootstrap step. Fixed error-return arity to 5-tuple.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-26 23:04:09 +02:00

ComfyUI-Prompt-Calibratror

A fully local prompt calibration loop for ComfyUI. A vision-language model (Qwen3-VL) judges how close a generated image is to a reference image and returns a structured score + per-axis difference analysis, which is used to calibrate the prompt-generation method (ComfyUI-Prompt-Builder) until the generated image matches the reference.

Full design rationale, controller options, and VLM-as-judge variance mitigations are in docs/METHODOLOGY.md. The controller is an external CLI agent that drives ComfyUI via its HTTP API — see docs/AGENT_LOOP.md.

Nodes & tools

| Component | What it is |
|---|---|
| Qwen3-VL Image Judge (Calibrator) | scores generated vs reference, writes analysis to disk for the agent |
| SxCP External Prompt (Receptor) | stable injection point; the agent sets prompt/negative/seed here per queue |
| agent_bridge.py | one CLI call = one iteration (inject → POST /prompt → wait → print analysis JSON) |
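The inject step can be pictured as a small in-memory patch of a ComfyUI API-format workflow (a dict mapping node ids to `{class_type, inputs}`). This is only a sketch: the class name `SxCPExternalPrompt` and the input names `prompt`, `negative`, and `seed` are assumptions for illustration, not taken from the Receptor node's source.

```python
import copy
import json

# Hypothetical class_type for the Receptor node; the real name may differ.
RECEPTOR_CLASS = "SxCPExternalPrompt"


def inject(workflow, prompt, negative="", seed=0, receptor_class=RECEPTOR_CLASS):
    """Return a copy of an API-format workflow with the receptor inputs set."""
    patched = copy.deepcopy(workflow)
    hits = [node for node in patched.values()
            if node.get("class_type") == receptor_class]
    if not hits:
        raise ValueError(f"no {receptor_class} node in workflow")
    for node in hits:
        # Input names are assumed; adjust to the node's actual inputs.
        node.setdefault("inputs", {}).update(
            {"prompt": prompt, "negative": negative, "seed": seed}
        )
    return patched


def queue_payload(workflow, client_id="calibrator"):
    """Encode the JSON body that ComfyUI's POST /prompt endpoint expects."""
    return json.dumps({"prompt": workflow, "client_id": client_id}).encode("utf-8")


workflow = {
    "1": {"class_type": RECEPTOR_CLASS,
          "inputs": {"prompt": "", "negative": "", "seed": 0}},
    "2": {"class_type": "KSampler", "inputs": {"seed": 1}},
}
patched = inject(workflow, "a red bicycle", negative="blurry", seed=42)
print(patched["1"]["inputs"])
```

Patching a deep copy keeps the workflow loaded from disk unchanged, so every iteration starts from the same template.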

The "vllm node": Qwen3-VL Image Judge (Calibrator)

The core node (nodes/qwen_judge.py). It reuses the standard transformers Qwen3-VL inference plumbing (same approach as ComfyUI-QwenVL-MultiImage — the recommended reuse base) but forces strict JSON output so an automated loop can act on it.
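Even when a model is told to emit strict JSON, it can wrap the object in extra text, so a loop usually needs a tolerant extractor. The sketch below shows one way to pull the first JSON object out of raw output; it is not the node's actual parser, and the keys in the sample string are made up.

```python
import json


def parse_strict_json(raw):
    """Return the first decodable JSON object in `raw`, or raise ValueError."""
    decoder = json.JSONDecoder()
    start = raw.find("{")
    while start != -1:
        try:
            # raw_decode parses one value starting at `start` and ignores the rest.
            obj, _end = decoder.raw_decode(raw, start)
            return obj
        except json.JSONDecodeError:
            start = raw.find("{", start + 1)
    raise ValueError("no JSON object in model output")


# Sample output with chatter around the object (keys are illustrative only).
raw = 'Here is the result:\n{"overall_score": 0.62, "axes": {"camera": {"score": 0.4}}}\nDone.'
result = parse_strict_json(raw)
print(result["overall_score"])
```

Raising on unparseable output, rather than returning a default score, lets the caller decide whether to retry or abort the iteration.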

Inputs

| name | type | default | notes |
|---|---|---|---|
| reference_image | IMAGE | | the target |
| mode | compare / describe | compare | describe = first pass over the reference only → caption + target spec (seeds the prompt). compare = score ref vs generated |
| generated_image | IMAGE (optional) | | the candidate to score (required for compare, ignored for describe) |
| model_path | STRING | /media/p5/qwen3vl_4b_abliterated_comfy_convert/hf_bf16 | local dir, HF repo id (org/name), or alias (30b-a3b / 8b / 4b) |
| precision | bf16 / fp16 / fp8 / nf4 | bf16 | nf4 = 4-bit (run the 30B judge on 32 GB); fp8 with the hf_fp8 copy |
| axes | STRING | ~20 axes (identity, body, wardrobe, action, affect, camera, render) | scored/described axes; granular for explicit content. Edit to taste |
| max_new_tokens | INT | 1024 | |
| temperature | FLOAT | 0.0 | 0 = greedy/repeatable |
| swap_eval | BOOL | true | run twice with images swapped, average → cuts position bias |
| keep_loaded | BOOL | true | cache weights across loop iterations |
| auto_download | BOOL | true | if model_path is a repo id/alias and not local, fetch it from HF into models/prompt_generator/ |
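To make the swap_eval idea concrete, here is a minimal sketch of averaging two judge passes. It assumes each pass has already been reduced to a dict of axis name to score in 0..1, and it uses an unweighted mean for the overall score; the node's real aggregation may differ.

```python
def average_passes(pass_a, pass_b):
    """Average per-axis scores from the normal pass and the swapped pass."""
    merged = {}
    for axis in sorted(set(pass_a) | set(pass_b)):
        # An axis missing from one pass falls back to the other pass's score.
        scores = [p[axis] for p in (pass_a, pass_b) if axis in p]
        merged[axis] = sum(scores) / len(scores)
    return merged


def overall(axis_scores):
    """Unweighted mean over axes (an assumption, not the node's formula)."""
    return sum(axis_scores.values()) / len(axis_scores)


normal = {"identity": 0.9, "camera": 0.5}
swapped = {"identity": 0.7, "camera": 0.7}
merged = average_passes(normal, swapped)
print(merged, round(overall(merged), 3))
```

If the model favours whichever image it sees first, the two passes err in opposite directions, and the average cancels part of that bias.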

Auto-download: set model_path to 30b-a3b (alias) or any org/name repo id and leave auto_download on — the node snapshot-downloads it on first run (into ComfyUI's models/prompt_generator/<name>) and reuses the local copy afterward. Local paths and the default skip download entirely.
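The resolution order described above (local path first, then alias or repo id, then a cached copy, then download) could look roughly like the sketch below. The alias table is illustrative: only the 30b-a3b repo id appears in this README, so the 8b and 4b aliases are left out rather than guessed, and the function names are hypothetical. The download call is `huggingface_hub.snapshot_download`, imported lazily so the local cases work without the package.

```python
import os

# Illustrative alias table; the node's real mapping may differ.
ALIASES = {
    "30b-a3b": "huihui-ai/Huihui-Qwen3-VL-30B-A3B-Instruct-abliterated",
}


def classify_model_path(model_path):
    """Return ("local", path) or ("repo", repo_id) for a model_path setting."""
    if os.path.isdir(model_path):
        return "local", model_path
    repo_id = ALIASES.get(model_path.lower(), model_path)
    if repo_id.count("/") == 1 and not repo_id.startswith(("/", ".")):
        return "repo", repo_id
    raise ValueError(f"not a local dir, known alias, or org/name repo id: {model_path}")


def ensure_local(model_path, models_dir, auto_download=True):
    """Resolve model_path to a local directory, downloading on first use."""
    kind, value = classify_model_path(model_path)
    if kind == "local":
        return value
    target = os.path.join(models_dir, value.split("/")[-1])
    if os.path.isdir(target):
        return target  # reuse the copy fetched on an earlier run
    if not auto_download:
        raise FileNotFoundError(f"{value} is not downloaded and auto_download is off")
    from huggingface_hub import snapshot_download  # lazy: only needed here
    return snapshot_download(repo_id=value, local_dir=target)
```

For example, `ensure_local("30b-a3b", "models/prompt_generator")` would download on the first call and return the cached directory on later ones.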

Outputs

| name | type | use |
|---|---|---|
| overall_score | FLOAT 0..1 | compare: loop stop-condition / objective. describe: 1.0 placeholder |
| axis_scores_json | STRING (JSON) | compare: per-axis {score, ref, gen}. describe: per-axis target values {axis: value} |
| analysis | STRING | compare: summary, worst axes first (score ref:[…] gen:[…]). describe: the prompt-ready caption |
| raw | STRING | raw model output (both passes if swap_eval) |
| report_path | STRING | path to the written calib_<tag>.json |
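A controller would typically read these outputs by ranking axes worst-first and checking a stop condition. The sketch below assumes the compare-mode shape of axis_scores_json listed above (`{axis: {score, ref, gen}}`); the threshold, the iteration budget, and the sample values are hypothetical.

```python
import json


def worst_axes(axis_scores_json, k=3):
    """Return (axis, ref, gen) for the k lowest-scoring axes."""
    axes = json.loads(axis_scores_json)
    ranked = sorted(axes.items(), key=lambda item: item[1]["score"])
    return [(name, info["ref"], info["gen"]) for name, info in ranked[:k]]


def should_stop(overall_score, iteration, threshold=0.85, max_iters=20):
    """Stop once the score clears the threshold or the budget is spent."""
    return overall_score >= threshold or iteration >= max_iters


# Made-up sample in the compare-mode shape.
sample = json.dumps({
    "camera": {"score": 0.4, "ref": "low angle", "gen": "eye level"},
    "wardrobe": {"score": 0.9, "ref": "red coat", "gen": "red coat"},
    "affect": {"score": 0.6, "ref": "smiling", "gen": "neutral"},
})
for axis, ref, gen in worst_axes(sample, k=2):
    print(f"{axis}: want {ref!r}, got {gen!r}")
```

Fixing only the few worst axes per iteration keeps prompt changes small, which makes it easier to tell which edit moved the score.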

Install

cd /media/p5/Comfyui/custom_nodes
ln -s /media/p5/ComfyUI-Prompt-Calibratror .     # or git clone
/media/p5/Comfyui/venv/bin/pip install -r /media/p5/ComfyUI-Prompt-Calibratror/requirements.txt

The node defaults to the huihui-ai Qwen3-VL-4B-Instruct abliterated weights, already converted at /media/p5/qwen3vl_4b_abliterated_comfy_convert/, so it runs out of the box. The abliterated (uncensored) variant does not refuse to analyze adult imagery; a refusal would otherwise break the loop.

Recommended upgrade (latest Qwen VL + uncensored, fits 32 GB): huihui-ai/Huihui-Qwen3-VL-30B-A3B-Instruct-abliterated — MoE (3B active, fast), run at precision=nf4 (~18 GB). The node auto-detects the MoE class. An easier middle ground is the 8B abliterated at bf16 (~17 GB, no quantization). Qwen3.5-VL abliterated isn't out yet (Qwen3.5 abliterated builds are text-only so far); Gemma-3-27B-it abliterated (4-bit) is a viable non-Qwen alternative. See docs/METHODOLOGY.md.

Loop sketch

Prompt-Builder (SxCP) ──prompt──▶ T2I (SDXL/Flux/Krea2) ──image──▶ Qwen3-VL Image Judge
        ▲                                                                │
        └──── knob overrides ◀── Controller ◀── overall_score + analysis ┘

Use the Prompt-Builder For-Loop Start/End + Accumulator nodes to drive iterations and route overall_score into the stop condition. Controller options (greedy hill-climb → black-box optimizer → LLM-in-the-loop) are in the methodology doc.
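The simplest controller option named here, a greedy hill-climb, can be sketched in a few lines. The scoring function below is a toy stand-in for "render, then judge", and the knob names and values are invented; a real loop would obtain the score from the judge's overall_score.

```python
import random


def hill_climb(score_fn, knobs, choices, steps=50, target=0.95, seed=0):
    """Greedy hill-climb: change one knob at a time, keep only improvements."""
    rng = random.Random(seed)
    best = dict(knobs)
    best_score = score_fn(best)
    for _ in range(steps):
        if best_score >= target:
            break
        name = rng.choice(sorted(choices))
        candidate = dict(best, **{name: rng.choice(choices[name])})
        score = score_fn(candidate)
        if score > best_score:  # reject equal or worse candidates
            best, best_score = candidate, score
    return best, best_score


# Toy objective: fraction of knobs that match a hidden target setting.
TARGET = {"lighting": "soft", "angle": "low", "style": "film"}


def toy_score(knobs):
    return sum(knobs[k] == v for k, v in TARGET.items()) / len(TARGET)


choices = {
    "lighting": ["hard", "soft"],
    "angle": ["eye", "low", "high"],
    "style": ["film", "3d"],
}
start = {"lighting": "hard", "angle": "eye", "style": "3d"}
best, score = hill_climb(toy_score, start, choices, steps=200)
print(best, score)
```

Because only strict improvements are accepted, the score never decreases. The method can still stall on a plateau, which is one reason to move on to a black-box optimizer or an LLM-in-the-loop controller.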

End-to-end loop

  1. Run ComfyUI with --listen, install this node pack, and put your reference image at ComfyUI/input/reference.png.
  2. First pass (describe): the judge looks at the reference alone and returns a prompt-ready caption + per-axis target spec to seed the initial prompt:
    python agent_bridge.py --mode describe --workflow workflow/workflow_describe_api.json \
      --run-tag seed --analysis-dir /media/p5/Comfyui/output/calibrator
    
  3. Compare loop: load workflow/workflow_api.json (SDXL waiIllustriousSDXL_v160 example — swap the checkpoint for Flux/Krea as needed) and iterate, following docs/CALIBRATION_POLICY.md:
    python agent_bridge.py --workflow workflow/workflow_api.json \
      --prompt "<caption from step 2, then calibrated>" \
      --run-tag iter001 --analysis-dir /media/p5/Comfyui/output/calibrator
    
    stdout = the analysis JSON ({score, ref, gen} per axis) → agent steers toward ref → next iteration.
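An agent could drive the steps above from Python by assembling the same command lines and parsing stdout. This sketch uses only the flags shown in those steps (--mode, --workflow, --prompt, --run-tag, --analysis-dir); it assumes agent_bridge.py is in the working directory and prints nothing but the analysis JSON on stdout. The helper names are hypothetical.

```python
import json
import subprocess
import sys


def bridge_cmd(workflow, run_tag, analysis_dir, prompt=None, mode=None):
    """Build an agent_bridge.py command line from the documented flags."""
    cmd = [sys.executable, "agent_bridge.py", "--workflow", workflow,
           "--run-tag", run_tag, "--analysis-dir", analysis_dir]
    if mode is not None:
        cmd += ["--mode", mode]
    if prompt is not None:
        cmd += ["--prompt", prompt]
    return cmd


def run_iteration(cmd):
    """Run one iteration and parse stdout (assumed to be pure JSON)."""
    done = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(done.stdout)


out_dir = "/media/p5/Comfyui/output/calibrator"
describe = bridge_cmd("workflow/workflow_describe_api.json", "seed", out_dir,
                      mode="describe")
compare = bridge_cmd("workflow/workflow_api.json", f"iter{1:03d}", out_dir,
                     prompt="caption from step 2")
print(" ".join(describe))
print(" ".join(compare))
```

Passing the arguments as a list rather than a shell string avoids quoting problems when the prompt contains spaces or quotes.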

Status

  • Methodology + node selection (docs/METHODOLOGY.md)
  • Qwen3-VL Image Judge node — describe (first pass) + compare (scoring), swap-eval, file report
  • Agent-driven architecture (docs/AGENT_LOOP.md) — Receptor node + agent_bridge.py (--mode)
  • Example workflows: workflow_describe_api.json (first pass) + workflow_api.json (compare loop)
  • Agent calibration policy (docs/CALIBRATION_POLICY.md)
  • Optional: structured-config receptor (carry Prompt-Builder knobs instead of a flat string)