Initial commit: VLM-as-judge prompt calibration loop
Qwen3-VL image-similarity judge node, external-prompt receptor node, agent_bridge CLI, example SDXL workflow, and methodology/agent-loop/calibration-policy docs.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@@ -0,0 +1,8 @@
__pycache__/
*.pyc
output/
models/
*.safetensors
*.gguf
.DS_Store
.venv/
@@ -0,0 +1,110 @@
# ComfyUI-Prompt-Calibratror

A **fully local** prompt calibration loop for ComfyUI. A vision-language model
(Qwen3-VL) judges how close a *generated* image is to a *reference* image and
returns a structured score + per-axis difference analysis, which is used to
**calibrate the prompt-generation method** ([ComfyUI-Prompt-Builder](../ComfyUI-Prompt-Builder))
until the generated image matches the reference.

> Full design rationale, controller options, and VLM-as-judge variance mitigations
> are in **[docs/METHODOLOGY.md](docs/METHODOLOGY.md)**. The controller is an **external
> CLI agent** that drives ComfyUI via its HTTP API — see **[docs/AGENT_LOOP.md](docs/AGENT_LOOP.md)**.

## Nodes & tools

| Component | What it is |
|---|---|
| `Qwen3-VL Image Judge (Calibrator)` | scores generated vs reference, writes analysis to disk for the agent |
| `SxCP External Prompt (Receptor)` | stable injection point; the agent sets `prompt/negative/seed` here per queue |
| `agent_bridge.py` | one CLI call = one iteration (inject → `POST /prompt` → wait → print analysis JSON) |

## The "vllm node": `Qwen3-VL Image Judge (Calibrator)`

The core node (`nodes/qwen_judge.py`). It reuses the standard transformers Qwen3-VL
inference plumbing (same approach as
[ComfyUI-QwenVL-MultiImage](https://github.com/hardik-uppal/ComfyUI-QwenVL-MultiImage)
— the recommended reuse base) but **forces strict JSON output** so an automated loop
can act on it.
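
The node's real prompt and parser live in `nodes/qwen_judge.py`, which is not reproduced here. As a minimal sketch of what "acting on strict JSON" can involve, assuming the model may still wrap its answer in a code fence or a sentence of prose, the hypothetical helpers below pull out the first JSON object and clamp scores to 0..1. The field names `overall_score` and `axes` follow the sample report in `docs/AGENT_LOOP.md`; the function names are illustrative only.

```python
import json


def extract_judge_json(raw: str) -> dict:
    """Parse the first valid JSON object found in raw model output."""
    decoder = json.JSONDecoder()
    start = raw.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores anything that follows it (closing fences, prose).
            obj, _end = decoder.raw_decode(raw, start)
            return obj
        except json.JSONDecodeError:
            start = raw.find("{", start + 1)
    raise ValueError("no JSON object found in judge output")


def clamp01(value) -> float:
    """Force a score into the 0..1 range the loop expects."""
    return max(0.0, min(1.0, float(value)))


def normalize_report(raw: str) -> dict:
    """Hypothetical post-processing: parse, then clamp every score."""
    report = extract_judge_json(raw)
    report["overall_score"] = clamp01(report.get("overall_score", 0.0))
    for axis in report.get("axes", {}).values():
        axis["score"] = clamp01(axis.get("score", 0.0))
    return report
```

Failing loudly with `ValueError` when no object is found lets the caller retry the generation instead of feeding a guessed score into the loop.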

**Inputs**

| name | type | default | notes |
|---|---|---|---|
| `reference_image` | IMAGE | — | the target |
| `generated_image` | IMAGE | — | the candidate to score |
| `model_path` | STRING | `/media/p5/qwen3vl_4b_abliterated_comfy_convert/hf_bf16` | local dir, **HF repo id** (`org/name`), or alias (`30b-a3b` / `8b` / `4b`) |
| `precision` | bf16 / fp16 / fp8 / nf4 | bf16 | `nf4` = 4-bit (run the 30B judge on 32 GB); `fp8` with the `hf_fp8` copy |
| `axes` | STRING | cast, clothing, pose, scene, composition, expression, color_light | scored axes (match your Prompt-Builder knobs) |
| `max_new_tokens` | INT | 512 | |
| `temperature` | FLOAT | 0.0 | 0 = greedy/repeatable |
| `swap_eval` | BOOL | true | run twice with images swapped, average → cuts position bias |
| `keep_loaded` | BOOL | true | cache weights across loop iterations |
| `auto_download` | BOOL | true | if `model_path` is a repo id/alias and not local, fetch it from HF into `models/prompt_generator/` |
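
To illustrate the `swap_eval` row: the table only says the node runs twice with the images swapped and averages. A minimal sketch of such an average follows, assuming each pass yields a report shaped like the sample in `docs/AGENT_LOOP.md` and that a plain arithmetic mean is used. The node's actual combination rule is not shown in this README, and `average_swapped` is a hypothetical name.

```python
def average_swapped(first: dict, second: dict) -> dict:
    """Mean of two judge passes taken with the image order swapped."""
    # Only average axes that both passes actually scored.
    shared = first["axes"].keys() & second["axes"].keys()
    axes = {
        name: (first["axes"][name]["score"] + second["axes"][name]["score"]) / 2
        for name in shared
    }
    overall = (first["overall_score"] + second["overall_score"]) / 2
    return {"overall_score": overall, "axes": axes}
```

Averaging the two orderings cancels any preference the model has for whichever image it sees first, at the cost of doubling judge time per iteration.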

**Auto-download:** set `model_path` to `30b-a3b` (alias) or any `org/name` repo id and leave
`auto_download` on — the node snapshot-downloads it on first run (into ComfyUI's
`models/prompt_generator/<name>`) and reuses the local copy afterward. Local paths and the
default skip download entirely.

**Outputs**

| name | type | use |
|---|---|---|
| `overall_score` | FLOAT 0..1 | loop stop-condition / objective |
| `axis_scores_json` | STRING (JSON) | per-axis `{score, diff}` for the controller |
| `diff_analysis` | STRING | human/controller-readable summary + fix suggestions |
| `raw` | STRING | raw model output (both passes if `swap_eval`) |
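
A controller typically wants the weakest axis from `axis_scores_json`. The sketch below assumes that string decodes to a mapping of axis name to `{score, diff}`, as the table describes; the exact nesting is an assumption, and `worst_axis` is a hypothetical helper, not part of the node.

```python
import json


def worst_axis(axis_scores_json: str) -> tuple[str, float, str]:
    """Return (axis, score, diff) for the lowest-scoring axis."""
    axes = json.loads(axis_scores_json)
    name = min(axes, key=lambda a: axes[a]["score"])
    return name, axes[name]["score"], axes[name].get("diff", "")


example = json.dumps({
    "pose": {"score": 0.40, "diff": "ref standing, gen seated"},
    "clothing": {"score": 0.85, "diff": "close; gen lacks lace detail"},
})
axis, score, diff = worst_axis(example)
print(f"edit next: {axis} ({score:.2f}): {diff}")
```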

## Install

```bash
cd /media/p5/Comfyui/custom_nodes
ln -s /media/p5/ComfyUI-Prompt-Calibratror .   # or git clone
/media/p5/Comfyui/venv/bin/pip install -r /media/p5/ComfyUI-Prompt-Calibratror/requirements.txt
```

The node defaults to the **huihui-ai Qwen3-VL-4B-Instruct abliterated** weights already
converted at `/media/p5/qwen3vl_4b_abliterated_comfy_convert/` so it runs out of the box
(the abliterated/uncensored variant won't refuse to analyze adult imagery, which would
otherwise break the loop).

**Recommended upgrade (latest Qwen VL + uncensored, fits 32 GB):**
[`huihui-ai/Huihui-Qwen3-VL-30B-A3B-Instruct-abliterated`](https://huggingface.co/huihui-ai/Huihui-Qwen3-VL-30B-A3B-Instruct-abliterated)
— MoE (3B active, fast), run at `precision=nf4` (~18 GB). The node auto-detects the MoE
class. An easier middle ground is the **8B** abliterated at `bf16` (~17 GB, no quantization).
Qwen3.5-VL abliterated isn't out yet (Qwen3.5 abliterated builds are text-only so far);
Gemma-3-27B-it abliterated (4-bit) is a viable non-Qwen alternative. See
[docs/METHODOLOGY.md](docs/METHODOLOGY.md#model-sizing-on-32-gb-rtx-5090--abliterated-latest-qwen-vl).

## Loop sketch

```
Prompt-Builder (SxCP) ──prompt──▶ T2I (SDXL/Flux/Krea2) ──image──▶ Qwen3-VL Image Judge
        ▲                                                                │
        └──────── knob overrides ◀── Controller ◀── overall_score + diff ┘
```

Use the Prompt-Builder **For-Loop Start/End + Accumulator** nodes to drive iterations and
route `overall_score` into the stop condition. Controller options (greedy hill-climb →
black-box optimizer → LLM-in-the-loop) are in the methodology doc.

## End-to-end loop

1. Run ComfyUI with `--listen`, install this node pack, put your reference at `ComfyUI/input/reference.png`.
2. Load `workflow/workflow_api.json` (SDXL `waiIllustriousSDXL_v160` example — swap the checkpoint for Flux/Krea as needed).
3. Drive it from your agent following `docs/CALIBRATION_POLICY.md`:
   ```bash
   python agent_bridge.py --workflow workflow/workflow_api.json \
     --prompt "1 woman, red lingerie, bedroom, full body, warm light" \
     --run-tag iter001 --analysis-dir /media/p5/Comfyui/output/calibrator
   ```
   stdout = the analysis JSON → agent calibrates → next iteration.

## Status

- [x] Methodology + node selection (`docs/METHODOLOGY.md`)
- [x] Qwen3-VL Image Judge node (structured JSON scoring, swap-eval, model caching, file report)
- [x] Agent-driven architecture (`docs/AGENT_LOOP.md`) — Receptor node + `agent_bridge.py`
- [x] Example end-to-end workflow (`workflow/workflow_api.json`)
- [x] Agent calibration policy (`docs/CALIBRATION_POLICY.md`)
- [ ] Optional: structured-config receptor (carry Prompt-Builder knobs instead of a flat string)
@@ -0,0 +1,15 @@
"""ComfyUI-Prompt-Calibratror — VLM-as-judge prompt calibration loop."""

from .nodes.qwen_judge import (
    NODE_CLASS_MAPPINGS as _JUDGE_CLASSES,
    NODE_DISPLAY_NAME_MAPPINGS as _JUDGE_NAMES,
)
from .nodes.receptor import (
    NODE_CLASS_MAPPINGS as _RECEPTOR_CLASSES,
    NODE_DISPLAY_NAME_MAPPINGS as _RECEPTOR_NAMES,
)

NODE_CLASS_MAPPINGS = {**_JUDGE_CLASSES, **_RECEPTOR_CLASSES}
NODE_DISPLAY_NAME_MAPPINGS = {**_JUDGE_NAMES, **_RECEPTOR_NAMES}

__all__ = ["NODE_CLASS_MAPPINGS", "NODE_DISPLAY_NAME_MAPPINGS"]
@@ -0,0 +1,146 @@
#!/usr/bin/env python3
"""
agent_bridge.py — drive one calibration iteration from a CLI agent.

The external agent (controller/brain) calls this once per loop step:

    python agent_bridge.py \
        --workflow workflow_api.json \
        --prompt "1 woman, red lingerie, bedroom, full body, warm light" \
        --run-tag iter003 \
        --analysis-dir /path/to/ComfyUI/output/calibrator

It injects the prompt into the `CalibratorPromptReceptor` node, queues the graph
on a running ComfyUI (`POST /prompt`), waits for completion (`GET /history/{id}`),
then prints the Qwen3-VL Judge's analysis JSON to stdout for the agent to read.

Stdlib only — no third-party deps, so any agent can shell out to it.

Loop, from the agent's side:
  1. build a prompt (calibrate from the previous analysis)
  2. run this script  -> capture stdout (the analysis JSON)
  3. read overall_score + per-axis diffs + fix_suggestions
  4. adjust the prompt and go to 1, until overall_score >= target
"""

from __future__ import annotations

import argparse
import json
import os
import sys
import time
import urllib.error
import urllib.request
import uuid

RECEPTOR_CLASS = "CalibratorPromptReceptor"
JUDGE_CLASS = "QwenVLImageJudge"


def _http_json(url: str, payload: dict | None = None, timeout: int = 30):
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"} if data else {})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = resp.read().decode("utf-8")
    return json.loads(body) if body else {}


def _inject(graph: dict, prompt: str, negative: str, seed: int, run_tag: str):
    """Set the receptor's prompt/negative/seed and the judge's run_tag in-place."""
    found_receptor = False
    for node in graph.values():
        ctype = node.get("class_type")
        inputs = node.setdefault("inputs", {})
        if ctype == RECEPTOR_CLASS:
            inputs["prompt"] = prompt
            inputs["negative"] = negative
            inputs["seed"] = int(seed)
            found_receptor = True
        elif ctype == JUDGE_CLASS:
            inputs["run_tag"] = run_tag
            inputs["prompt_used"] = prompt
    if not found_receptor:
        raise SystemExit(
            f"[agent_bridge] no '{RECEPTOR_CLASS}' node in the workflow — add the "
            f"'SxCP External Prompt (Receptor)' node and feed the sampler from it.")


def _wait_for_history(server: str, prompt_id: str, timeout: int):
    deadline = time.time() + timeout
    while time.time() < deadline:
        hist = _http_json(f"http://{server}/history/{prompt_id}")
        if prompt_id in hist:
            entry = hist[prompt_id]
            status = entry.get("status", {})
            # A failed run also lands in history (status_str == "error",
            # completed == False); fail fast instead of polling until timeout.
            if status.get("status_str") == "error":
                raise SystemExit(f"[agent_bridge] run {prompt_id} failed in ComfyUI")
            # ComfyUI marks completed=True (or status_str) when the run is done.
            if status.get("completed", True):
                return entry
        time.sleep(1.0)
    raise SystemExit(f"[agent_bridge] timed out after {timeout}s waiting for {prompt_id}")


def _read_report(analysis_file: str, analysis_dir: str, run_tag: str):
    candidates = []
    if analysis_file:
        candidates.append(analysis_file)
    if analysis_dir:
        if run_tag:
            safe = "".join(c if c.isalnum() or c in "._-" else "_" for c in run_tag)
            candidates.append(os.path.join(analysis_dir, f"calib_{safe}.json"))
        candidates.append(os.path.join(analysis_dir, "latest.json"))
    for path in candidates:
        if os.path.isfile(path):
            with open(path, "r", encoding="utf-8") as f:
                return json.load(f), path
    return None, None


def main(argv=None):
    ap = argparse.ArgumentParser(description="Drive one ComfyUI calibration iteration.")
    ap.add_argument("--server", default="127.0.0.1:8188")
    ap.add_argument("--workflow", required=True, help="API-format workflow JSON")
    ap.add_argument("--prompt", required=True)
    ap.add_argument("--negative", default="")
    ap.add_argument("--seed", type=int, default=0)
    ap.add_argument("--run-tag", default="")
    ap.add_argument("--analysis-file", default="",
                    help="explicit path to the report JSON the Judge writes")
    ap.add_argument("--analysis-dir", default="",
                    help="dir holding calib_<tag>.json / latest.json (Judge report_dir)")
    ap.add_argument("--timeout", type=int, default=600)
    args = ap.parse_args(argv)

    with open(args.workflow, "r", encoding="utf-8") as f:
        graph = json.load(f)

    _inject(graph, args.prompt, args.negative, args.seed, args.run_tag)

    client_id = uuid.uuid4().hex
    try:
        queued = _http_json(f"http://{args.server}/prompt",
                            {"prompt": graph, "client_id": client_id})
    except urllib.error.URLError as e:
        raise SystemExit(f"[agent_bridge] cannot reach ComfyUI at {args.server}: {e}")
    prompt_id = queued.get("prompt_id")
    if not prompt_id:
        raise SystemExit(f"[agent_bridge] queue rejected: {json.dumps(queued)[:400]}")

    _wait_for_history(args.server, prompt_id, args.timeout)

    report, path = _read_report(args.analysis_file, args.analysis_dir, args.run_tag)
    if report is None:
        raise SystemExit(
            "[agent_bridge] run finished but no report file found. Set the Judge "
            "node's report_dir and pass --analysis-dir (or --analysis-file).")

    report["_prompt_id"] = prompt_id
    report["_report_path"] = path
    json.dump(report, sys.stdout, ensure_ascii=False, indent=2)
    sys.stdout.write("\n")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
@@ -0,0 +1,87 @@
# Agent-driven calibration loop

The controller is an **external CLI agent**, not an in-graph node. ComfyUI is the
execution environment (prompt receptor → T2I → VLM judge); the agent is the brain that
reads the analysis, calibrates the prompt generator, and queues the next iteration.

```
CLI AGENT (controller / brain)                     COMFYUI (execution, running with --listen)
───────────────────────────────                    ──────────────────────────────────────────
1. build/calibrate a prompt
2. agent_bridge.py --prompt ... ───POST /prompt──► CalibratorPromptReceptor (injection point)
                                                     │ prompt / negative / seed
                                                     ▼
                                                   T2I (SDXL / Flux / Krea2)
                                                     │ generated image
                                                     ▼
                                                   Qwen3-VL Image Judge
                                                     │ writes calib_<tag>.json + latest.json
3. poll /history/{id} (bridge does this) ◄───────────┘
4. read report JSON (overall_score,
   per-axis diffs, fix_suggestions)
5. adjust Prompt-Builder knobs / prompt
   └──► go to 1 until overall_score ≥ target
```

## Why API-driven, not file-watch

A passive "watch a file and auto-run" receptor is fragile in ComfyUI (no native file
watcher / auto-queue, and prompt↔image↔analysis can desync). Driving `POST /prompt`
instead makes every iteration **synchronous and ordered** — one `prompt_id` ties the
prompt, the image, and the analysis together. The receptor node is still the clean
injection point; the agent just overrides its widgets per queue. (The receptor *also*
supports a `source_file` for file-first workflows if you ever want it.)

## The three pieces

| Piece | Role |
|---|---|
| `CalibratorPromptReceptor` (`SxCP External Prompt (Receptor)`) | Stable node the agent injects `prompt/negative/seed` into. Feeds the sampler. |
| `QwenVLImageJudge` (`Qwen3-VL Image Judge (Calibrator)`) | Scores generated vs reference; writes `calib_<run_tag>.json`, `latest.json`, `calib_<run_tag>.md` to `report_dir`. |
| `agent_bridge.py` | One CLI call = one iteration: inject prompt → queue → wait → print the analysis JSON to stdout. Stdlib only. |

## One iteration (what the agent runs)

```bash
python agent_bridge.py \
  --server 127.0.0.1:8188 \
  --workflow workflow_api.json \
  --prompt "1 woman, red lingerie, bedroom, full body, warm rim light" \
  --negative "blurry, deformed" \
  --seed 12345 \
  --run-tag iter003 \
  --analysis-dir /media/p5/Comfyui/output/calibrator
```

Stdout (captured by the agent) is the report:

```json
{
  "run_tag": "iter003",
  "overall_score": 0.62,
  "axes": {
    "pose": {"score": 0.40, "diff": "ref standing, gen seated"},
    "clothing": {"score": 0.85, "diff": "close; gen lacks lace detail"}
  },
  "fix_suggestions": ["set pose=standing", "add 'lace trim' to clothing"],
  "prompt_used": "1 woman, red lingerie, ...",
  "_prompt_id": "…", "_report_path": "…/calib_iter003.json"
}
```
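
For an agent written in Python, one iteration can be wrapped as below. This is a minimal sketch: the flags are the ones `agent_bridge.py` defines, while the helper names `build_bridge_cmd` and `run_iteration` and the 900 s subprocess timeout are assumptions made for illustration.

```python
import json
import subprocess
import sys


def build_bridge_cmd(workflow, prompt, run_tag, analysis_dir,
                     negative="", seed=0, server="127.0.0.1:8188"):
    """Assemble the agent_bridge.py command line for one iteration."""
    return [
        sys.executable, "agent_bridge.py",
        "--server", server,
        "--workflow", workflow,
        "--prompt", prompt,
        "--negative", negative,
        "--seed", str(seed),
        "--run-tag", run_tag,
        "--analysis-dir", analysis_dir,
    ]


def run_iteration(cmd, timeout=900):
    """Run the bridge and return the parsed report; raise on a non-zero exit."""
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    if proc.returncode != 0:
        # The bridge reports failures via SystemExit, i.e. on stderr.
        raise RuntimeError(f"agent_bridge failed: {proc.stderr.strip()}")
    return json.loads(proc.stdout)
```

Passing the command as a list avoids shell quoting problems when the prompt contains quotes or commas.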

## Agent calibration policy (suggested)

The agent maps the lowest-scoring axes onto Prompt-Builder knobs and applies the
`fix_suggestions`, regenerates, and keeps changes that raise `overall_score`
(greedy per-axis hill-climb). Keep the **T2I seed fixed** while searching prompt axes so
the score reflects the prompt, not sampler noise; vary the seed only once you're near the
target. Stop at `overall_score ≥ target` (e.g. 0.85) or a max-iteration budget. Log every
`(prompt, knobs, score)` so the search is auditable/resumable.
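
One way to keep that log is an append-only JSON Lines file. The sketch below is only an illustration: the record layout, the file format, and the names `log_step` and `load_best` are assumptions, not something the repo prescribes.

```python
import json
import os
import time


def log_step(log_path, iteration, prompt, knobs, score):
    """Append one (prompt, knobs, score) record as a JSON line."""
    record = {"ts": time.time(), "iter": iteration,
              "prompt": prompt, "knobs": knobs, "score": score}
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


def load_best(log_path):
    """Return the highest-scoring logged record, or None if nothing is logged."""
    if not os.path.isfile(log_path):
        return None
    with open(log_path, "r", encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]
    return max(records, key=lambda r: r["score"], default=None)
```

Resuming a search is then a matter of calling `load_best` and continuing from the stored `knobs`.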

## Setup checklist

1. Run ComfyUI with `--listen` (so the bridge can POST). Install this node pack.
2. Build a workflow with: `CalibratorPromptReceptor` → (Prompt-Builder formatting, optional) → T2I → `QwenVLImageJudge` (feed the **reference** image into `reference_image`, the T2I output into `generated_image`).
3. Set the Judge's `report_dir` to a known path; pass the same path as `--analysis-dir`.
4. Export the workflow in **API format** (`workflow_api.json`).
5. Drive it from the agent with `agent_bridge.py`, once per iteration.
@@ -0,0 +1,135 @@
# Calibration policy — the agent's playbook

This is the instruction set the **external CLI agent** (the controller) follows each
iteration. Paste the "Agent system prompt" block into your agent, give it the workflow
path + reference image + target score, and let it loop.

The agent calibrates by reasoning over the **Prompt‑Builder axes** and editing a
structured *axis state*, then **rendering that state to a prompt string** that it injects
into the `CalibratorPromptReceptor`. This keeps the reasoning axis‑aware while staying
compatible with the flat‑string receptor. (If you later switch the receptor to carry a
structured config, the same axis state maps straight onto Prompt‑Builder's split control
nodes.)

---

## Axis state (the agent's working memory)

```json
{
  "cast": "1 woman, mid-20s, athletic",
  "clothing": "red lace lingerie",
  "pose": "standing, hand on hip",
  "scene": "dimly lit bedroom",
  "composition": "full-body shot, slight low angle",
  "expression": "soft smile, eye contact",
  "color_light": "warm rim light, shallow depth of field",
  "quality": "photorealistic, high detail",
  "negative": "blurry, deformed, lowres, extra limbs",
  "seed": 12345
}
```

The first seven keys are exactly the Judge's scoring axes; `quality`, `negative`, and
`seed` are carried but not scored. Render order (subject → wardrobe → action → setting →
framing → affect → light → quality):

```
prompt = join_nonempty([cast, clothing, pose, scene, composition, expression, color_light, quality])
```
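
The render step could look like this in Python. It is a sketch under one assumption: fields are joined with `", "`, matching the comma-separated sample prompts, since the pseudocode does not name the separator.

```python
RENDER_ORDER = ["cast", "clothing", "pose", "scene",
                "composition", "expression", "color_light", "quality"]


def render(state: dict) -> str:
    """Join the non-empty axis values in the documented render order."""
    parts = [str(state.get(key, "")).strip() for key in RENDER_ORDER]
    return ", ".join(p for p in parts if p)


state = {"cast": "1 woman", "pose": "standing", "scene": "", "quality": "high detail",
         "negative": "blurry", "seed": 12345}
print(render(state))
```

`negative` and `seed` are deliberately left out of the rendered string because the bridge takes them as separate `--negative` and `--seed` arguments.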

---

## Per‑iteration algorithm (greedy per‑axis hill‑climb)

```
best_score = -1 ; best_state = initial_state ; state = initial_state ; stale = 0 ; i = 0
loop:
    i += 1
    prompt = render(state)
    report = run agent_bridge.py --prompt prompt --negative state.negative
                                 --seed state.seed --run-tag iter{i}
                                 --workflow wf.json --analysis-dir <report_dir>
    score  = report.overall_score
    if score >= TARGET:                      # e.g. 0.85
        stop("converged", state, score)
    if score > best_score:
        best_score = score ; best_state = state ; stale = 0
    else:
        stale += 1
        state = best_state                   # revert: undo the change that didn't help
    if stale >= PATIENCE or i >= MAX_ITERS:  # e.g. PATIENCE=4, MAX_ITERS=25
        stop("plateau/budget", best_state, best_score)

    # choose the next single edit:
    worst_axis = axis with lowest per-axis score in report.axes
    edit       = map_fix_to_axis(report.fix_suggestions, worst_axis)   # apply the model's suggestion
    state      = apply(best_state, worst_axis, edit)                   # change ONE axis only
```
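
The same loop as runnable Python, kept independent of ComfyUI so it can be tested offline. This is a sketch, not the repo's controller: `evaluate` and `propose` are hypothetical hooks the caller supplies (for example, `evaluate` would shell out to `agent_bridge.py`, and `propose` would apply one `fix_suggestion` to the worst axis).

```python
import copy


def hill_climb(initial_state, evaluate, propose,
               target=0.85, patience=4, max_iters=25):
    """Greedy per-axis hill-climb.

    evaluate(state, i) -> report dict with at least "overall_score"
    propose(best_state, report) -> new state with ONE axis changed
    Returns (reason, state, score).
    """
    state = copy.deepcopy(initial_state)
    best_state, best_score, stale = copy.deepcopy(initial_state), -1.0, 0
    for i in range(1, max_iters + 1):
        report = evaluate(state, i)
        score = report["overall_score"]
        if score >= target:
            return "converged", state, score
        if score > best_score:
            best_score, best_state, stale = score, copy.deepcopy(state), 0
        else:
            stale += 1  # the edit did not help; the next edit restarts from best_state
        if stale >= patience:
            return "plateau", best_state, best_score
        # Always edit from best_state, never from a worse candidate.
        state = propose(copy.deepcopy(best_state), report)
    return "budget", best_state, best_score
```

Deep-copying the state on every hand-off keeps `best_state` safe from in-place edits made inside `propose`.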

### Rules that matter

1. **Change one axis per iteration.** One edit = clean attribution of the score delta.
   Only batch two edits when two axes score very low *and* are clearly independent.
2. **Freeze `seed` while searching axes.** The score must reflect the *prompt*, not
   sampler noise. Vary the seed only after you've converged, to confirm robustness.
3. **Always edit from `best_state`, not the last (possibly worse) state** — that's the
   "revert on no improvement" step. Prevents drifting down a bad path.
4. **Target the lowest‑scoring axis first**, applying the Judge's matching
   `fix_suggestion`. If a suggestion doesn't help after a try, pick an alternative value
   for that axis before moving on.
5. **Near the margin, don't over‑trust one reading.** `swap_eval` already averages two
   orderings; if two candidates are within ~0.03, re‑run each on a second seed and compare
   averages before committing.
6. **Detect gaming/oscillation.** If scores bounce without net gain, reduce edit size
   (smaller, more specific wording changes) and re‑anchor on `best_state`.
7. **Log every step**: `(iter, axis_changed, old→new value, prompt, overall_score, per‑axis)`.
The run must be auditable and resumable.
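
Rule 5 can be made concrete with two small helpers. They are a sketch with hypothetical names; the 0.03 margin comes from the rule, and comparing plain means across seeds is an assumption about how "compare averages" is meant.

```python
from statistics import mean


def needs_second_seed(score_a: float, score_b: float, margin: float = 0.03) -> bool:
    """True when two candidates are too close to call from one reading."""
    return abs(score_a - score_b) <= margin


def better_by_mean(readings_a, readings_b) -> str:
    """Compare candidates by their mean overall_score across seeds."""
    return "a" if mean(readings_a) >= mean(readings_b) else "b"


# Candidate B looked better on the first seed but not on average.
if needs_second_seed(0.80, 0.82):
    print(better_by_mean([0.80, 0.78], [0.82, 0.70]))
```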

### Mapping `fix_suggestions` → axes

The Judge phrases fixes in axis vocabulary ("set pose=standing", "add lace trim to
clothing", "warmer lighting"). Match by keyword to the axis key; if a fix is ambiguous,
attribute it to the lowest‑scoring axis it plausibly affects.
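
A keyword matcher along these lines is enough to start with. Everything here is illustrative: the keyword lists are hypothetical and would need tuning to the Judge's real wording, and `axis_for_fix` is not a function from this repo.

```python
# Hypothetical keyword lists; extend them from the Judge's actual output.
AXIS_KEYWORDS = {
    "cast": ("cast", "subject", "woman"),
    "clothing": ("clothing", "outfit", "lace", "lingerie", "dress"),
    "pose": ("pose", "standing", "seated", "sitting"),
    "scene": ("scene", "bedroom", "kitchen", "background", "indoors"),
    "composition": ("composition", "shot", "angle", "framing"),
    "expression": ("expression", "smile", "gaze", "eye contact"),
    "color_light": ("color", "light", "warm", "glow"),
}


def axis_for_fix(fix: str, axis_scores: dict) -> str:
    """Pick the axis a fix suggestion belongs to.

    Keyword hits narrow the candidates; with no hit, every scored axis is
    treated as plausible. Ties and ambiguity go to the lowest-scoring axis.
    """
    text = fix.lower()
    hits = [axis for axis, words in AXIS_KEYWORDS.items()
            if axis in axis_scores and any(w in text for w in words)]
    candidates = hits or list(axis_scores)
    return min(candidates, key=lambda axis: axis_scores[axis])
```

Substring matching is crude (a keyword can fire inside a longer word), so word-boundary matching is a sensible next step once real suggestions are available.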

---

## Worked example

```
iter1  prompt="1 woman, casual outfit, indoors, ..."        score=0.41
       axes: scene 0.30 (worst) — "ref bedroom, gen kitchen"
       fix:  "set scene to a dim bedroom"
iter2  edit scene→"dimly lit bedroom"                       score=0.58  (kept)
       axes: pose 0.35 (worst) — "ref standing, gen seated"
iter3  edit pose→"standing, hand on hip"                    score=0.71  (kept)
       axes: color_light 0.50 (worst) — "ref warm, gen flat"
iter4  edit color_light→"warm rim light"                    score=0.69  (worse → revert)
iter5  edit color_light→"warm golden hour glow"             score=0.83  (kept)
       axes: clothing 0.78 (worst) — "gen lacks lace detail"
iter6  edit clothing→"red lace lingerie with trim"          score=0.88  ≥ target → STOP
```

---

## Agent system prompt (paste into your CLI agent)

> You are the controller for a local image prompt calibrator. Goal: make a generated
> image match a reference image, measured by a Qwen3‑VL judge that scores 7 axes
> (cast, clothing, pose, scene, composition, expression, color_light) from 0–1.
>
> You hold an **axis state** (JSON, keys above). Each turn you: (1) render the state to a
> prompt string in the order cast→clothing→pose→scene→composition→expression→color_light→
> quality; (2) run `python agent_bridge.py --workflow <wf> --prompt "<rendered>"
> --negative "<state.negative>" --seed <state.seed> --run-tag iter<N> --analysis-dir
> <report_dir>`; (3) read the printed JSON report.
>
> Then apply greedy per‑axis hill‑climb: keep the change only if `overall_score` improved,
> else revert to the best state; pick the **lowest‑scoring axis** and apply the Judge's
> matching `fix_suggestion` as a **single** edit. Keep the seed fixed while searching.
> Stop when `overall_score ≥ TARGET` (default 0.85), or after PATIENCE=4 non‑improving
> iterations, or MAX_ITERS=25. Log every step as a table and report the best prompt + score.
>
> Never change more than one axis at a time unless two axes are both very low and clearly
> independent. Never trust a single near‑margin reading — re‑run on a second seed when two
> candidates are within 0.03.
@@ -0,0 +1,198 @@
# Local Prompt Calibrator — Methodology

> Goal: a **fully local** ComfyUI feedback loop where a vision‑language model (VLM)
> scores how close a *generated* image is to a *reference* image, and that score +
> a structured difference analysis is used to **calibrate the prompt‑generation
> method** ([ComfyUI‑Prompt‑Builder](../../ComfyUI-Prompt-Builder), the "SxCP" nodes)
> until the generated image matches the reference.

---

## 1. The loop at a glance

```
     ┌──────────────────────────────────────────────┐
     │ REFERENCE image (the target look)            │
     └───────────────┬──────────────────────────────┘
                     │
┌────────────────────▼────────────────┐      calibration deltas
│ Prompt-Builder (SxCP) ── "method"   │◄──── (axis nudges / knob
│ seeded pools + profile knobs        │       overrides / seed move)
└────────────────────┬────────────────┘
                     │ prompt + negative
┌────────────────────▼────────────────┐
│ T2I model (SDXL / Flux / Krea2)     │ ← fix the sampler seed while
└────────────────────┬────────────────┘   searching the prompt axes
                     │ generated image
┌────────────────────▼──────────────────────────────────┐
│ Qwen3-VL JUDGE node ── the "vllm node"                │
│ in : reference + generated                            │
│ out: overall_score 0..1                               │
│      per-axis scores (cast, clothing, pose, scene,    │
│        composition, expression, color/lighting)       │
│      diff_analysis (JSON: what's off + how to fix,    │
│        phrased in Prompt-Builder axis vocabulary)     │
└────────────────────┬──────────────────────────────────┘
                     │ score + diffs
┌────────────────────▼────────────────┐
│ CALIBRATOR / controller             │
│ - accumulate per-axis scores        │
│ - map diffs → axis adjustments      │
│ - update Prompt-Builder knobs       │
│ - stop when overall_score ≥ target  │
│   or max iterations reached         │
└─────────────────────────────────────┘
```

The novel piece is the **Judge node**. Off‑the‑shelf Qwen‑VL nodes emit free text;
a calibrator needs a **machine‑readable score + per‑axis diffs** so the controller
can act on them. That is what `nodes/qwen_judge.py` in this repo provides.

---

## 2. The VLLM node — what to reuse

You already have the model converted locally:

```
/media/p5/qwen3vl_4b_abliterated_comfy_convert/
├── hf_bf16/   ← huihui-ai Qwen3-VL-4B-Instruct **abliterated** (uncensored), bf16
└── hf_fp8/    ← same model, FP8 (≈4–5 GB, trivially fits the RTX 5090 32 GB)
```

The **abliterated** variant matters: stock Qwen3‑VL will often refuse to "describe or
analyze" adult imagery, which would break the loop. huihui‑ai removed the text‑side
refusal direction, so it scores NSFW reference/generated pairs without bailing.

### Reusable ComfyUI nodes (pick one as the plumbing base)

| Repo | Backend | Multi‑image | Local path | Notes |
|---|---|---|---|---|
| **[hardik-uppal/ComfyUI-QwenVL-MultiImage](https://github.com/hardik-uppal/ComfyUI-QwenVL-MultiImage)** | transformers | ✅ `images` + `images_batch_2/3` | needs tiny tweak | **Best base** — built for "compare these images, describe the differences"; supports FP16 / 8‑bit / 4‑bit **and pre‑quantized FP8** (matches your `hf_fp8`). |
| [IuvenisSapiens/ComfyUI_Qwen3-VL-Instruct](https://github.com/IuvenisSapiens/ComfyUI_Qwen3-VL-Instruct) | transformers | ✅ multi‑image query | HF download | Clean native Qwen3‑VL‑Instruct integration. |
| [jren712/ComfyUI-QwenVL-abliterated](https://github.com/jren712/ComfyUI-QwenVL-abliterated) | transformers | ✅ | abliterated‑oriented | Fork tuned for the abliterated weights. |
| [1038lab/ComfyUI-QwenVL](https://github.com/1038lab/ComfyUI-QwenVL) | **GGUF** (llama.cpp) | ✅ | local GGUF | Use only if you want GGUF; bf16 4B on 32 GB doesn't need it. |

**Recommendation:** don't run any of them *as‑is* for the loop — they only output text.
Instead reuse their **model‑load + `apply_chat_template` + `generate`** plumbing inside
a purpose‑built **Judge node** (this repo) that forces structured JSON output. The
`ComfyUI-QwenVL-MultiImage` loader is the closest template (it already handles two
image batches + FP8).

### Model sizing on 32 GB (RTX 5090) — abliterated, latest Qwen VL

As of June 2026 the **latest Qwen VL family is Qwen3‑VL** (Qwen3.5‑VL shipped early
2026, but abliterated builds of it are **text‑only so far** — no uncensored
Qwen3.5‑*VL* yet). So "latest + uncensored + fits 32 GB" = **Qwen3‑VL‑30B‑A3B abliterated**.
All rows below are huihui‑ai abliterated (uncensored) weights:

| Model (abliterated) | Best precision on 32 GB | ~VRAM | Verdict |
|---|---|---|---|
| **Qwen3‑VL‑30B‑A3B‑Instruct** ([HF](https://huggingface.co/huihui-ai/Huihui-Qwen3-VL-30B-A3B-Instruct-abliterated)) | **nf4 (4‑bit)** or GGUF Q4_K_M | ~18 GB | **Best judge that fits.** MoE → only 3B active, so it's fast despite 30B total. transformers class `Qwen3VLMoeForConditionalGeneration` (auto‑detected by the node). |
| Qwen3‑VL‑8B‑Instruct ([HF](https://huggingface.co/huihui-ai)) | bf16 | ~17 GB | Easy middle ground, no quantization. Clearly better than 4B; drop‑in for the judge node. |
| Qwen3‑VL‑4B‑Instruct (already local) | fp8 / bf16 | ~5 / ~9 GB | Lightweight fallback / fast iteration. |

**Gemma alternative:** Gemma‑3‑27B‑it (abliterated, 4‑bit ~16 GB) is a solid different
visual prior if you want a second opinion, but the Krea2 text encoder + Prompt‑Builder
are already Qwen‑aligned, so staying on Qwen3‑VL keeps the vocabulary consistent.

Download an upgrade and point the node's `model_path` at it:

```bash
hf download huihui-ai/Huihui-Qwen3-VL-30B-A3B-Instruct-abliterated \
  --local-dir /media/p5/models/Qwen3-VL-30B-A3B-abliterated
# then in the Judge node: model_path=<that dir>, precision=nf4
```

Practical note: at nf4 the 30B judge (~18 GB) and an SDXL/Flux T2I model can't always
co‑reside — run them as **separate queue steps** and let ComfyUI unload between; the loop
is sequential anyway. The 8B bf16 judge co‑resides more easily.
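
The ~VRAM column above can be sanity-checked with a back-of-the-envelope estimate: weight memory is roughly parameter count times bytes per parameter. The sketch below assumes nominal parameter counts and decimal gigabytes, and it ignores activations, the KV cache, and quantization metadata, which is why the table's figures sit somewhat higher. `weight_gb` is a hypothetical helper for illustration, not part of this repo.

```python
def weight_gb(params_billion: float, bits_per_param: float) -> float:
    """Approximate weight memory in decimal GB: params * bits / 8."""
    return params_billion * bits_per_param / 8.0


# Nominal sizes taken from the table; real checkpoints differ slightly.
# Note that an MoE model keeps all experts resident, so the 30B-A3B row is
# sized by its 30B total parameters, not by the 3B that are active per token.
for name, params, bits in [
    ("30B-A3B @ nf4", 30, 4),
    ("8B @ bf16", 8, 16),
    ("4B @ bf16", 4, 16),
    ("4B @ fp8", 4, 8),
]:
    print(f"{name}: ~{weight_gb(params, bits):.1f} GB of weights")
```

Whatever the estimate says, the remaining headroom on a 32 GB card still has to cover the T2I model if both are meant to stay loaded.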
|
||||||
|
|
||||||
|
---

## 3. Scoring rubric (what the VLM actually returns)

The judge prompts Qwen3‑VL to return **strict JSON** with one overall score and a score
per axis, where the axes mirror what Prompt‑Builder can control. This is what makes the
diff *actionable* instead of generic prose.

```json
{
  "overall_score": 0.0,
  "axes": {
    "cast": {"score": 0.0, "diff": "ref has 1 woman, gen has 2"},
    "clothing": {"score": 0.0, "diff": "ref lingerie vs gen nude"},
    "pose": {"score": 0.0, "diff": "ref standing vs gen seated"},
    "scene": {"score": 0.0, "diff": "ref bedroom vs gen outdoor"},
    "composition": {"score": 0.0, "diff": "ref full body vs gen close-up"},
    "expression": {"score": 0.0, "diff": "ref smiling vs gen neutral"},
    "color_light": {"score": 0.0, "diff": "ref warm vs gen cool/flat"}
  },
  "fix_suggestions": ["reduce cast to 1 woman", "set clothing=lingerie", ...]
}
```

The axis list is **configurable** on the node so it can match whichever Prompt‑Builder
knobs you expose (cast, clothing, pose, scene/location, composition/framing, expression,
color/lighting). `fix_suggestions` is phrased in axis vocabulary so the controller can
map each one onto a knob.
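
Because the controller acts on this JSON mechanically, it helps to coerce each reply into the rubric's shape before using it. The following is a minimal sketch under stated assumptions: `normalize_judgement` is a hypothetical helper (not a function in this repo), scores are clamped to the anchored 0..1 range, and an axis the model omitted is treated as a score of 0.0, which is one possible policy rather than the node's actual behaviour.

```python
import json


def normalize_judgement(raw_json: str, axes: list[str]) -> dict:
    """Parse a judge reply and coerce it into the rubric's shape."""
    data = json.loads(raw_json)

    def clamp(value) -> float:
        # Scores are anchored to 0..1; clamp anything the model drifts outside.
        return min(1.0, max(0.0, float(value)))

    out = {"overall_score": clamp(data.get("overall_score", 0.0)), "axes": {}}
    for axis in axes:
        # Assumption: a missing axis counts as 0.0 so it gets attention next round.
        entry = data.get("axes", {}).get(axis, {})
        out["axes"][axis] = {
            "score": clamp(entry.get("score", 0.0)),
            "diff": str(entry.get("diff", "")),
        }
    out["fix_suggestions"] = [str(s) for s in data.get("fix_suggestions", [])]
    return out


reply = (
    '{"overall_score": 1.2, '
    '"axes": {"pose": {"score": 0.4, "diff": "ref standing vs gen seated"}}, '
    '"fix_suggestions": ["set pose=standing"]}'
)
result = normalize_judgement(reply, ["cast", "pose"])
print(result)
```

Iterating over the configured axis list, rather than over whatever keys the model returned, keeps the output aligned with the knobs the controller can actually turn.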
|
||||||
|
|
||||||
|
### Reducing VLM‑as‑judge variance (important)

VLM scoring is noisy and biased. Mitigations baked into the node / recommended:

1. **Position‑bias swap** — run the judge twice with reference/generated order swapped and
   average the per‑axis scores (`swap_eval=True`). Cuts the "first image wins" bias.
2. **Low temperature** (0.0–0.3) + a **fixed rubric** in the system prompt → repeatable scores.
3. **Anchored 0–1 rubric** (0 = unrelated, 0.5 = same category/different details, 1 = near‑identical) so scores are comparable across iterations.
4. **Evidence‑first**: ask the model to state the concrete difference *before* the number; reasoning‑then‑score tends to be more reliable than score‑then‑reasoning.
5. **Average over k T2I seeds** for the *same* prompt if you want the score to reflect the prompt rather than sampler noise — or, cheaper, **freeze the T2I seed** during the axis search and only vary it once at the end.
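
The seed-averaging idea in item 5 can be sketched as follows. This is an illustration only: `aggregate_over_seeds` is a hypothetical helper, and it assumes each judgement is a dict shaped like the rubric JSON above. Reporting the spread next to the mean shows whether a score difference between two prompts is larger than the sampler noise.

```python
from statistics import mean, pstdev


def aggregate_over_seeds(judgements: list[dict]) -> dict:
    """Mean and population std-dev of per-axis scores across k judged seeds."""
    axes = sorted({ax for j in judgements for ax in j.get("axes", {})})
    summary = {}
    for ax in axes:
        # Only use the runs that actually scored this axis.
        scores = [j["axes"][ax]["score"] for j in judgements if ax in j.get("axes", {})]
        summary[ax] = {"mean": round(mean(scores), 4), "std": round(pstdev(scores), 4)}
    return summary


# Three judged seeds for the same prompt (made-up numbers for illustration).
runs = [
    {"axes": {"pose": {"score": 0.4}, "scene": {"score": 0.9}}},
    {"axes": {"pose": {"score": 0.6}, "scene": {"score": 0.9}}},
    {"axes": {"pose": {"score": 0.5}, "scene": {"score": 0.9}}},
]
summary = aggregate_over_seeds(runs)
print(summary)
```

An axis with a large spread is a poor candidate for greedy accept/revert decisions until the seed is frozen or k is increased.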
|
||||||
|
|
||||||
|
---

## 4. The calibrator / controller

> **Chosen design: the controller is an external CLI agent, not an in‑graph node.**
> The agent reads the Judge's text/JSON analysis, calibrates the prompt, injects it into
> the `CalibratorPromptReceptor` node, and queues ComfyUI via its HTTP API — one
> `prompt_id` per iteration. See **[AGENT_LOOP.md](AGENT_LOOP.md)** and `agent_bridge.py`.
> The options below describe the *policy* the agent can run.
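
One iteration of the chosen design boils down to overriding the receptor's inputs in an API-format workflow and posting it to ComfyUI's `POST /prompt` endpoint. The sketch below is an assumption about how such a call can look, not the contents of `agent_bridge.py`: the helper names, the `client_id` value, and the default server address `http://127.0.0.1:8188` are illustrative, and node id `"10"` is the receptor in the example workflow from this commit.

```python
import copy
import json
import urllib.request


def build_prompt_payload(workflow: dict, receptor_id: str, prompt: str,
                         negative: str, seed: int,
                         client_id: str = "calibrator-agent") -> dict:
    """Return a POST /prompt body with the receptor node's inputs overridden."""
    graph = copy.deepcopy(workflow)  # never mutate the template workflow
    graph[receptor_id]["inputs"].update(prompt=prompt, negative=negative, seed=int(seed))
    return {"prompt": graph, "client_id": client_id}


def queue_prompt(payload: dict, server: str = "http://127.0.0.1:8188") -> str:
    """Queue one iteration and return its prompt_id (needs a running ComfyUI)."""
    req = urllib.request.Request(
        f"{server}/prompt",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["prompt_id"]


# Minimal stand-in for the example workflow: only the receptor node is shown.
workflow = {
    "10": {
        "class_type": "CalibratorPromptReceptor",
        "inputs": {"prompt": "a photo of a woman", "negative": "blurry",
                   "seed": 12345, "source_file": ""},
    }
}
payload = build_prompt_payload(workflow, "10", "a woman standing in a bedroom", "blurry", 7)
```

Keeping payload construction separate from the network call makes the override logic testable without a server; `queue_prompt(payload)` is only called once ComfyUI is up.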
Prompt‑Builder is a **deterministic, seeded, combinatorial** generator (it is *not* an
LLM). So "calibration" = **searching the space of `(seed, profile, per‑axis overrides)`**
to maximize `overall_score`. Three controller options, easiest → strongest:

1. **Greedy per‑axis hill‑climb (start here).**
   For the lowest‑scoring axis, apply the matching entry from `fix_suggestions` as a knob
   override (e.g. set `clothing=lingerie`, `cast_women=1`), regenerate, keep the change
   if `overall_score` improved, else revert. Loop until `overall_score` ≥ target or no axis improves.
   Implementable today with the Prompt‑Builder **For‑Loop Start/End + Accumulator** nodes.

2. **Black‑box optimizer over the knob vector.**
   Encode the exposed knobs as a parameter vector and drive it with Optuna / CMA‑ES /
   a simple bandit, objective = `overall_score`. Better for >3–4 interacting axes; needs
   a thin Python controller node that holds state across iterations.

3. **LLM‑in‑the‑loop rewriter.**
   Feed `diff_analysis` to a (local) text LLM that proposes the next knob settings (or,
   if you move to free‑text prompts, rewrites the prompt). Most flexible, least
   reproducible; use the same abliterated Qwen3 text model to keep it local and uncensored.
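
Option 1 can be sketched as a small accept/revert loop. This is a toy under stated assumptions, not the controller shipped here: `greedy_hill_climb`, `toy_judge`, and `toy_propose` are hypothetical names, and the toy judge stands in for a full "generate with ComfyUI, then score with the Judge node" round trip.

```python
from typing import Callable


def greedy_hill_climb(knobs: dict, judge: Callable[[dict], dict],
                      propose: Callable[[str, dict], dict],
                      target: float = 0.85, max_iters: int = 10):
    """Try a fix for the weakest axis; keep it only if overall_score improves."""
    best = judge(knobs)
    history = [(dict(knobs), best["overall_score"])]
    tried: set[str] = set()
    for _ in range(max_iters):
        if best["overall_score"] >= target:
            break
        # Weakest axis that has not already failed to improve the score.
        candidates = [ax for ax in best["axes"] if ax not in tried]
        if not candidates:
            break
        axis = min(candidates, key=lambda ax: best["axes"][ax]["score"])
        trial_knobs = {**knobs, **propose(axis, best)}
        trial = judge(trial_knobs)
        history.append((dict(trial_knobs), trial["overall_score"]))
        if trial["overall_score"] > best["overall_score"]:
            knobs, best, tried = trial_knobs, trial, set()  # keep the change
        else:
            tried.add(axis)  # revert: knobs stay as they were
    return knobs, best, history


# Toy stand-ins: an axis scores 1.0 when its knob matches the hidden target.
TARGET = {"clothing": "lingerie", "cast": 1}


def toy_judge(knobs: dict) -> dict:
    axes = {ax: {"score": 1.0 if knobs.get(ax) == want else 0.0}
            for ax, want in TARGET.items()}
    overall = sum(a["score"] for a in axes.values()) / len(axes)
    return {"overall_score": overall, "axes": axes}


def toy_propose(axis: str, judgement: dict) -> dict:
    # A real controller would map a fix suggestion onto a knob override here.
    return {axis: TARGET[axis]}


final_knobs, final, history = greedy_hill_climb(
    {"clothing": "nude", "cast": 2}, toy_judge, toy_propose)
print(final_knobs, final["overall_score"], len(history))
```

The `tried` set is what makes "no axis improves" a terminating condition: it is reset after every accepted change because an axis that did not help earlier may help once another knob has moved.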
|
||||||
|
|
||||||
|
**Loop hygiene:** fix resolution/sampler/steps across iterations; freeze T2I seed while
searching; stop on `overall_score ≥ target` (e.g. 0.85) **or** `max_iters`; log every
`(knobs, score, diff)` triple so the search is auditable and resumable.
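
A simple way to keep the search auditable and resumable is an append-only JSON Lines log with one record per iteration. The sketch below is one possible layout, not the format this repo writes: the file name `search.jsonl` and the helpers `append_iteration` and `best_so_far` are assumptions for illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def append_iteration(log_path: Path, knobs: dict, score: float, diff: str) -> None:
    """Append one (knobs, score, diff) record as a single JSON line."""
    record = {"knobs": knobs, "score": score, "diff": diff}
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


def best_so_far(log_path: Path) -> Optional[dict]:
    """Return the highest-scoring record, or None if nothing is logged yet."""
    if not log_path.exists():
        return None
    lines = log_path.read_text(encoding="utf-8").splitlines()
    records = [json.loads(line) for line in lines if line.strip()]
    return max(records, key=lambda r: r["score"], default=None)


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "search.jsonl"
    empty = best_so_far(log)
    append_iteration(log, {"clothing": "nude"}, 0.40, "ref lingerie vs gen nude")
    append_iteration(log, {"clothing": "lingerie"}, 0.70, "ref standing vs gen seated")
    best = best_so_far(log)
    print(best["knobs"])
```

On restart the agent can call `best_so_far` to resume from the best known knob set instead of starting the search over.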
|
||||||
|
|
||||||
|
---

## 5. Concrete build order

1. **Judge node** (this repo, `nodes/qwen_judge.py`) — load local Qwen3‑VL‑4B abliterated,
   take ref+gen, output `overall_score (FLOAT)`, `axis_scores_json (STRING)`,
   `diff_analysis (STRING)`, `raw (STRING)`, `report_path (STRING)`. ✅ scaffolded.
2. **Wire the loop** in a workflow: Prompt‑Builder → T2I → Judge → Accumulator, using the
   SxCP For‑Loop nodes; route `overall_score` into the loop's stop condition.
3. **Controller node** — start with greedy per‑axis hill‑climb that reads `diff_analysis`
   and emits knob overrides back into Prompt‑Builder's split control nodes.
4. **Tune the judge** — calibrate the rubric on a handful of known ref/gen pairs; enable
   `swap_eval`; pick temperature; decide if you need to step up to 8B/30B‑A3B.

See [README.md](../README.md) for install/usage of the Judge node.
|
||||||
@@ -0,0 +1,418 @@
|
|||||||
|
"""
|
||||||
|
Qwen3-VL Image-Similarity Judge node for ComfyUI.

The "vllm node" of the Prompt Calibrator. It takes a REFERENCE image and a
GENERATED image and asks a local Qwen3-VL model how close the generated image is
to the reference, returning a machine-readable score + per-axis difference
analysis that the calibration controller can act on.

Reuses the standard transformers Qwen3-VL plumbing (the same approach used by
ComfyUI-QwenVL-MultiImage / ComfyUI_Qwen3-VL-Instruct), but forces strict JSON
output so the result is usable by an automated loop rather than a human reader.

Default model is the locally converted huihui-ai Qwen3-VL-4B-Instruct
*abliterated* (uncensored) weights, which do not refuse to analyze adult imagery.
"""

from __future__ import annotations

import json
import os
import re

import numpy as np
import torch
from PIL import Image

# Default to the model already converted on this machine (works out of the box).
DEFAULT_MODEL_PATH = "/media/p5/qwen3vl_4b_abliterated_comfy_convert/hf_bf16"
DEFAULT_MODEL_PATH_FP8 = "/media/p5/qwen3vl_4b_abliterated_comfy_convert/hf_fp8"

# Recommended abliterated upgrades for the RTX 5090 32 GB (latest Qwen VL family).
# Download with: hf download <repo> --local-dir <dir>, then point model_path at it.
RECOMMENDED_MODELS = {
    # Best judge that fits 32 GB. MoE (3B active -> fast). Use precision="nf4"
    # (~18 GB) on 32 GB, or the GGUF quants via a GGUF node. transformers class:
    # Qwen3VLMoeForConditionalGeneration (auto-detected below).
    "30b-a3b": "huihui-ai/Huihui-Qwen3-VL-30B-A3B-Instruct-abliterated",
    # Easy middle ground: bf16 ~17 GB, no quantization hassle, drop-in here.
    "8b": "huihui-ai/Huihui-Qwen3-VL-8B-Instruct-abliterated",
    # Lightweight, already local.
    "4b": "huihui-ai/Huihui-Qwen3-VL-4B-Instruct-abliterated",
}

DEFAULT_AXES = "cast, clothing, pose, scene, composition, expression, color_light"

# Cache loaded (model, processor) keyed by (path, precision) so the loop does not
# reload weights every iteration.
_MODEL_CACHE: dict[tuple[str, str], tuple] = {}
|
||||||
|
|
||||||
|
|
||||||
|
def _looks_like_repo_id(s: str) -> bool:
    """'org/name' HF repo id, not an absolute/local filesystem path."""
    return ("/" in s) and (" " not in s) and (not os.path.isabs(s)) and (not s.startswith("."))


def _download_target_dir(repo_id: str) -> str:
    """Where to put downloaded weights — prefer ComfyUI's models/prompt_generator/."""
    name = repo_id.split("/")[-1]
    try:
        import folder_paths  # available when running inside ComfyUI
        base = os.path.join(folder_paths.models_dir, "prompt_generator")
    except Exception:
        base = os.path.join(os.path.dirname(os.path.dirname(__file__)), "models")
    return os.path.join(base, name)


def _resolve_model_source(model_path: str, auto_download: bool) -> str:
    """Turn model_path (local dir | short alias | HF repo id) into a local dir.

    Downloads from the Hub on first use if needed (and auto_download is on).
    """
    # Short alias -> full repo id (e.g. "30b-a3b", "8b", "4b").
    if model_path in RECOMMENDED_MODELS:
        model_path = RECOMMENDED_MODELS[model_path]

    if os.path.isdir(model_path):
        return model_path

    if _looks_like_repo_id(model_path):
        target = _download_target_dir(model_path)
        # Already downloaded? (a config.json is enough to trust the local copy)
        if os.path.isfile(os.path.join(target, "config.json")):
            return target
        if not auto_download:
            raise FileNotFoundError(
                f"[QwenVLImageJudge] '{model_path}' is not downloaded and auto_download is off. "
                f"Enable auto_download or pre-fetch it to {target}.")
        from huggingface_hub import snapshot_download
        print(f"[QwenVLImageJudge] downloading {model_path} -> {target} (first run only, may be large)...")
        local = snapshot_download(
            repo_id=model_path,
            local_dir=target,
            # weights + processor/tokenizer/config; skip duplicate GGUF/onnx blobs.
            allow_patterns=["*.json", "*.safetensors", "*.txt", "*.model", "merges.txt", "*.py"],
        )
        print(f"[QwenVLImageJudge] download complete: {local}")
        return local

    # A local path that simply doesn't exist.
    raise FileNotFoundError(
        f"[QwenVLImageJudge] model_path not found: {model_path}. "
        f"Use a local checkpoint dir, a HF repo id (org/name), or an alias "
        f"({', '.join(RECOMMENDED_MODELS)}).")
|
||||||
|
|
||||||
|
|
||||||
|
def _tensor_to_pil(image: "torch.Tensor") -> Image.Image:
    """ComfyUI IMAGE tensor (B,H,W,C float 0..1) -> first-frame PIL.Image (RGB)."""
    if image is None:
        raise ValueError("Judge node received an empty image input.")
    arr = image
    if hasattr(arr, "detach"):
        arr = arr.detach().cpu().numpy()
    arr = np.asarray(arr)
    if arr.ndim == 4:  # batch -> take first frame
        arr = arr[0]
    arr = np.clip(arr * 255.0, 0, 255).astype(np.uint8)
    if arr.ndim == 2:
        arr = np.stack([arr] * 3, axis=-1)
    if arr.shape[-1] == 4:  # drop alpha
        arr = arr[..., :3]
    return Image.fromarray(arr, mode="RGB")


def _resolve_vl_class(model_path: str):
    """Pick the right transformers class. AutoModelForImageTextToText reads the
    checkpoint's `architectures` and instantiates the correct dense
    (Qwen3VLForConditionalGeneration) or MoE (Qwen3VLMoeForConditionalGeneration)
    class automatically — so 4B/8B *and* 30B-A3B all work without branching."""
    try:
        from transformers import AutoModelForImageTextToText as _Auto
        return _Auto
    except ImportError:  # pragma: no cover - older transformers
        name = model_path.lower()
        is_moe = any(t in name for t in ("a3b", "moe", "30b", "235b"))
        if is_moe:
            from transformers import Qwen3VLMoeForConditionalGeneration as _C
        else:
            from transformers import Qwen3VLForConditionalGeneration as _C
        return _C
|
||||||
|
|
||||||
|
|
||||||
|
def _load_model(model_path: str, precision: str):
    key = (model_path, precision)
    if key in _MODEL_CACHE:
        return _MODEL_CACHE[key]

    # Imported lazily so the node can be registered even if transformers is old.
    from transformers import AutoProcessor

    _VLModel = _resolve_vl_class(model_path)
    load_kwargs = dict(device_map="auto", trust_remote_code=True, low_cpu_mem_usage=True)

    if precision == "nf4":
        # 4-bit (bitsandbytes) — lets the 30B-A3B abliterated MoE fit in ~18 GB on 32 GB.
        from transformers import BitsAndBytesConfig
        load_kwargs["quantization_config"] = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_compute_dtype=torch.bfloat16,
            bnb_4bit_use_double_quant=True,
        )
    elif precision == "fp8":
        # Pre-quantized FP8 weights: let the checkpoint dictate dtype.
        pass
    else:
        load_kwargs["dtype"] = torch.bfloat16 if precision == "bf16" else torch.float16

    model = _VLModel.from_pretrained(model_path, **load_kwargs)
    model.eval()
    processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
    _MODEL_CACHE[key] = (model, processor)
    return model, processor
|
||||||
|
|
||||||
|
|
||||||
|
def _build_system_prompt(axes: list[str]) -> str:
    # Join with ",\n" so the template has no trailing comma after the last axis
    # (a trailing comma is invalid JSON and the model may copy it).
    axis_lines = ",\n".join(
        f'    "{a}": {{"score": <0..1>, "diff": "<short note>"}}' for a in axes
    )
    return (
        "You are a meticulous visual-similarity judge for an image-generation "
        "calibration loop. You are shown two images: IMAGE 1 is the REFERENCE "
        "(the target) and IMAGE 2 is the GENERATED candidate. Judge how closely "
        "the GENERATED image reproduces the REFERENCE.\n\n"
        "Score each axis from 0 to 1 using this anchored rubric:\n"
        "  0.0 = unrelated; 0.5 = same general category but clearly different "
        "details; 1.0 = near-identical.\n"
        "For each axis, FIRST note the concrete difference, THEN assign the number.\n\n"
        "Reply with STRICT JSON only, no prose, no markdown fences, exactly:\n"
        "{\n"
        '  "overall_score": <0..1>,\n'
        '  "axes": {\n'
        f"{axis_lines}\n"
        "  },\n"
        '  "fix_suggestions": ["<actionable change to the generation prompt>", ...]\n'
        "}\n"
        "Phrase every diff and fix in terms of the named axes "
        # Name the axes actually configured on the node, not a fixed list.
        f"({'/'.join(axes)}). "
        "overall_score must be consistent with the per-axis scores."
    )


def _run_once(model, processor, ref_pil, gen_pil, axes, max_new_tokens, temperature):
    """One forward pass; returns the raw decoded string."""
    messages = [
        {"role": "system", "content": _build_system_prompt(axes)},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "IMAGE 1 = REFERENCE (target):"},
                {"type": "image", "image": ref_pil},
                {"type": "text", "text": "IMAGE 2 = GENERATED candidate:"},
                {"type": "image", "image": gen_pil},
                {"type": "text", "text": "Now return the strict JSON judgement."},
            ],
        },
    ]

    text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = processor(text=[text], images=[ref_pil, gen_pil], return_tensors="pt")
    inputs = inputs.to(model.device)

    gen_kwargs = dict(max_new_tokens=max_new_tokens)
    if temperature and temperature > 0:
        gen_kwargs.update(do_sample=True, temperature=float(temperature))
    else:
        gen_kwargs.update(do_sample=False)

    with torch.inference_mode():
        out = model.generate(**inputs, **gen_kwargs)
    # Drop the prompt tokens so only the newly generated reply is decoded.
    trimmed = out[:, inputs.input_ids.shape[1]:]
    decoded = processor.batch_decode(trimmed, skip_special_tokens=True)[0]
    return decoded.strip()
|
||||||
|
|
||||||
|
|
||||||
|
def _parse_json(raw: str) -> dict | None:
    """Best-effort: pull the first JSON object out of the model output."""
    # Strip code fences if present.
    fenced = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", raw, re.DOTALL)
    if fenced:
        text, start = fenced.group(1), 0
    else:
        text, start = raw, raw.find("{")
    if start == -1:
        return None
    # raw_decode stops at the end of the first complete JSON value, so braces
    # inside string values (e.g. a diff note containing "}") cannot confuse it
    # the way manual depth counting could, and trailing prose is ignored.
    try:
        obj, _end = json.JSONDecoder().raw_decode(text, start)
    except json.JSONDecodeError:
        return None
    return obj if isinstance(obj, dict) else None
|
||||||
|
|
||||||
|
|
||||||
|
def _merge_swapped(a: dict, b: dict) -> dict:
    """Average two judgements (normal + order-swapped) to cut position bias."""
    if not b:
        return a
    if not a:
        return b
    out = {"axes": {}, "fix_suggestions": []}
    out["overall_score"] = round(
        (float(a.get("overall_score", 0)) + float(b.get("overall_score", 0))) / 2.0, 4
    )
    axes_a, axes_b = a.get("axes", {}), b.get("axes", {})
    # Keep the first run's axis order, then append axes only the swapped run returned.
    for ax in list(axes_a) + [x for x in axes_b if x not in axes_a]:
        sa, sb = axes_a.get(ax, {}), axes_b.get(ax, {})
        # Average only the runs that actually scored this axis, so an axis
        # missing from one run is not silently halved.
        scores = [float(s["score"]) for s in (sa, sb) if "score" in s]
        score = sum(scores) / len(scores) if scores else 0.0
        diff = sa.get("diff") or sb.get("diff") or ""
        out["axes"][ax] = {"score": round(score, 4), "diff": diff}
    # In the swapped run the image roles are reversed, so its suggestions point
    # the wrong way (toward the generated image); keep the normal-order ones.
    out["fix_suggestions"] = list(a.get("fix_suggestions") or [])
    return out
|
||||||
|
|
||||||
|
|
||||||
|
def _report_base_dir(report_dir: str) -> str:
    if report_dir:
        return report_dir
    try:
        import folder_paths
        return os.path.join(folder_paths.get_output_directory(), "calibrator")
    except Exception:
        return os.path.join(os.path.dirname(os.path.dirname(__file__)), "output", "calibrator")


def _write_report(report_dir, run_tag, overall, merged, diff_analysis, raw_all, prompt_used):
    """Persist the analysis so the external CLI agent can read it after a queue.

    Writes a per-run file plus a stable `latest.json` the agent can always poll.
    Returns the per-run file path (or "" on failure)."""
    base = _report_base_dir(report_dir)
    try:
        os.makedirs(base, exist_ok=True)
    except OSError as e:
        print(f"[QwenVLImageJudge] could not create report dir {base}: {e}")
        return ""

    payload = {
        "run_tag": run_tag,
        "overall_score": round(float(overall), 4),
        "axes": (merged or {}).get("axes", {}),
        "fix_suggestions": (merged or {}).get("fix_suggestions", []),
        "diff_analysis": diff_analysis,
        "prompt_used": prompt_used,
        "raw": raw_all,
    }
    tag = re.sub(r"[^A-Za-z0-9._-]", "_", run_tag) if run_tag else "latest"
    run_path = os.path.join(base, f"calib_{tag}.json")
    for path in (run_path, os.path.join(base, "latest.json")):
        try:
            with open(path, "w", encoding="utf-8") as f:
                json.dump(payload, f, ensure_ascii=False, indent=2)
        except OSError as e:
            print(f"[QwenVLImageJudge] failed writing report {path}: {e}")
    # A markdown sibling is handy for the agent to read as plain text.
    try:
        md = (f"# Calibration analysis ({tag})\n\n"
              f"**overall_score:** {payload['overall_score']}\n\n"
              f"**prompt_used:**\n\n{prompt_used or '(not provided)'}\n\n"
              f"## per-axis\n\n{diff_analysis}\n")
        with open(os.path.join(base, f"calib_{tag}.md"), "w", encoding="utf-8") as f:
            f.write(md)
    except OSError:
        pass
    return run_path
|
||||||
|
|
||||||
|
|
||||||
|
class QwenVLImageJudge:
    """ComfyUI node: score how close a generated image is to a reference."""

    CATEGORY = "prompt_calibrator"
    FUNCTION = "judge"
    RETURN_TYPES = ("FLOAT", "STRING", "STRING", "STRING", "STRING")
    RETURN_NAMES = ("overall_score", "axis_scores_json", "diff_analysis", "raw", "report_path")

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "reference_image": ("IMAGE",),
                "generated_image": ("IMAGE",),
                "model_path": ("STRING", {"default": DEFAULT_MODEL_PATH}),
                "precision": (["bf16", "fp16", "fp8", "nf4"], {"default": "bf16"}),
                "axes": ("STRING", {"default": DEFAULT_AXES, "multiline": True}),
                "max_new_tokens": ("INT", {"default": 512, "min": 64, "max": 4096}),
                "temperature": ("FLOAT", {"default": 0.0, "min": 0.0, "max": 1.5, "step": 0.05}),
                "swap_eval": ("BOOLEAN", {"default": True}),
            },
            "optional": {
                "keep_loaded": ("BOOLEAN", {"default": True}),
                "auto_download": ("BOOLEAN", {"default": True}),
                # The agent reads the analysis from these files after each queue.
                "report_dir": ("STRING", {"default": ""}),
                "run_tag": ("STRING", {"default": ""}),
                "prompt_used": ("STRING", {"default": "", "multiline": True}),
            },
        }

    def judge(self, reference_image, generated_image, model_path, precision, axes,
              max_new_tokens, temperature, swap_eval, keep_loaded=True, auto_download=True,
              report_dir="", run_tag="", prompt_used=""):
        axis_list = [a.strip() for a in re.split(r"[,\n]", axes) if a.strip()]
        if not axis_list:
            axis_list = [a.strip() for a in DEFAULT_AXES.split(",")]

        try:
            resolved_path = _resolve_model_source(model_path, auto_download)
        except Exception as e:  # missing model / download failure -> surface as score 0
            msg = str(e)
            print(msg)
            # Must match the five RETURN_TYPES; no report is written on this path.
            return (0.0, "{}", msg, msg, "")

        ref_pil = _tensor_to_pil(reference_image)
        gen_pil = _tensor_to_pil(generated_image)

        model, processor = _load_model(resolved_path, precision)

        raw1 = _run_once(model, processor, ref_pil, gen_pil, axis_list, max_new_tokens, temperature)
        parsed1 = _parse_json(raw1) or {}

        raw_all = raw1
        merged = parsed1
        if swap_eval:
            # Swap which image is called REFERENCE to average out position bias.
            raw2 = _run_once(model, processor, gen_pil, ref_pil, axis_list, max_new_tokens, temperature)
            parsed2 = _parse_json(raw2) or {}
            merged = _merge_swapped(parsed1, parsed2)
            raw_all = raw1 + "\n--- SWAPPED ---\n" + raw2

        if not keep_loaded:
            _MODEL_CACHE.pop((resolved_path, precision), None)
            del model
            if torch.cuda.is_available():
                torch.cuda.empty_cache()

        overall = float(merged.get("overall_score", 0.0)) if merged else 0.0
        axis_scores = json.dumps(merged.get("axes", {}), ensure_ascii=False, indent=2) if merged else "{}"

        # Human/controller-readable diff summary.
        diff_lines = []
        for ax, info in (merged.get("axes", {}) if merged else {}).items():
            diff_lines.append(f"- {ax}: {info.get('score', 0):.2f} — {info.get('diff', '')}")
        fixes = merged.get("fix_suggestions", []) if merged else []
        if fixes:
            diff_lines.append("fixes: " + "; ".join(str(f) for f in fixes))
        diff_analysis = "\n".join(diff_lines) if diff_lines else "(no parseable judgement)"

        report_path = _write_report(
            report_dir, run_tag, overall, merged, diff_analysis, raw_all, prompt_used)

        return (round(overall, 4), axis_scores, diff_analysis, raw_all, report_path)


NODE_CLASS_MAPPINGS = {"QwenVLImageJudge": QwenVLImageJudge}
NODE_DISPLAY_NAME_MAPPINGS = {"QwenVLImageJudge": "Qwen3-VL Image Judge (Calibrator)"}
|
||||||
@@ -0,0 +1,66 @@
|
|||||||
|
"""
|
||||||
|
Calibrator Prompt Receptor node.

The injection point for the external CLI-agent controller. The agent overrides
this node's widget values per queue via the ComfyUI HTTP API (`POST /prompt`,
override by node id), or — as a fallback — points `source_file` at a JSON file
the agent writes. Its outputs feed the T2I sampler in place of a static prompt.

This is the "receptor in ComfyUI" in the loop:
    agent -> (sets prompt here) -> T2I -> Qwen3-VL Judge -> analysis -> agent
"""

from __future__ import annotations

import json
import os


class CalibratorPromptReceptor:
    CATEGORY = "prompt_calibrator"
    FUNCTION = "emit"
    RETURN_TYPES = ("STRING", "STRING", "INT")
    RETURN_NAMES = ("prompt", "negative", "seed")

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "prompt": ("STRING", {"default": "", "multiline": True}),
                "negative": ("STRING", {"default": "", "multiline": True}),
                "seed": ("INT", {"default": 0, "min": 0, "max": 0x7FFFFFFFFFFFFFFF}),
            },
            "optional": {
                # If set and present, a JSON file {prompt, negative, seed} overrides
                # the widgets above. Lets the agent drive the loop file-first if it
                # prefers that to the HTTP API.
                "source_file": ("STRING", {"default": ""}),
            },
        }

    @classmethod
    def IS_CHANGED(cls, prompt, negative, seed, source_file=""):
        # Re-run whenever the effective inputs change: widget values (API override)
        # OR the source file's mtime (file-driven mode).
        mtime = ""
        if source_file and os.path.isfile(source_file):
            mtime = str(os.path.getmtime(source_file))
        return f"{prompt}|{negative}|{seed}|{source_file}|{mtime}"

    def emit(self, prompt, negative, seed, source_file=""):
        if source_file and os.path.isfile(source_file):
            try:
                with open(source_file, "r", encoding="utf-8") as f:
                    data = json.load(f)
                prompt = data.get("prompt", prompt)
                negative = data.get("negative", negative)
                seed = int(data.get("seed", seed))
            # TypeError: "seed" is null; AttributeError: the file is not a JSON object.
            except (OSError, ValueError, TypeError, AttributeError) as e:
                print(f"[CalibratorPromptReceptor] could not read {source_file}: {e}")
        return (prompt, negative, int(seed))


NODE_CLASS_MAPPINGS = {"CalibratorPromptReceptor": CalibratorPromptReceptor}
NODE_DISPLAY_NAME_MAPPINGS = {
    "CalibratorPromptReceptor": "SxCP External Prompt (Receptor)"
}
|
||||||
@@ -0,0 +1,19 @@
|
|||||||
|
[project]
name = "comfyui-prompt-calibratror"
description = "VLM-as-judge prompt calibration loop: Qwen3-VL scores generated vs reference images to calibrate the prompt-generation method."
version = "0.1.0"
license = { text = "MIT" }
requires-python = ">=3.10"
dependencies = [
    "transformers>=4.57.0",
    "pillow",
    "numpy",
]

[project.urls]
Repository = "https://github.com/ethanfel/ComfyUI-Prompt-Calibratror"

[tool.comfy]
PublisherId = "ethanfel"
DisplayName = "ComfyUI-Prompt-Calibratror"
Icon = ""
|
||||||
@@ -0,0 +1,10 @@
|
|||||||
|
# Qwen3-VL needs transformers >= 4.57 (the version the local checkpoint was saved with).
transformers>=4.57.0
huggingface_hub  # auto-download of models by repo id / alias
torch
pillow
numpy
# for precision=nf4 (4-bit) — needed to run the 30B-A3B abliterated judge on 32 GB:
bitsandbytes
# optional, for faster attention on the RTX 5090:
# flash-attn
|
||||||
@@ -0,0 +1,82 @@
|
|||||||
|
{
  "4": {
    "class_type": "CheckpointLoaderSimple",
    "inputs": { "ckpt_name": "waiIllustriousSDXL_v160.safetensors" },
    "_meta": { "title": "Load Checkpoint (swap for your T2I)" }
  },
  "10": {
    "class_type": "CalibratorPromptReceptor",
    "inputs": {
      "prompt": "a photo of a woman, casual outfit, indoors",
      "negative": "blurry, deformed, lowres, extra limbs",
      "seed": 12345,
      "source_file": ""
    },
    "_meta": { "title": "SxCP External Prompt (Receptor)" }
  },
  "6": {
    "class_type": "CLIPTextEncode",
    "inputs": { "text": ["10", 0], "clip": ["4", 1] },
    "_meta": { "title": "Positive (from receptor)" }
  },
  "7": {
    "class_type": "CLIPTextEncode",
    "inputs": { "text": ["10", 1], "clip": ["4", 1] },
    "_meta": { "title": "Negative (from receptor)" }
  },
  "5": {
    "class_type": "EmptyLatentImage",
    "inputs": { "width": 1024, "height": 1024, "batch_size": 1 },
    "_meta": { "title": "Empty Latent" }
  },
  "3": {
    "class_type": "KSampler",
    "inputs": {
      "model": ["4", 0],
      "positive": ["6", 0],
      "negative": ["7", 0],
      "latent_image": ["5", 0],
      "seed": ["10", 2],
      "steps": 28,
      "cfg": 5.5,
      "sampler_name": "euler",
      "scheduler": "normal",
      "denoise": 1.0
    },
    "_meta": { "title": "KSampler (seed from receptor)" }
  },
  "8": {
    "class_type": "VAEDecode",
    "inputs": { "samples": ["3", 0], "vae": ["4", 2] },
    "_meta": { "title": "VAE Decode" }
  },
  "9": {
    "class_type": "SaveImage",
    "inputs": { "images": ["8", 0], "filename_prefix": "calibrator/gen" },
    "_meta": { "title": "Save Generated" }
  },
  "11": {
    "class_type": "LoadImage",
    "inputs": { "image": "reference.png" },
    "_meta": { "title": "Reference Image (put in ComfyUI/input/)" }
  },
  "12": {
    "class_type": "QwenVLImageJudge",
    "inputs": {
      "reference_image": ["11", 0],
|
||||||
|
"generated_image": ["8", 0],
|
||||||
|
"model_path": "/media/p5/qwen3vl_4b_abliterated_comfy_convert/hf_bf16",
|
||||||
|
"precision": "bf16",
|
||||||
|
"axes": "cast, clothing, pose, scene, composition, expression, color_light",
|
||||||
|
"max_new_tokens": 512,
|
||||||
|
"temperature": 0.0,
|
||||||
|
"swap_eval": true,
|
||||||
|
"keep_loaded": true,
|
||||||
|
"auto_download": true,
|
||||||
|
"report_dir": "/media/p5/Comfyui/output/calibrator",
|
||||||
|
"run_tag": "",
|
||||||
|
"prompt_used": ""
|
||||||
|
},
|
||||||
|
"_meta": { "title": "Qwen3-VL Image Judge (Calibrator)" }
|
||||||
|
}
|
||||||
|
}
|
||||||
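
The workflow above takes its positive text, negative text, and sampler seed from node `10`, so an external controller only needs to rewrite that node's inputs before queueing a run. What follows is a minimal sketch of that step, not the repository's `agent_bridge.py`: `inject_prompt`, `queue_prompt`, and `COMFY_URL` are hypothetical names, and the address assumes a ComfyUI server on its default local port. The POST body uses the `{"prompt": <workflow>}` shape that ComfyUI's `/prompt` endpoint accepts.

```python
import copy
import json
import urllib.request

# Assumption: ComfyUI is listening on its default local address.
COMFY_URL = "http://127.0.0.1:8188"


def inject_prompt(workflow, prompt, negative, seed, receptor_id="10"):
    """Return a copy of an API-format workflow with the receptor inputs replaced."""
    node = workflow[receptor_id]
    if node["class_type"] != "CalibratorPromptReceptor":
        raise ValueError(f"node {receptor_id} is not a CalibratorPromptReceptor")
    # Deep-copy so the caller's template workflow is never mutated between iterations.
    patched = copy.deepcopy(workflow)
    patched[receptor_id]["inputs"].update(
        {"prompt": prompt, "negative": negative, "seed": int(seed)}
    )
    return patched


def queue_prompt(workflow, base_url=COMFY_URL):
    """POST the workflow to ComfyUI's /prompt endpoint and return the parsed reply."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


# Trimmed stand-in for the workflow above: only the receptor node is needed here.
example = {
    "10": {
        "class_type": "CalibratorPromptReceptor",
        "inputs": {"prompt": "", "negative": "", "seed": 0, "source_file": ""},
    }
}
patched = inject_prompt(example, "a woman in a red coat, outdoors", "blurry", 42)
```

With a server running, `queue_prompt(patched)` would submit the run. Working on a copy keeps the template workflow reusable across iterations, and checking `class_type` guards against pointing the controller at the wrong node id.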