Add Qwen3.5/3.6 abliterated (safetensors) + split model/quant selectors
No GGUF needed: huihui ships Qwen3.5-9B, Qwen3.6-27B, and Qwen3.6-35B-A3B as abliterated multimodal safetensors, loadable via transformers AutoModelForMultimodalLM.

- Added the three models to the model dropdown.
- _resolve_vl_classes now tries AutoModelForMultimodalLM (Qwen3.5/3.6) and AutoModelForImageTextToText (Qwen3-VL) in name-based order, with load fallback across candidates.
- model_select is now the model NAME only; precision is the separate quant dropdown applied to the selected model (repo_by_precision routes a quant to a different checkpoint, e.g. the local fp8 dir).
- New aliases: 3.5-9b, 3.6-27b, 3.6-35b.
- VRAM-by-quant table added to the README.
- Qwen3.5/3.6 need a recent transformers.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
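
To make the model/quant split concrete, here is a minimal, standalone sketch of how a model name plus a precision could resolve to a checkpoint. It mirrors the `repo_by_precision` lookup and the `.gguf` suffix check from the diff below, but the two paths, the short preset labels, and the helper name `resolve_repo` are hypothetical placeholders, and it raises exceptions where the node itself returns an error tuple.

```python
# Hypothetical placeholder paths; the real values live in the node module.
DEFAULT_MODEL_PATH = "/models/qwen3vl_4b_bf16"
DEFAULT_MODEL_PATH_FP8 = "/models/qwen3vl_4b_fp8"

# Shortened, illustrative preset table (labels are not the real dropdown labels).
MODEL_PRESETS = {
    "4B local": {
        "repo": DEFAULT_MODEL_PATH,
        # Only precisions that need a different checkpoint are listed here.
        "repo_by_precision": {"fp8": DEFAULT_MODEL_PATH_FP8},
    },
    "Qwen3.5-9B": {"repo": "huihui-ai/Huihui-Qwen3.5-9B-abliterated"},
}


def resolve_repo(model_select: str, model_path: str, precision: str) -> str:
    """Return the checkpoint to load for a (model, quant) choice.

    A non-empty model_path overrides the dropdown; otherwise the preset's
    repo_by_precision entry wins over its default repo when one exists.
    """
    if model_path.strip():
        repo = model_path.strip()
    else:
        preset = MODEL_PRESETS.get(model_select)
        if not preset:
            raise ValueError("pick a model in model_select, or fill model_path")
        repo = preset.get("repo_by_precision", {}).get(precision, preset["repo"])
    if repo.lower().endswith(".gguf"):
        raise ValueError(f"'{repo}' is GGUF; this path is transformers-only")
    return repo


if __name__ == "__main__":
    print(resolve_repo("4B local", "", "fp8"))    # routed to the fp8 directory
    print(resolve_repo("4B local", "", "bf16"))   # falls back to the default repo
    print(resolve_repo("Qwen3.5-9B", "", "nf4"))  # same repo for every quant
```

Keeping the override map sparse means a preset without a dedicated checkpoint for a given quant simply loads its default repo at that precision.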
@@ -34,9 +34,9 @@ can act on it.
|
|||||||
| `mode` | compare / describe | compare | `describe` = first pass over the reference only → caption + target spec (seeds the prompt). `compare` = score ref vs generated |
|
| `mode` | compare / describe | compare | `describe` = first pass over the reference only → caption + target spec (seeds the prompt). `compare` = score ref vs generated |
|
||||||
| `profile` | general / oral / penetration / handjob / solo | general | **analysis profile** — act-specialized axis set; the act-critical axes are distance/proximity-aware (e.g. `mouth_genital_distance`) so magnitude isn't hidden behind a coarse label |
|
| `profile` | general / oral / penetration / handjob / solo | general | **analysis profile** — act-specialized axis set; the act-critical axes are distance/proximity-aware (e.g. `mouth_genital_distance`) so magnitude isn't hidden behind a coarse label |
|
||||||
| `generated_image` | IMAGE (optional) | — | the candidate to score (required for `compare`, ignored for `describe`) |
|
| `generated_image` | IMAGE (optional) | — | the candidate to score (required for `compare`, ignored for `describe`) |
|
||||||
| `model_select` | dropdown | 4B local bf16 | curated **transformers** judges with VRAM hints: 4B local (bf16 ~9GB / fp8 ~5GB), 8B (bf16 ~17GB), 30B-A3B (nf4 ~18GB, slow). Auto-downloads on first use |
|
| `model_select` | dropdown (model name) | 4B local | **which judge** (transformers/safetensors, auto-downloaded): Qwen3-VL 4B/8B/30B-A3B, **Qwen3.5-9B**, **Qwen3.6-27B/35B-A3B** (newer, natively multimodal). Param size shown in the label |
|
||||||
| `model_path` | STRING | "" (empty) | **manual override** of the dropdown — local dir, HF repo id, or alias (`30b-a3b`/`8b`/`4b`). Empty = use `model_select` |
|
| `precision` | bf16 / fp16 / fp8 / nf4 | bf16 | **the quant** — applies to the selected model (VRAM table below) |
|
||||||
| `precision` | bf16 / fp16 / fp8 / nf4 | bf16 | applies to a manual `model_path`; presets carry their own precision |
|
| `model_path` | STRING | "" (empty) | **manual override** of the dropdown — local dir, HF repo id, or alias (`8b`/`30b-a3b`/`3.5-9b`/`3.6-27b`/`3.6-35b`). Empty = use `model_select` |
|
||||||
| `axes` | STRING | "" (empty) | **override** the profile's axis set with a custom comma/newline list; empty = use `profile` |
|
| `axes` | STRING | "" (empty) | **override** the profile's axis set with a custom comma/newline list; empty = use `profile` |
|
||||||
| `max_new_tokens` | INT | 1024 | |
|
| `max_new_tokens` | INT | 1024 | |
|
||||||
| `temperature` | FLOAT | 0.0 | 0 = greedy/repeatable |
|
| `temperature` | FLOAT | 0.0 | 0 = greedy/repeatable |
|
||||||
@@ -72,18 +72,25 @@ converted at `/media/p5/qwen3vl_4b_abliterated_comfy_convert/` so it runs out of
|
|||||||
(the abliterated/uncensored variant won't refuse to analyze adult imagery, which would
|
(the abliterated/uncensored variant won't refuse to analyze adult imagery, which would
|
||||||
otherwise break the loop).
|
otherwise break the loop).
|
||||||
|
|
||||||
**Pick the judge from `model_select`** (transformers / safetensors, auto-downloaded):
|
**Pick a model in `model_select` and a quant in `precision`.** All are abliterated,
|
||||||
the **8B** abliterated at `bf16` (~17 GB) is the recommended intermediate — fast and
|
multimodal **safetensors** (transformers), auto-downloaded. The newer **Qwen3.5/3.6** are
|
||||||
clearly better than the 4B. The **30B-A3B** (nf4 ~18 GB) is higher quality but slow (the
|
natively multimodal (need a recent transformers — they load via `AutoModelForMultimodalLM`).
|
||||||
`nf4`/bitsandbytes path is the bottleneck, not the model). `model_path` overrides the
|
|
||||||
dropdown for anything else.
|
|
||||||
|
|
||||||
**GGUF-only models are not in the dropdown.** Newer uncensored builds like
|
VRAM by quant on the RTX 5090 32 GB (✅ fits / ⚠ tight / ❌):
|
||||||
[`HauhauCS/Qwen3.5-9B-Uncensored`](https://huggingface.co/HauhauCS/Qwen3.5-9B-Uncensored-HauhauCS-Aggressive)
|
|
||||||
and [`HauhauCS/Qwen3.6-35B-A3B-Uncensored`](https://huggingface.co/HauhauCS/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive)
|
| model | bf16 | fp8 | nf4 | note |
|
||||||
ship as **GGUF + mmproj** only — this node is transformers-only, so run those in a dedicated
|
|---|---|---|---|---|
|
||||||
GGUF node ([1038lab/ComfyUI-QwenVL](https://github.com/1038lab/ComfyUI-QwenVL) or
|
| Qwen3-VL-4B (local) | ✅ ~9 | ✅ ~5 | ✅ ~3 | fast, weak |
|
||||||
KLL535 Simple-Qwen3-VL-gguf) and feed their text output into the loop. See
|
| Qwen3-VL-8B | ✅ ~17 | ✅ ~9 | ✅ ~6 | solid, fast |
|
||||||
|
| **Qwen3.5-9B** | ✅ ~20 | ✅ ~10 | ✅ ~7 | **newer, fast — recommended** |
|
||||||
|
| Qwen3-VL-30B-A3B (MoE) | ❌ ~62 | ⚠ ~31 | ✅ ~18 | nf4 slow |
|
||||||
|
| Qwen3.6-27B (dense) | ❌ ~56 | ⚠ ~28 | ✅ ~16 | nf4 slow, strong |
|
||||||
|
| Qwen3.6-35B-A3B (MoE) | ❌ ~70 | ❌ | ✅ ~20 | nf4 slow, top quality |
|
||||||
|
|
||||||
|
`nf4` (bitsandbytes) fits the big ones but is **slow** (dequant overhead) — that's the
|
||||||
|
bottleneck, not the model. `fp8` is fast but only when a real fp8 checkpoint exists (the
|
||||||
|
local 4B has one; `precision=fp8` on a bf16-only repo won't quantize). For speed + recency,
|
||||||
|
**Qwen3.5-9B at bf16** is the sweet spot. See
|
||||||
[docs/METHODOLOGY.md](docs/METHODOLOGY.md#model-sizing-on-32-gb-rtx-5090--abliterated-latest-qwen-vl).
|
[docs/METHODOLOGY.md](docs/METHODOLOGY.md#model-sizing-on-32-gb-rtx-5090--abliterated-latest-qwen-vl).
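
The VRAM table above roughly follows the rule of thumb stated in the code comment of this commit (bf16 ≈ 2 bytes per parameter, fp8 ≈ 1, nf4 ≈ 0.6). A small sketch of that arithmetic is given here; the function names are hypothetical, the fp16 factor is added on the assumption that it matches bf16 (both are 16-bit formats), and the estimate covers weights only, so it will not reproduce the table's figures exactly (those appear to include some overhead).

```python
# Approximate bytes per parameter for each quant, per the rule of thumb above.
# fp16 is an added assumption: same width as bf16.
BYTES_PER_PARAM = {"bf16": 2.0, "fp16": 2.0, "fp8": 1.0, "nf4": 0.6}


def estimate_weight_gb(params_billion: float, precision: str) -> float:
    """Weights-only VRAM estimate in GB: params (in billions) x bytes/param."""
    return params_billion * BYTES_PER_PARAM[precision]


def fits(params_billion: float, precision: str, vram_gb: float = 32.0) -> bool:
    """True if the weights alone fit in vram_gb (ignores activations and cache)."""
    return estimate_weight_gb(params_billion, precision) <= vram_gb


if __name__ == "__main__":
    for name, size in [("Qwen3.5-9B", 10), ("Qwen3.6-27B", 28), ("Qwen3.6-35B-A3B", 35)]:
        row = {p: round(estimate_weight_gb(size, p), 1) for p in ("bf16", "fp8", "nf4")}
        print(name, row)
```

Because the estimate ignores activations and the KV cache, a result close to the card's 32 GB should be read as "tight" rather than "fits".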
|
||||||
|
|
||||||
## Loop sketch
|
## Loop sketch
|
||||||
|
|||||||
+64
-35
@@ -39,23 +39,36 @@ RECOMMENDED_MODELS = {
|
|||||||
"8b": "huihui-ai/Huihui-Qwen3-VL-8B-Instruct-abliterated",
|
"8b": "huihui-ai/Huihui-Qwen3-VL-8B-Instruct-abliterated",
|
||||||
# Lightweight, already local.
|
# Lightweight, already local.
|
||||||
"4b": "huihui-ai/Huihui-Qwen3-VL-4B-Instruct-abliterated",
|
"4b": "huihui-ai/Huihui-Qwen3-VL-4B-Instruct-abliterated",
|
||||||
|
# Newer natively-multimodal Qwen3.5/3.6 abliterated (need a recent transformers).
|
||||||
|
"3.5-9b": "huihui-ai/Huihui-Qwen3.5-9B-abliterated", # dense 10B, fast, newer
|
||||||
|
"3.6-27b": "huihui-ai/Huihui-Qwen3.6-27B-abliterated", # dense 28B, strong (nf4)
|
||||||
|
"3.6-35b": "huihui-ai/Huihui-Qwen3.6-35B-A3B-abliterated", # MoE, top (nf4)
|
||||||
}
|
}
|
||||||
|
|
||||||
# Curated model dropdown (label shown in the node -> how to load it). The label
|
# Curated model dropdown (label shown in the node -> how to load it). The label
|
||||||
# carries the suggested VRAM. All entries are safetensors loaded via transformers
|
# carries the suggested VRAM. ALL entries are multimodal safetensors loaded via
|
||||||
# (auto-downloaded with snapshot_download). `model_path` (manual) overrides this.
|
# transformers (auto-downloaded). The Qwen3.5/3.6 entries are natively-multimodal and
|
||||||
# GGUF-only models (e.g. HauhauCS Qwen3.5/3.6 Uncensored) are NOT listed — run those
|
# need a recent transformers (AutoModelForMultimodalLM). `model_path` overrides this.
|
||||||
# in a dedicated GGUF node (1038lab/ComfyUI-QwenVL, KLL535 Simple-Qwen3-VL-gguf).
|
# GGUF-only models still need a dedicated GGUF node — not run here (transformers only).
|
||||||
|
# model_select picks the MODEL (name only); `precision` is the separate quant dropdown.
|
||||||
|
# VRAM ≈ params × bytes/param: bf16 ≈ 2×, fp8 ≈ 1×, nf4 ≈ 0.6× (GB ≈ params·0.6). So on
|
||||||
|
# 32 GB: 8-10B fits bf16; 27-35B need nf4 (or fp8 if an fp8 checkpoint). `repo_by_precision`
|
||||||
|
# routes precisions to different checkpoints (the local 4B has separate bf16/fp8 dirs).
|
||||||
MANUAL_CHOICE = "(manual — use model_path below)"
|
MANUAL_CHOICE = "(manual — use model_path below)"
|
||||||
MODEL_PRESETS = {
|
MODEL_PRESETS = {
|
||||||
"Qwen3-VL-4B abliterated (huihui) · local bf16 ~9GB": {
|
"Qwen3-VL-4B abliterated (huihui, local) · 4B": {
|
||||||
"repo": DEFAULT_MODEL_PATH, "backend": "transformers", "precision": "bf16"},
|
"repo": DEFAULT_MODEL_PATH,
|
||||||
"Qwen3-VL-4B abliterated (huihui) · local fp8 ~5GB": {
|
"repo_by_precision": {"fp8": DEFAULT_MODEL_PATH_FP8}},
|
||||||
"repo": DEFAULT_MODEL_PATH_FP8, "backend": "transformers", "precision": "fp8"},
|
"Qwen3-VL-8B abliterated (huihui) · 8B": {
|
||||||
"Qwen3-VL-8B abliterated (huihui) · bf16 ~17GB": {
|
"repo": "huihui-ai/Huihui-Qwen3-VL-8B-Instruct-abliterated"},
|
||||||
"repo": "huihui-ai/Huihui-Qwen3-VL-8B-Instruct-abliterated", "backend": "transformers", "precision": "bf16"},
|
"Qwen3.5-9B abliterated (huihui) · 10B dense · newer": {
|
||||||
"Qwen3-VL-30B-A3B abliterated (huihui) · nf4 ~18GB (slow)": {
|
"repo": "huihui-ai/Huihui-Qwen3.5-9B-abliterated"},
|
||||||
"repo": "huihui-ai/Huihui-Qwen3-VL-30B-A3B-Instruct-abliterated", "backend": "transformers", "precision": "nf4"},
|
"Qwen3-VL-30B-A3B abliterated (huihui) · 30B MoE": {
|
||||||
|
"repo": "huihui-ai/Huihui-Qwen3-VL-30B-A3B-Instruct-abliterated"},
|
||||||
|
"Qwen3.6-27B abliterated (huihui) · 28B dense": {
|
||||||
|
"repo": "huihui-ai/Huihui-Qwen3.6-27B-abliterated"},
|
||||||
|
"Qwen3.6-35B-A3B abliterated (huihui) · 35B MoE · top": {
|
||||||
|
"repo": "huihui-ai/Huihui-Qwen3.6-35B-A3B-abliterated"},
|
||||||
}
|
}
|
||||||
|
|
||||||
# Difference axes + a one-line definition each. Definitions are injected into the
|
# Difference axes + a one-line definition each. Definitions are injected into the
|
||||||
@@ -231,22 +244,24 @@ def _tensor_to_pil(image: "torch.Tensor") -> Image.Image:
|
|||||||
return Image.fromarray(arr, mode="RGB")
|
return Image.fromarray(arr, mode="RGB")
|
||||||
|
|
||||||
|
|
||||||
def _resolve_vl_class(model_path: str):
|
def _resolve_vl_classes(model_path: str):
|
||||||
"""Pick the right transformers class. AutoModelForImageTextToText reads the
|
"""Ordered list of candidate transformers auto classes to try. Qwen3-VL
|
||||||
checkpoint's `architectures` and instantiates the correct dense
|
(4B/8B/30B) loads via AutoModelForImageTextToText; the newer natively-multimodal
|
||||||
(Qwen3VLForConditionalGeneration) or MoE (Qwen3VLMoeForConditionalGeneration)
|
Qwen3.5/3.6 load via AutoModelForMultimodalLM. The two autos have separate
|
||||||
class automatically — so 4B/8B *and* 30B-A3B all work without branching."""
|
registries, so we try the one most likely for this model first (by name) and
|
||||||
try:
|
fall back to the other, then to explicit Qwen3-VL classes on old transformers."""
|
||||||
from transformers import AutoModelForImageTextToText as _Auto
|
import transformers
|
||||||
return _Auto
|
name = model_path.lower()
|
||||||
except ImportError: # pragma: no cover - older transformers
|
new_mm = any(t in name for t in ("3.5", "3.6", "qwen3_5", "qwen3_6", "qwen3.5", "qwen3.6"))
|
||||||
name = model_path.lower()
|
order = (["AutoModelForMultimodalLM", "AutoModelForImageTextToText"] if new_mm
|
||||||
is_moe = any(t in name for t in ("a3b", "moe", "30b", "235b"))
|
else ["AutoModelForImageTextToText", "AutoModelForMultimodalLM"])
|
||||||
if is_moe:
|
classes = [getattr(transformers, n) for n in order if getattr(transformers, n, None)]
|
||||||
from transformers import Qwen3VLMoeForConditionalGeneration as _C
|
is_moe = any(t in name for t in ("a3b", "moe", "30b", "235b"))
|
||||||
else:
|
for n in (("Qwen3VLMoeForConditionalGeneration",) if is_moe else ("Qwen3VLForConditionalGeneration",)):
|
||||||
from transformers import Qwen3VLForConditionalGeneration as _C
|
c = getattr(transformers, n, None)
|
||||||
return _C
|
if c:
|
||||||
|
classes.append(c)
|
||||||
|
return classes
|
||||||
|
|
||||||
|
|
||||||
def _load_model(model_path: str, precision: str):
|
def _load_model(model_path: str, precision: str):
|
||||||
@@ -257,7 +272,7 @@ def _load_model(model_path: str, precision: str):
|
|||||||
# Imported lazily so the node can be registered even if transformers is old.
|
# Imported lazily so the node can be registered even if transformers is old.
|
||||||
from transformers import AutoProcessor
|
from transformers import AutoProcessor
|
||||||
|
|
||||||
_VLModel = _resolve_vl_class(model_path)
|
candidates = _resolve_vl_classes(model_path)
|
||||||
load_kwargs = dict(device_map="auto", trust_remote_code=True, low_cpu_mem_usage=True)
|
load_kwargs = dict(device_map="auto", trust_remote_code=True, low_cpu_mem_usage=True)
|
||||||
|
|
||||||
if precision == "nf4":
|
if precision == "nf4":
|
||||||
@@ -275,7 +290,19 @@ def _load_model(model_path: str, precision: str):
|
|||||||
else:
|
else:
|
||||||
load_kwargs["dtype"] = torch.bfloat16 if precision == "bf16" else torch.float16
|
load_kwargs["dtype"] = torch.bfloat16 if precision == "bf16" else torch.float16
|
||||||
|
|
||||||
model = _VLModel.from_pretrained(model_path, **load_kwargs)
|
model, last_err = None, None
|
||||||
|
for cls in candidates:
|
||||||
|
try:
|
||||||
|
model = cls.from_pretrained(model_path, **load_kwargs)
|
||||||
|
break
|
||||||
|
except Exception as e: # arch not in this auto class's registry -> try the next
|
||||||
|
last_err = e
|
||||||
|
model = None
|
||||||
|
if model is None:
|
||||||
|
raise RuntimeError(
|
||||||
|
f"[QwenVLImageJudge] could not load {model_path} with any of "
|
||||||
|
f"{[c.__name__ for c in candidates]}. Newer Qwen3.5/3.6 need a recent "
|
||||||
|
f"transformers (AutoModelForMultimodalLM). Last error: {last_err}")
|
||||||
model.eval()
|
model.eval()
|
||||||
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
|
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
|
||||||
_ensure_chat_template(processor, model_path)
|
_ensure_chat_template(processor, model_path)
|
||||||
@@ -716,18 +743,20 @@ class QwenVLImageJudge:
|
|||||||
if not axis_list:
|
if not axis_list:
|
||||||
axis_list = list(PROFILES.get(profile, PROFILES["general"]))
|
axis_list = list(PROFILES.get(profile, PROFILES["general"]))
|
||||||
|
|
||||||
# Resolve the model: manual model_path overrides the dropdown preset.
|
# Resolve the model: manual model_path overrides the dropdown. `precision` is the
|
||||||
|
# quant dropdown and applies to whichever model is chosen.
|
||||||
|
eff_precision = precision
|
||||||
if model_path.strip():
|
if model_path.strip():
|
||||||
eff_repo, eff_precision = model_path.strip(), precision
|
eff_repo = model_path.strip()
|
||||||
eff_backend = "gguf" if eff_repo.lower().endswith(".gguf") else "transformers"
|
|
||||||
else:
|
else:
|
||||||
preset = MODEL_PRESETS.get(model_select)
|
preset = MODEL_PRESETS.get(model_select)
|
||||||
if not preset:
|
if not preset:
|
||||||
msg = "[QwenVLImageJudge] pick a model in model_select, or fill model_path."
|
msg = "[QwenVLImageJudge] pick a model in model_select, or fill model_path."
|
||||||
print(msg); return (0.0, "{}", msg, msg, "")
|
print(msg); return (0.0, "{}", msg, msg, "")
|
||||||
eff_repo, eff_backend, eff_precision = preset["repo"], preset["backend"], preset["precision"]
|
# repo_by_precision routes a quant to a different checkpoint (e.g. local fp8 dir).
|
||||||
|
eff_repo = preset.get("repo_by_precision", {}).get(precision, preset["repo"])
|
||||||
|
|
||||||
if eff_backend == "gguf":
|
if eff_repo.lower().endswith(".gguf"):
|
||||||
msg = (f"[QwenVLImageJudge] '{eff_repo}' is GGUF — this node is transformers "
|
msg = (f"[QwenVLImageJudge] '{eff_repo}' is GGUF — this node is transformers "
|
||||||
f"(safetensors) only. Run GGUF models in a dedicated GGUF node "
|
f"(safetensors) only. Run GGUF models in a dedicated GGUF node "
|
||||||
f"(1038lab/ComfyUI-QwenVL or KLL535 Simple-Qwen3-VL-gguf).")
|
f"(1038lab/ComfyUI-QwenVL or KLL535 Simple-Qwen3-VL-gguf).")
|
||||||
|
|||||||
+3
-1
@@ -1,4 +1,6 @@
|
|||||||
# Qwen3-VL needs transformers >= 4.57 (the version the local checkpoint was saved with).
|
# Qwen3-VL needs transformers >= 4.57. The newer natively-multimodal Qwen3.5/3.6
|
||||||
|
# abliterated models need a recent transformers exposing AutoModelForMultimodalLM
|
||||||
|
# (upgrade transformers if a Qwen3.5/3.6 model fails to load).
|
||||||
transformers>=4.57.0
|
transformers>=4.57.0
|
||||||
huggingface_hub # auto-download of models by repo id / alias
|
huggingface_hub # auto-download of models by repo id / alias
|
||||||
torch
|
torch
|
||||||
|
|||||||