Add Qwen3.5/3.6 abliterated (safetensors) + split model/quant selectors
No GGUF needed: huihui ships Qwen3.5-9B, Qwen3.6-27B, Qwen3.6-35B-A3B as multimodal SAFETENSORS (abliterated), loadable via transformers AutoModelForMultimodalLM. Added them to the model dropdown.

_resolve_vl_classes now tries AutoModelForMultimodalLM (3.5/3.6) and AutoModelForImageTextToText (Qwen3-VL) in name-based order, with load fallback across candidates.

model_select is now the model NAME only; precision is the separate quant dropdown applied to it (repo_by_precision routes e.g. the local fp8 dir).

Aliases 3.5-9b/3.6-27b/3.6-35b. VRAM-by-quant table in README. Needs a recent transformers for 3.5/3.6.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
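The name-based ordering with load fallback described in the commit message could look roughly like the sketch below. It is not the actual `_resolve_vl_classes` code: `candidate_class_names` and `load_with_fallback` are hypothetical helpers, the two class names are taken from the commit message, and whether each exists depends on the installed transformers version (hence the `getattr` guard).

```python
# Class names as given in the commit message; availability depends on the
# installed transformers version, so they are looked up lazily by name.
MULTIMODAL_LM = "AutoModelForMultimodalLM"
IMAGE_TEXT = "AutoModelForImageTextToText"


def candidate_class_names(model_name):
    """Order the candidate loader classes by model name (hypothetical rule)."""
    lowered = model_name.lower()
    if "qwen3.5" in lowered or "qwen3.6" in lowered:
        return [MULTIMODAL_LM, IMAGE_TEXT]
    return [IMAGE_TEXT, MULTIMODAL_LM]


def load_with_fallback(model_name, namespace, **kwargs):
    """Try each candidate class in order and return the first successful load.

    `namespace` is any object exposing the loader classes as attributes,
    for example the imported `transformers` module.
    """
    errors = []
    for class_name in candidate_class_names(model_name):
        cls = getattr(namespace, class_name, None)
        if cls is None:  # class missing in this transformers version
            errors.append(f"{class_name}: not available")
            continue
        try:
            return cls.from_pretrained(model_name, **kwargs)
        except Exception as exc:  # remember the error, try the next candidate
            errors.append(f"{class_name}: {exc}")
    raise RuntimeError(f"no loader worked for {model_name}: " + "; ".join(errors))
```

In real use the namespace would be the module itself, e.g. `import transformers; load_with_fallback(repo_id, transformers)`. Collecting the per-class errors keeps the final message useful when every candidate fails.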
@@ -34,9 +34,9 @@ can act on it.
 | `mode` | compare / describe | compare | `describe` = first pass over the reference only → caption + target spec (seeds the prompt). `compare` = score ref vs generated |
 | `profile` | general / oral / penetration / handjob / solo | general | **analysis profile** — act-specialized axis set; the act-critical axes are distance/proximity-aware (e.g. `mouth_genital_distance`) so magnitude isn't hidden behind a coarse label |
 | `generated_image` | IMAGE (optional) | — | the candidate to score (required for `compare`, ignored for `describe`) |
-| `model_select` | dropdown | 4B local bf16 | curated **transformers** judges with VRAM hints: 4B local (bf16 ~9GB / fp8 ~5GB), 8B (bf16 ~17GB), 30B-A3B (nf4 ~18GB, slow). Auto-downloads on first use |
-| `model_path` | STRING | "" (empty) | **manual override** of the dropdown — local dir, HF repo id, or alias (`30b-a3b`/`8b`/`4b`). Empty = use `model_select` |
-| `precision` | bf16 / fp16 / fp8 / nf4 | bf16 | applies to a manual `model_path`; presets carry their own precision |
+| `model_select` | dropdown (model name) | 4B local | **which judge** (transformers/safetensors, auto-downloaded): Qwen3-VL 4B/8B/30B-A3B, **Qwen3.5-9B**, **Qwen3.6-27B/35B-A3B** (newer, natively multimodal). Param size shown in the label |
+| `precision` | bf16 / fp16 / fp8 / nf4 | bf16 | **the quant** — applies to the selected model (VRAM table below) |
+| `model_path` | STRING | "" (empty) | **manual override** of the dropdown — local dir, HF repo id, or alias (`8b`/`30b-a3b`/`3.5-9b`/`3.6-27b`/`3.6-35b`). Empty = use `model_select` |
 | `axes` | STRING | "" (empty) | **override** the profile's axis set with a custom comma/newline list; empty = use `profile` |
 | `max_new_tokens` | INT | 1024 | |
 | `temperature` | FLOAT | 0.0 | 0 = greedy/repeatable |
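One way the `model_select` / `precision` / `model_path` inputs in this hunk could combine is sketched below. This is an illustration under assumptions, not the node's code: `MODELS`, `ALIASES`, and `resolve_repo` are hypothetical, every label, repo id, and path is a placeholder, and treating `precision` as also applying to an alias given in `model_path` is a guess. Only the `repo_by_precision` key name comes from the commit message.

```python
# Hypothetical registry: labels, repo ids and paths are placeholders,
# not the values the node actually ships with.
MODELS = {
    "Qwen3-VL-4B (local)": {
        "repo": "/path/to/local-4b-bf16",
        # precision-specific checkpoints, e.g. a separate local fp8 directory
        "repo_by_precision": {"fp8": "/path/to/local-4b-fp8"},
    },
    "Qwen3.5-9B": {"repo": "example-org/Qwen3.5-9B-abliterated"},
}

# Short aliases accepted in model_path, mapped to dropdown names.
ALIASES = {"4b": "Qwen3-VL-4B (local)", "3.5-9b": "Qwen3.5-9B"}


def resolve_repo(model_select, model_path="", precision="bf16"):
    """Return the directory or repo id to load.

    A non-empty model_path wins over the dropdown: an alias is mapped to a
    registry entry, anything else is used verbatim. For registry entries,
    a precision-specific checkpoint is preferred when one is registered.
    """
    path = model_path.strip()
    if path:
        name = ALIASES.get(path.lower())
        if name is None:
            return path  # local dir or HF repo id, passed through unchanged
    else:
        name = model_select
    entry = MODELS[name]
    return entry.get("repo_by_precision", {}).get(precision, entry["repo"])
```

With this layout, `resolve_repo("Qwen3-VL-4B (local)", precision="fp8")` routes to the fp8 directory, while a model without a registered fp8 checkpoint falls back to its default repo and leaves any quantization to the loader.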
@@ -72,18 +72,25 @@ converted at `/media/p5/qwen3vl_4b_abliterated_comfy_convert/` so it runs out of
 (the abliterated/uncensored variant won't refuse to analyze adult imagery, which would
 otherwise break the loop).
 
-**Pick the judge from `model_select`** (transformers / safetensors, auto-downloaded):
-the **8B** abliterated at `bf16` (~17 GB) is the recommended intermediate — fast and
-clearly better than the 4B. The **30B-A3B** (nf4 ~18 GB) is higher quality but slow (the
-`nf4`/bitsandbytes path is the bottleneck, not the model). `model_path` overrides the
-dropdown for anything else.
+**Pick a model in `model_select` and a quant in `precision`.** All are abliterated,
+multimodal **safetensors** (transformers), auto-downloaded. The newer **Qwen3.5/3.6** are
+natively multimodal (need a recent transformers — they load via `AutoModelForMultimodalLM`).
 
-**GGUF-only models are not in the dropdown.** Newer uncensored builds like
-[`HauhauCS/Qwen3.5-9B-Uncensored`](https://huggingface.co/HauhauCS/Qwen3.5-9B-Uncensored-HauhauCS-Aggressive)
-and [`HauhauCS/Qwen3.6-35B-A3B-Uncensored`](https://huggingface.co/HauhauCS/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive)
-ship as **GGUF + mmproj** only — this node is transformers-only, so run those in a dedicated
-GGUF node ([1038lab/ComfyUI-QwenVL](https://github.com/1038lab/ComfyUI-QwenVL) or
-KLL535 Simple-Qwen3-VL-gguf) and feed their text output into the loop. See
+VRAM by quant on the RTX 5090 32 GB (✅ fits / ⚠ tight / ❌):
+
+| model | bf16 | fp8 | nf4 | note |
+|---|---|---|---|---|
+| Qwen3-VL-4B (local) | ✅ ~9 | ✅ ~5 | ✅ ~3 | fast, weak |
+| Qwen3-VL-8B | ✅ ~17 | ✅ ~9 | ✅ ~6 | solid, fast |
+| **Qwen3.5-9B** | ✅ ~20 | ✅ ~10 | ✅ ~7 | **newer, fast — recommended** |
+| Qwen3-VL-30B-A3B (MoE) | ❌ ~62 | ⚠ ~31 | ✅ ~18 | nf4 slow |
+| Qwen3.6-27B (dense) | ❌ ~56 | ⚠ ~28 | ✅ ~16 | nf4 slow, strong |
+| Qwen3.6-35B-A3B (MoE) | ❌ ~70 | ❌ | ✅ ~20 | nf4 slow, top quality |
+
+`nf4` (bitsandbytes) fits the big ones but is **slow** (dequant overhead) — that's the
+bottleneck, not the model. `fp8` is fast but only when a real fp8 checkpoint exists (the
+local 4B has one; `precision=fp8` on a bf16-only repo won't quantize). For speed + recency,
+**Qwen3.5-9B at bf16** is the sweet spot. See
 [docs/METHODOLOGY.md](docs/METHODOLOGY.md#model-sizing-on-32-gb-rtx-5090--abliterated-latest-qwen-vl).
 
 ## Loop sketch
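The VRAM table added in this hunk can be sanity-checked with weight-only arithmetic: parameters times bytes per parameter. The sketch below is a rough lower bound under assumptions, not how the README's figures were produced. `weight_gb` and `verdict` are hypothetical helpers, nominal parameter counts are used (9 for a "9B" model), nf4 is approximated as 0.5 bytes per parameter, and the 6 GB headroom for activations, vision tower, and runtime overhead is a guess chosen only because it matches the table's fits/tight/no-fit marks.

```python
# Approximate storage cost per parameter, in bytes (nf4 ignores the small
# extra cost of quantization constants).
BYTES_PER_PARAM = {"bf16": 2.0, "fp16": 2.0, "fp8": 1.0, "nf4": 0.5}


def weight_gb(params_billion, precision):
    """Weight-only footprint in GB (1 GB = 1e9 bytes)."""
    return params_billion * BYTES_PER_PARAM[precision]


def verdict(params_billion, precision, budget_gb=32.0, headroom_gb=6.0):
    """Classify a model/quant pair against a VRAM budget.

    headroom_gb is an assumed allowance for everything that is not weights.
    """
    need = weight_gb(params_billion, precision)
    if need > budget_gb:
        return "no fit"
    if need + headroom_gb > budget_gb:
        return "tight"
    return "fits"


# Nominal sizes for the models named in the table.
for label, size in [("Qwen3.5-9B", 9), ("Qwen3.6-27B", 27), ("Qwen3.6-35B-A3B", 35)]:
    row = {p: verdict(size, p) for p in ("bf16", "fp8", "nf4")}
    print(label, row)
```

The weight-only numbers come out a few GB below the table's (for example 18 GB versus ~20 GB for the 9B at bf16), which is consistent with the table including overhead beyond the weights.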