Handle reasoning models (Qwen3.5/3.6): no-think + JSON-only + prose fallback

Qwen3.5/3.6 are reasoning models: they 'think out loud' in markdown, never reach the JSON, and get cut off at the token limit -> '(no parseable judgement)'.

Fixes:
- apply_chat_template(enable_thinking=False) + strip <think> blocks
- hardened 'output ONLY JSON, do not think out loud' instruction
- default max_new_tokens 1024->2048 (max 8192)
- a markdown fallback parser (_parse_markdown_verdicts / _parse_axes) that extracts per-axis {verdict,ref,gen} from the prose the model reliably emits

describe falls back to using the raw text as the caption.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-27 10:25:16 +02:00
parent f5be04a5cb
commit 0e9e99b8b2
2 changed files with 64 additions and 13 deletions
@@ -38,7 +38,7 @@ can act on it.
 | `precision` | bf16 / fp8 / nf4 | bf16 | **the quant** — applies to the selected model (VRAM table below) |
 | `model_path` | STRING | "" (empty) | **manual override** of the dropdown — local dir, HF repo id, or alias (`8b`/`30b-a3b`/`3.5-9b`/`3.6-27b`/`3.6-35b`). Empty = use `model_select` |
 | `axes` | STRING | "" (empty) | **override** the profile's axis set with a custom comma/newline list; empty = use `profile` |
-| `max_new_tokens` | INT | 1024 | |
+| `max_new_tokens` | INT | 2048 | raise it if a reasoning model (Qwen3.5/3.6) gets cut off before finishing |
 | `temperature` | FLOAT | 0.0 | 0 = greedy/repeatable |
 | `swap_eval` | BOOL | true | run twice with images swapped, average → cuts position bias |
 | `keep_loaded` | BOOL | true | cache weights across loop iterations |
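
The commit message describes a two-stage parse: strip `<think>` blocks, try JSON, then fall back to reading verdicts out of the markdown prose. The following is a minimal sketch of that idea, not the code from this commit. The names `strip_think`, `parse_judgement`, and `AXIS_LINE` are hypothetical, the `axis: verdict` line shape is an assumption about what the prose looks like, and only the `verdict` field is recovered here (the real parser is said to extract `ref` and `gen` as well). The `apply_chat_template(enable_thinking=False)` step is not shown because it needs a loaded tokenizer.

```python
import json
import re

# Hypothetical helpers for illustration; not the commit's actual implementation.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)

# Assumed prose shape: one "axis: verdict" pair per line, optionally bulleted
# and/or bolded, e.g. "- **color**: worse".
AXIS_LINE = re.compile(
    r"^[ \t]*[-*]?[ \t]*\**(?P<axis>[A-Za-z_ ]+?)\**[ \t]*:[ \t]*\**(?P<verdict>\w+)",
    re.MULTILINE,
)


def strip_think(text: str) -> str:
    """Drop closed <think>...</think> blocks, then any stray tags.

    An unclosed <think> (model cut off mid-thought) keeps its text so the
    prose fallback still has something to read.
    """
    text = THINK_BLOCK.sub("", text)
    return text.replace("<think>", "").replace("</think>", "").strip()


def parse_judgement(raw: str, axes: list) -> dict:
    """Return per-axis verdicts from JSON if present, else from prose."""
    text = strip_think(raw)

    # Stage 1: take the outermost {...} span and try to parse it as JSON.
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        try:
            parsed = json.loads(text[start:end + 1])
            if isinstance(parsed, dict):
                return parsed
        except json.JSONDecodeError:
            pass  # fall through to the prose parser

    # Stage 2: scan the prose, keeping only lines that name a known axis.
    wanted = {a.lower() for a in axes}
    verdicts = {}
    for m in AXIS_LINE.finditer(text):
        axis = m.group("axis").strip().lower()
        if axis in wanted:
            verdicts[axis] = {"verdict": m.group("verdict").lower()}
    return verdicts
```

Filtering on the known axis names keeps ordinary prose lines that happen to contain a colon from being mistaken for verdicts. An empty dict signals that neither stage found anything, which a caller could map to the '(no parseable judgement)' case.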