Re-enable reasoning for accurate verdicts (no-think rubber-stamped 'match')
Disabling thinking made reasoning models mark everything 'match' even when ref/gen clearly differ.

Added an enable_thinking toggle (default ON) threaded through the generation path; the prompt now allows reasoning, then asks for the result, and verdict_rule explicitly warns against lazy 'match'.

_parse_json now scans for the JSON object AFTER the reasoning prose (last balanced object with 'axes'), and the markdown fallback already reads reasoned per-axis output.

Default max_new_tokens 2048->3072 so verdicts don't get cut off.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
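The "last balanced object with 'axes'" scan described in the message could look roughly like the sketch below. This is not the actual `_parse_json` from this commit; the function name `find_last_verdict`, the `required_key` parameter, and the sample verdict strings are hypothetical, and it assumes the model emits standard JSON somewhere after its reasoning prose.

```python
import json


def find_last_verdict(text, required_key="axes"):
    """Return the last JSON object in `text` that has `required_key`, or None.

    Hypothetical helper: walks every '{' and lets the stdlib decoder decide
    whether a balanced, parseable object starts there.
    """
    decoder = json.JSONDecoder()
    found = None
    pos = text.find("{")
    while pos != -1:
        try:
            obj, end = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            # Braces inside reasoning prose that are not JSON: skip this one.
            pos = text.find("{", pos + 1)
            continue
        if isinstance(obj, dict) and required_key in obj:
            found = obj                      # later matches overwrite earlier ones
            pos = text.find("{", end)        # continue after this object
        else:
            pos = text.find("{", pos + 1)    # look inside for a nested candidate
    return found


# Made-up model output: a draft object in the reasoning, then the final answer.
sample = (
    'Thinking: a first guess {"axes": {"pose": "match"}} seems too lazy.\n'
    'Final: {"axes": {"pose": "differ"}, "verdict": "differ"}'
)
result = find_last_verdict(sample)
```

Taking the last match rather than the first means a draft object quoted inside the reasoning does not shadow the final answer.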
@@ -38,7 +38,8 @@ can act on it.
 | `precision` | bf16 / fp8 / nf4 | bf16 | **the quant** — applies to the selected model (VRAM table below) |
 | `model_path` | STRING | "" (empty) | **manual override** of the dropdown — local dir, HF repo id, or alias (`8b`/`30b-a3b`/`3.5-9b`/`3.6-27b`/`3.6-35b`). Empty = use `model_select` |
 | `axes` | STRING **input** | — | (socket) optional override of the profile's axis set; wire a text node or leave unconnected to use `profile` |
-| `max_new_tokens` | INT | 2048 | raise it if a reasoning model (Qwen3.5/3.6) gets cut off before finishing |
+| `max_new_tokens` | INT | 3072 | reasoning models (Qwen3.5/3.6) need room; raise it if the verdict gets cut off |
+| `enable_thinking` | BOOL | true | let the model reason before judging. **Keep on for accurate verdicts** — off makes reasoning models rubber-stamp `match`. Off is faster |
 | `temperature` | FLOAT | 0.0 | 0 = greedy/repeatable |
 | `swap_eval` | BOOL | true | run twice with images swapped, average → cuts position bias |
 | `keep_loaded` | BOOL | true | cache weights across loop iterations |
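As a rough illustration of the `swap_eval` row, the sketch below averages two judge passes with the image order swapped. It is not the node's actual code: `swap_averaged_scores`, the `judge(first, second)` signature returning per-axis scores in [0, 1], and the toy judge with a fixed bias toward its first argument are all assumptions made for the example.

```python
def swap_averaged_scores(judge, ref, gen):
    """Run `judge` in both image orders and average the per-axis scores.

    `judge(first, second)` is a hypothetical callable returning
    {axis_name: score}; only axes present in both passes are kept.
    """
    forward = judge(ref, gen)
    backward = judge(gen, ref)
    return {
        axis: (forward[axis] + backward[axis]) / 2
        for axis in forward
        if axis in backward
    }


def toy_biased_judge(first, second):
    """Toy stand-in for a model call: true similarity 0.5, plus a position bias."""
    bias = 0.25 if first == "ref.png" else -0.25
    return {"pose": 0.5 + bias}


single_pass = toy_biased_judge("ref.png", "gen.png")
result = swap_averaged_scores(toy_biased_judge, "ref.png", "gen.png")
```

In this toy setup the single pass is inflated by the bias, while the swapped average recovers the underlying 0.5, because an order-dependent offset enters the two passes with opposite sign.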