Re-enable reasoning for accurate verdicts (no-think rubber-stamped 'match')

Disabling thinking made reasoning models mark everything 'match' even when ref/gen
clearly differ. Added an enable_thinking toggle (default ON) threaded through the
generation path; the prompt now allows reasoning then asks for the result, and
verdict_rule explicitly warns against lazy 'match'. _parse_json now scans for the
JSON object AFTER the reasoning prose (last balanced object with 'axes'), and the
markdown fallback already reads reasoned per-axis output. Raised the default
max_new_tokens from 2048 to 3072 so verdicts don't get cut off.
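
For illustration only, the "last balanced object with 'axes'" scan could look
roughly like the sketch below. This is not the repository's _parse_json; the
function name find_last_verdict_object and its required_key parameter are
hypothetical, and it assumes the verdict is plain JSON embedded in free-form
reasoning text.

```python
import json


def find_last_verdict_object(text, required_key="axes"):
    """Return the last parseable JSON object in `text` that has `required_key`.

    Hypothetical helper: scans every '{' left to right, tries to decode a
    complete JSON value from that position, and remembers the most recent
    dict containing the required key. Returns None if nothing qualifies.
    """
    decoder = json.JSONDecoder()
    found = None
    pos = 0
    while True:
        start = text.find("{", pos)
        if start == -1:
            break
        try:
            # raw_decode parses one JSON value starting at `start` and
            # returns it together with the index where the value ended.
            obj, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            # Not valid JSON here (e.g. braces in prose); try the next '{'.
            pos = start + 1
            continue
        if isinstance(obj, dict) and required_key in obj:
            found = obj
        # Skip past the decoded object so nested objects are not rescanned.
        pos = end
    return found
```

Taking the last qualifying object rather than the first means a draft verdict
quoted inside the reasoning is overridden by the final answer that follows it.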

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-27 10:56:47 +02:00
parent fee136e98c
commit 22fd24b29e
4 changed files with 96 additions and 74 deletions
+1 -1
@@ -68,7 +68,7 @@
"model_path": "/media/p5/qwen3vl_4b_abliterated_comfy_convert/hf_bf16",
"precision": "bf16",
"profile": "general",
-"max_new_tokens": 2048,
+"max_new_tokens": 3072,
"temperature": 0.0,
"swap_eval": true,
"keep_loaded": true,
+1 -1
@@ -12,7 +12,7 @@
"profile": "general",
"model_path": "/media/p5/qwen3vl_4b_abliterated_comfy_convert/hf_bf16",
"precision": "bf16",
-"max_new_tokens": 2048,
+"max_new_tokens": 3072,
"temperature": 0.0,
"swap_eval": false,
"keep_loaded": true,