Re-enable reasoning for accurate verdicts (no-think rubber-stamped 'match')
Disabling thinking made reasoning models mark everything 'match' even when ref/gen clearly differ. Added an enable_thinking toggle (default ON) threaded through the generation path; the prompt now allows reasoning then asks for the result, and verdict_rule explicitly warns against lazy 'match'. _parse_json now scans for the JSON object AFTER the reasoning prose (last balanced object with 'axes'), and the markdown fallback already reads reasoned per-axis output. Default max_new_tokens 2048->3072 so verdicts don't get cut off. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
@@ -68,7 +68,7 @@
|
||||
"model_path": "/media/p5/qwen3vl_4b_abliterated_comfy_convert/hf_bf16",
|
||||
"precision": "bf16",
|
||||
"profile": "general",
|
||||
"max_new_tokens": 2048,
|
||||
"max_new_tokens": 3072,
|
||||
"temperature": 0.0,
|
||||
"swap_eval": true,
|
||||
"keep_loaded": true,
|
||||
|
||||
Reference in New Issue
Block a user