fix(perf): default TF32 off; off = true fp32 (matmul + cuDNN conv)
Reported as "darker", but a fixed-seed spectral A/B shows TF32 is tonally neutral (centroid 564→565 Hz, HF>8k 0.00825→0.00833). The perceived change is the seed=0 random-noise confound (two seed=0 runs draw different noise), not TF32.

Still, TF32 is only ~1.15x faster and not bit-exact, so default it OFF for reference-fp32 output and let compile (~2.1x, op fusion) be the headline speedup. apply_tf32 now also toggles cuDNN conv TF32 (which PyTorch leaves on by default), so off is genuinely fp32. Docs updated with the seed-confound A/B guidance.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
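The measurement script behind these numbers is not part of this commit. Below is a minimal, dependency-free sketch of the two metrics. It assumes that "centroid" means the magnitude-weighted mean frequency and that "HF>8k" means the fraction of spectral energy above 8 kHz. Both definitions and all function names are assumptions for illustration.

```python
import cmath
import math


def magnitude_spectrum(samples, sample_rate):
    """One-sided DFT magnitudes (naive O(n^2); illustration only)."""
    n = len(samples)
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        # X_k = sum_t x_t * exp(-2*pi*i*k*t/n)
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        freqs.append(k * sample_rate / n)  # bin centre frequency in Hz
        mags.append(abs(acc))
    return freqs, mags


def spectral_centroid(freqs, mags):
    """Assumed definition: magnitude-weighted mean frequency in Hz."""
    total = sum(mags)
    return sum(f * m for f, m in zip(freqs, mags)) / total if total else 0.0


def hf_energy_fraction(freqs, mags, cutoff_hz=8000.0):
    """Assumed definition: share of energy (|X|^2) in bins above cutoff_hz."""
    energy = [m * m for m in mags]
    total = sum(energy)
    high = sum(e for f, e in zip(freqs, energy) if f > cutoff_hz)
    return high / total if total else 0.0
```

For an A/B, render both clips from the same fixed seed (seed!=0), change only the TF32 flag, and compare the two metric pairs. For real-length audio, an FFT such as `numpy.fft.rfft` would replace this O(n²) DFT.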
@@ -56,9 +56,10 @@ class UniverSRModelLoader:
             },
             "optional": {
                 "tf32": ("BOOLEAN", {
-                    "default": True,
-                    "tooltip": "Enable TF32 matmul on Ampere+ GPUs (~1.15x). Perceptually lossless "
-                               "but not bit-exact; global setting. Turn off for reference fp32.",
+                    "default": False,
+                    "tooltip": "TF32 matmul + conv on Ampere+ GPUs (~1.15x). Tonally neutral in testing "
+                               "but not bit-exact; off by default = reference fp32. A/B with a FIXED seed "
+                               "(seed!=0) — comparing two seed=0 runs changes the noise, not just TF32.",
                 }),
                 "compile": ("BOOLEAN", {
                     "default": False,
@@ -83,7 +84,7 @@ class UniverSRModelLoader:
     RETURN_NAMES = ("model",)
     FUNCTION = "load"

-    def load(self, model, device, tf32=True, compile=False, local_path="", config_path=""):
+    def load(self, model, device, tf32=False, compile=False, local_path="", config_path=""):
         dev = _default_device() if device == "auto" else device
         if dev == "cuda" and not torch.cuda.is_available():
             print("[UniverSR] CUDA unavailable, falling back to CPU")
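The hunks above only flip the `tf32` default. The `apply_tf32` helper mentioned in the commit message is not shown. Below is a minimal sketch of what it plausibly does, assuming it sets PyTorch's two process-wide TF32 flags. The body is a hypothetical reconstruction, not the repository's code.

```python
import torch


def apply_tf32(enabled: bool) -> None:
    """Hypothetical sketch: toggle TF32 for both matmul and cuDNN convolutions."""
    # cuBLAS matmul TF32; PyTorch has defaulted this to False since 1.12.
    torch.backends.cuda.matmul.allow_tf32 = enabled
    # cuDNN convolution TF32; PyTorch defaults this to True, so it must be
    # switched off explicitly for convolutions to run in true fp32.
    torch.backends.cudnn.allow_tf32 = enabled
```

Both flags are process-wide, so the choice affects every model loaded in the same process. This matches the old tooltip's "global setting" note.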