feat: rewrite audio scan with MFCC+delta+spectral contrast pipeline

Root cause of poor discrimination: MFCC[0] (energy) dominated the feature vector, making cosine similarity see all audio as similar.

Changes:
- Skip MFCC[0], use 12 coefficients instead of 20
- Add delta MFCCs for temporal dynamics
- Add 7-band spectral contrast for tonal vs noise quality
- Switch from cosine similarity to euclidean-distance-based score
- Pre-compute STFT once for whole file (10-20x faster)
- Vectorized sliding window via cumulative sums (no Python loop)
- Lower sample rate 22050→16000 Hz (faster, no quality loss)
- 62-dim feature vector (was 40-dim mean+std of raw MFCCs)
- Default threshold 0.05 (new similarity scale)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-17 15:28:44 +02:00
parent 8ab5bdba77
commit f2c38aee79
3 changed files with 159 additions and 71 deletions
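
The root cause named in the commit message can be illustrated with a small, dependency-free sketch. The vectors below are invented numbers, not real MFCCs from this project, and `distance_score` is a hypothetical mapping from Euclidean distance to a (0, 1] score; the commit does not state the exact formula it uses.

```python
import math


def cosine_similarity(a, b):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def distance_score(a, b):
    """Hypothetical distance-based score: 1.0 for identical vectors,
    approaching 0 as the Euclidean distance grows (an assumption,
    not the formula from this commit)."""
    return 1.0 / (1.0 + math.dist(a, b))


# Made-up MFCC-like vectors. Index 0 stands in for the energy term,
# which is large and has the same sign for both sounds.
sound_a = [-400.0, 60.0, -20.0, 15.0, -5.0]
sound_b = [-390.0, -10.0, 30.0, -25.0, 12.0]

# With the energy term, the large shared component dominates the dot product.
with_energy = cosine_similarity(sound_a, sound_b)

# Without it, the remaining coefficients point in clearly different directions.
without_energy = cosine_similarity(sound_a[1:], sound_b[1:])

print(f"cosine with MFCC[0]:    {with_energy:.3f}")
print(f"cosine without MFCC[0]: {without_energy:.3f}")
print(f"distance score:         {distance_score(sound_a[1:], sound_b[1:]):.4f}")
```

With these numbers, the cosine similarity is close to 1 while the energy term is present and turns negative once it is dropped, which is the effect the commit attributes to MFCC[0]. A distance-based score lives on a different scale than cosine similarity, which is consistent with the default threshold changing in the diff below.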
@@ -1568,7 +1568,7 @@ class MainWindow(QMainWindow):
        self._sld_threshold.setDecimals(2)
        self._sld_threshold.setRange(0.0, 1.0)
        self._sld_threshold.setSingleStep(0.01)
-        self._sld_threshold.setValue(0.70)
+        self._sld_threshold.setValue(0.05)
        self._sld_threshold.setPrefix("Thr: ")
        self._sld_threshold.setToolTip("Similarity threshold (0=match everything, 1=exact match)")