feat: rewrite audio scan with MFCC+delta+spectral contrast pipeline

Root cause of poor discrimination: MFCC[0] (energy) dominated the
feature vector, making cosine similarity see all audio as similar.

Changes:
- Skip MFCC[0], use 12 coefficients instead of 20
- Add delta MFCCs for temporal dynamics
- Add 7-band spectral contrast for tonal vs noise quality
- Switch from cosine similarity to euclidean-distance-based score
- Pre-compute STFT once for whole file (10-20x faster)
- Vectorized sliding window via cumulative sums (no Python loop)
- Lower sample rate 22050→16000 Hz (faster, no quality loss)
- 62-dim feature vector (was 40-dim mean+std of raw MFCCs)
- Default threshold 0.05 (new similarity scale)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
2026-04-17 15:28:44 +02:00
parent 8ab5bdba77
commit f2c38aee79
3 changed files with 159 additions and 71 deletions
+1 -1
View File
@@ -1568,7 +1568,7 @@ class MainWindow(QMainWindow):
self._sld_threshold.setDecimals(2)
self._sld_threshold.setRange(0.0, 1.0)
self._sld_threshold.setSingleStep(0.01)
self._sld_threshold.setValue(0.70)
self._sld_threshold.setValue(0.05)
self._sld_threshold.setPrefix("Thr: ")
self._sld_threshold.setToolTip("Similarity threshold (0=match everything, 1=exact match)")