feat: rewrite audio scan with MFCC+delta+spectral contrast pipeline
Root cause of poor discrimination: MFCC[0] (energy) dominated the feature vector, making cosine similarity see all audio as similar. Changes: - Skip MFCC[0], use 12 coefficients instead of 20 - Add delta MFCCs for temporal dynamics - Add 7-band spectral contrast for tonal vs noise quality - Switch from cosine similarity to euclidean-distance-based score - Pre-compute STFT once for whole file (10-20x faster) - Vectorized sliding window via cumulative sums (no Python loop) - Lower sample rate 22050→16000 Hz (faster, no quality loss) - 62-dim feature vector (was 40-dim mean+std of raw MFCCs) - Default threshold 0.05 (new similarity scale) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
@@ -1568,7 +1568,7 @@ class MainWindow(QMainWindow):
|
||||
self._sld_threshold.setDecimals(2)
|
||||
self._sld_threshold.setRange(0.0, 1.0)
|
||||
self._sld_threshold.setSingleStep(0.01)
|
||||
self._sld_threshold.setValue(0.70)
|
||||
self._sld_threshold.setValue(0.05)
|
||||
self._sld_threshold.setPrefix("Thr: ")
|
||||
self._sld_threshold.setToolTip("Similarity threshold (0=match everything, 1=exact match)")
|
||||
|
||||
|
||||
Reference in New Issue
Block a user