docs: add scan history & hard negative management design + plan

Covers scan result versioning per model, hard negative management dialog with training toggle, and ghost folder fix.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-19 14:51:17 +02:00
parent f5361a963e
commit e7b791fbfa
2 changed files with 804 additions and 0 deletions
@@ -0,0 +1,90 @@
+# Scan History & Hard Negative Management Design
+
+Date: 2026-04-19
+
+## Goal
+
+1. Keep scan result history per `(file, model)` so users can track classifier improvement across training iterations
+2. Make hard negatives manageable — viewable, removable, and optionally disabled per training run
+3. Fix latent bug: `get_export_folders()` doesn't filter by `scan_export`
+
+## 1. Scan Result History
+
+### Current behavior
+
+`save_scan_results()` **replaces** all results for `(filename, profile, model)` on every scan. No history is preserved.
+
+### Change
+
+Keep the last N scan results per `(filename, profile, model)` with timestamps. The most recent is the "active" result displayed in the panel; older versions are accessible for comparison.
+
+### Schema change
+
+Add column to `scan_results`:
+
+```sql
+ALTER TABLE scan_results ADD COLUMN scan_timestamp TEXT NOT NULL DEFAULT '';
+```
+
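+All rows from the same scan share the same timestamp string, e.g. `"20260419_143022_123456"` (`%Y%m%d_%H%M%S_%f`). Microsecond resolution makes collisions between back-to-back scans unlikely. The fixed-width format also sorts chronologically as plain text, so `MAX()` and `ORDER BY` on the column find the newest version.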
+
+### save_scan_results changes
+
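+Instead of `DELETE ... WHERE filename=? AND profile=? AND model=?`, the new flow is:
+
+1. Insert the new rows with the current timestamp
+2. Count distinct timestamps for this `(filename, profile, model)`
+3. If the count exceeds N (default 5), delete the rows belonging to the oldest timestamps
+
+A scan that finds no regions inserts no rows. It therefore creates no new version, and the previous result stays active. The old DELETE+INSERT would have cleared it instead.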
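+
+Steps 2 and 3 can also be collapsed into one `DELETE` with a `NOT IN (... LIMIT N)` subquery, which avoids building an `IN` list in Python. The sketch below makes two assumptions. First, `scan_results` is reduced to the columns shown; the real table also has `id`, `disabled` and the `orig_*` columns. Second, it uses a hypothetical `save_version` helper that receives the timestamp from its caller:
+
+```python
+import sqlite3
+
+
+def save_version(con: sqlite3.Connection, key: tuple[str, str, str], ts: str,
+                 regions: list[tuple[float, float, float]],
+                 max_versions: int = 5) -> None:
+    """Insert one scan version for key, then keep only the newest max_versions."""
+    con.executemany(
+        "INSERT INTO scan_results (filename, profile, model,"
+        " start_time, end_time, score, scan_timestamp)"
+        " VALUES (?, ?, ?, ?, ?, ?, ?)",
+        [(*key, s, e, sc, ts) for s, e, sc in regions],
+    )
+    # Rows whose timestamp is not among the newest max_versions distinct
+    # timestamps for this key belong to old versions and are dropped.
+    con.execute(
+        "DELETE FROM scan_results"
+        " WHERE filename = ? AND profile = ? AND model = ?"
+        "   AND scan_timestamp NOT IN ("
+        "     SELECT DISTINCT scan_timestamp FROM scan_results"
+        "     WHERE filename = ? AND profile = ? AND model = ?"
+        "     ORDER BY scan_timestamp DESC LIMIT ?)",
+        (*key, *key, max_versions),
+    )
+    con.commit()
+
+
+con = sqlite3.connect(":memory:")
+con.execute(
+    "CREATE TABLE scan_results (filename TEXT, profile TEXT, model TEXT,"
+    " start_time REAL, end_time REAL, score REAL,"
+    " scan_timestamp TEXT NOT NULL DEFAULT '')"
+)
+key = ("v.mp4", "test", "MODEL_A")
+for i in range(7):  # seven scans, two regions each
+    save_version(con, key, f"20260419_1430{i:02d}_000000",
+                 [(0.0, 8.0, 0.9), (10.0, 18.0, 0.7)])
+versions = [row[0] for row in con.execute(
+    "SELECT DISTINCT scan_timestamp FROM scan_results ORDER BY scan_timestamp")]
+```
+
+Because the timestamps sort chronologically as text, `ORDER BY scan_timestamp DESC LIMIT N` selects exactly the newest N versions. After seven saves, only the last five remain.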
+
+### UI changes
+
+Add a small version dropdown/selector in `ScanResultsPanel` per model tab — shows timestamps of available scan versions. Selecting a version loads that version's results into the tab. The most recent is selected by default.
+
+The tab label shows the active version's region count, e.g. `HUBERT_XLARGE (12) [v3]`.
+
+### Cache interaction
+
+Embedding cache is per `(file, model)` and doesn't change across scans. Only the classifier output changes. History stores the classified regions (start, end, score), not embeddings.
+
+## 2. Hard Negative Management
+
+### Current behavior
+
+- Hard negatives stored in `hard_negatives` table: `(filename, profile, start_time, source_path)`
+- No model column — applied globally within a profile
+- Removable one-by-one via N toggle in scan panel, but no bulk management
+- Always used in training — no way to disable
+
+### Changes
+
+#### Schema
+
+Add `source_model TEXT NOT NULL DEFAULT ''` column to `hard_negatives`. Populated when marking negatives from scan results (we know which model tab is active).
+
+#### Training toggle
+
+New checkbox in `TrainDialog`: **"Use hard negatives"** (default checked). When unchecked, `get_training_data()` skips the `hard_negatives` query entirely. Non-destructive — negatives remain in DB.
+
+#### Management dialog
+
+New `HardNegativesDialog`, opened from the Train dialog via a "Manage..." button next to the checkbox. Shows:
+
+- Table: filename, start time, source model, and date added (needs a `created_at` column, which this change does not add)
+- Filter by source model (dropdown)
+- Multi-select + Delete button
+- "Clear All" button with confirmation
+- Count summary at top
+
+### Training integration
+
+`get_training_data()` gets a new `use_hard_negatives: bool = True` parameter. When False, the hard negatives query (lines 365-374 of db.py) is skipped entirely.
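+
+The toggle only gates the step that merges hard-negative rows into the per-video negative sets. The sketch below shows that step as a hypothetical `merge_hard_negatives` helper. The `neg_by_video` and `source_by_filename` names mirror the locals in `get_training_data()`'s hard-negative loop. The rows are assumed to be `(filename, start_time, source_path)` tuples, as that query returns them:
+
+```python
+def merge_hard_negatives(neg_by_video: dict[str, set[float]],
+                         source_by_filename: dict[str, str],
+                         hard_rows: list[tuple[str, float, str]],
+                         use_hard_negatives: bool = True) -> None:
+    """Fold hard-negative rows into the per-video negative sets, in place."""
+    if not use_hard_negatives:
+        return  # non-destructive: the rows stay in the DB, they just go unused
+    for fn, st, sp in hard_rows:
+        neg_by_video.setdefault(fn, set()).add(st)
+        if sp:
+            # Keep an already known source path; only fill it in when missing
+            source_by_filename.setdefault(fn, sp)
+```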
+
+## 3. Ghost Folder Fix
+
+### Bug
+
+`get_export_folders()` queries all `output_path` rows without filtering `scan_export`. Folders that only contain scan-exported clips appear in training dropdowns with 0 clips.
+
+### Fix
+
+Add `include_scan_exports` parameter to `get_export_folders()`. When False (default), only query rows with `scan_export = 0`. Also filter out folders with 0 clips from `get_training_stats()` result dict.
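+
+The fix can be sketched as a pure function. This one is a hypothetical `export_folders` helper that takes `(output_path, scan_export)` pairs instead of querying the DB. It assumes the `<root>/<folder>/<group>/<clip>` layout implied by the existing grandparent lookup in `get_export_folders()`:
+
+```python
+import os
+
+
+def export_folders(rows: list[tuple[str, int]],
+                   include_scan_exports: bool = False) -> list[str]:
+    """rows are (output_path, scan_export) pairs; return sorted folder names."""
+    names: set[str] = set()
+    for output_path, scan_export in rows:
+        if scan_export and not include_scan_exports:
+            continue  # clip exists only because a scan exported it
+        # <root>/<folder>/<group>/clip.mp4 -> <folder>
+        grandparent = os.path.basename(os.path.dirname(os.path.dirname(output_path)))
+        if grandparent:
+            names.add(grandparent)
+    return sorted(names)
+```
+
+A folder holding both manual and scan-exported clips still appears, because its manual rows pass the filter. Only scan-only folders disappear.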
@@ -0,0 +1,714 @@
+# Scan History & Hard Negative Management Implementation Plan
+
+> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
+
+**Goal:** Add scan result versioning, hard negative management dialog with training toggle, and fix ghost folder bug.
+
+**Architecture:** DB schema changes in `core/db.py` (new columns, new queries). UI changes in `main.py` (version selector in ScanResultsPanel, management dialog, training toggle). No changes to `core/audio_scan.py`.
+
+**Tech Stack:** SQLite (existing), PyQt6 (existing)
+
+**Key design notes:**
+- Scan history stores N versions per `(filename, profile, model)` using a `scan_timestamp` column. All rows from one scan share the same timestamp.
+- Hard negatives gain a `source_model` column (informational) and training gains a `use_hard_negatives` toggle.
+- `get_export_folders()` must respect `scan_export` filter to prevent ghost folders.
+
+---
+
+### Task 1: Fix ghost folder bug in get_export_folders
+
+**Files:**
+- Modify: `core/db.py:294-313` (get_export_folders)
+- Modify: `core/db.py:410-443` (get_training_stats — filter out 0-clip folders)
+- Test: `tests/test_db.py`
+
+**Step 1: Write failing test**
+
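+```python
+# Imports these tests rely on (tests/test_db.py likely has them already;
+# the core.db import path is inferred from the file layout):
+import os
+import tempfile
+
+from core.db import ProcessedDB
+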
+def test_export_folders_excludes_scan_exports():
+    """Scan-export-only folders should not appear when include_scan_exports=False."""
+    with tempfile.NamedTemporaryFile(suffix=".db", delete=False) as f:
+        path = f.name
+    try:
+        db = ProcessedDB(path)
+        # Manual export
+        db.add("a.mp4", 10.0, "/out/mp4_Intense/g1/clip.mp4", profile="test")
+        # Scan export to different folder
+        db.add("a.mp4", 20.0, "/out/mp4_ScanOnly/g1/clip.mp4", profile="test",
+               scan_export=True)
+        folders = db.get_export_folders("test")
+        assert "mp4_Intense" in folders
+        assert "mp4_ScanOnly" not in folders, "scan-only folder should be excluded"
+        # With include_scan_exports=True, both should appear
+        folders_all = db.get_export_folders("test", include_scan_exports=True)
+        assert "mp4_ScanOnly" in folders_all
+    finally:
+        os.unlink(path)
+```
+
+**Step 2: Fix get_export_folders**
+
+Add `include_scan_exports` parameter:
+
+```python
+def get_export_folders(self, profile: str = "default",
+                       include_scan_exports: bool = False) -> list[str]:
+    if not self._enabled:
+        return []
+    if include_scan_exports:
+        rows = self._con.execute(
+            "SELECT DISTINCT output_path FROM processed WHERE profile = ?",
+            (profile,),
+        ).fetchall()
+    else:
+        rows = self._con.execute(
+            "SELECT DISTINCT output_path FROM processed"
+            " WHERE profile = ? AND scan_export = 0",
+            (profile,),
+        ).fetchall()
+    folder_names: set[str] = set()
+    for (op,) in rows:
+        grandparent = os.path.basename(os.path.dirname(os.path.dirname(op)))
+        if grandparent:
+            folder_names.add(grandparent)
+    return sorted(folder_names)
+```
+
+**Step 3: Pass `include_scan_exports` through in get_training_stats**
+
+```python
+    folders = self.get_export_folders(profile, include_scan_exports=include_scan_exports)
+```
+
+And filter out empty folders at the end:
+
+```python
+    return {k: v for k, v in stats.items() if v["clips"] > 0}
+```
+
+**Step 4: Run tests, commit**
+
+```bash
+pytest tests/ -v
+git add core/db.py tests/test_db.py
+git commit -m "fix: get_export_folders respects scan_export filter"
+```
+
+---
+
+### Task 2: Scan result history — schema and DB methods
+
+**Files:**
+- Modify: `core/db.py:86-98` (scan_results schema — add scan_timestamp column)
+- Modify: `core/db.py:100-113` (migration — add scan_timestamp to existing tables)
+- Modify: `core/db.py:447-468` (save_scan_results — version management)
+- Add: `core/db.py` (get_scan_versions)
+- Modify: `core/db.py` (get_scan_results: optional `scan_timestamp` to load a specific version)
+- Test: `tests/test_db.py`
+
+**Step 1: Write failing test**
+
+```python
+def test_scan_result_history():
+    """save_scan_results should keep multiple versions."""
+    with tempfile.NamedTemporaryFile(suffix=".db", delete=False) as f:
+        path = f.name
+    try:
+        db = ProcessedDB(path)
+        # Save three versions
+        db.save_scan_results("v.mp4", "test", "MODEL_A",
+                             [(0, 8, 0.9)])
+        db.save_scan_results("v.mp4", "test", "MODEL_A",
+                             [(0, 8, 0.8), (10, 18, 0.7)])
+        db.save_scan_results("v.mp4", "test", "MODEL_A",
+                             [(5, 13, 0.95)])
+        versions = db.get_scan_versions("v.mp4", "test", "MODEL_A")
+        assert len(versions) == 3
+        # Most recent first
+        assert versions[0]["count"] == 1   # latest: 1 region
+        assert versions[1]["count"] == 2   # middle: 2 regions
+        assert versions[2]["count"] == 1   # oldest: 1 region
+        # get_scan_results returns latest version by default
+        results = db.get_scan_results("v.mp4", "test")
+        assert len(results.get("MODEL_A", [])) == 1
+    finally:
+        os.unlink(path)
+```
+
+**Step 2: Add scan_timestamp column**
+
+In the CREATE TABLE (lines 87-98), add the column, with a comma after the preceding column definition:
+
+```sql
+  scan_timestamp  TEXT NOT NULL DEFAULT ''
+```
+
+In the migration block (lines 100-113), add:
+
+```python
+        ("scan_timestamp", "TEXT NOT NULL DEFAULT ''"),
+```
+
+**Step 3: Modify save_scan_results**
+
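+Replace the current DELETE+INSERT with a versioned insert plus cleanup. Timestamps carry microseconds and are forced to increase per `(filename, profile, model)`. Back-to-back calls, like the three saves in the Step 1 test, therefore never collapse into a single version, which a seconds-only timestamp would do:
+
+```python
+# Module level in core/db.py
+from datetime import datetime, timedelta
+
+# Fixed-width and zero-padded, so string order == chronological order
+_SCAN_TS_FMT = "%Y%m%d_%H%M%S_%f"
+
+
+def save_scan_results(self, filename: str, profile: str, model: str,
+                      regions: list[tuple[float, float, float]],
+                      max_versions: int = 5) -> None:
+    if not self._enabled:
+        return
+    key = (filename, profile, model)
+    with self._lock:
+        ts = datetime.now().strftime(_SCAN_TS_FMT)
+        (latest,) = self._con.execute(
+            "SELECT MAX(scan_timestamp) FROM scan_results"
+            " WHERE filename = ? AND profile = ? AND model = ?",
+            key,
+        ).fetchone()
+        if latest and ts <= latest:
+            # Coarse clock or back-to-back scans: step just past the newest
+            # existing version so this scan still gets its own timestamp.
+            ts = (datetime.strptime(latest, _SCAN_TS_FMT)
+                  + timedelta(microseconds=1)).strftime(_SCAN_TS_FMT)
+        self._con.executemany(
+            "INSERT INTO scan_results"
+            " (filename, profile, model, start_time, end_time, score,"
+            "  orig_start_time, orig_end_time, scan_timestamp)"
+            " VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
+            [(filename, profile, model, s, e, sc, s, e, ts)
+             for s, e, sc in regions],
+        )
+        # Prune old versions beyond max_versions (newest first)
+        versions = self._con.execute(
+            "SELECT DISTINCT scan_timestamp FROM scan_results"
+            " WHERE filename = ? AND profile = ? AND model = ?"
+            " ORDER BY scan_timestamp DESC",
+            key,
+        ).fetchall()
+        if len(versions) > max_versions:
+            old_ts = [v[0] for v in versions[max_versions:]]
+            self._con.execute(
+                "DELETE FROM scan_results"
+                " WHERE filename = ? AND profile = ? AND model = ?"
+                f" AND scan_timestamp IN ({','.join('?' * len(old_ts))})",
+                (*key, *old_ts),
+            )
+        self._con.commit()
+```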
+
+**Step 4: Add get_scan_versions**
+
+```python
+def get_scan_versions(self, filename: str, profile: str, model: str
+                      ) -> list[dict]:
+    """Return list of scan versions for (filename, profile, model).
+    
+    Returns [{timestamp, count, max_score}, ...] ordered newest first.
+    """
+    if not self._enabled:
+        return []
+    rows = self._con.execute(
+        "SELECT scan_timestamp, COUNT(*), MAX(score)"
+        " FROM scan_results"
+        " WHERE filename = ? AND profile = ? AND model = ?"
+        "   AND scan_timestamp != ''"
+        " GROUP BY scan_timestamp"
+        " ORDER BY scan_timestamp DESC",
+        (filename, profile, model),
+    ).fetchall()
+    return [{"timestamp": ts, "count": cnt, "max_score": sc}
+            for ts, cnt, sc in rows]
+```
+
+**Step 5: Modify get_scan_results to support version selection**
+
+Add optional `scan_timestamp` parameter. When None (default), returns latest version:
+
+```python
+def get_scan_results(self, filename: str, profile: str,
+                     scan_timestamp: str | None = None
+                     ) -> dict[str, list[tuple]]:
+    if not self._enabled:
+        return {}
+    if scan_timestamp:
+        rows = self._con.execute(
+            "SELECT id, model, start_time, end_time, score, disabled,"
+            "       orig_start_time, orig_end_time"
+            " FROM scan_results"
+            " WHERE filename = ? AND profile = ? AND scan_timestamp = ?"
+            " ORDER BY model, start_time",
+            (filename, profile, scan_timestamp),
+        ).fetchall()
+    else:
+        # For each model, get rows from the latest timestamp only
+        rows = self._con.execute(
+            "SELECT r.id, r.model, r.start_time, r.end_time, r.score,"
+            "       r.disabled, r.orig_start_time, r.orig_end_time"
+            " FROM scan_results r"
+            " INNER JOIN ("
+            "   SELECT model, MAX(scan_timestamp) AS latest"
+            "   FROM scan_results"
+            "   WHERE filename = ? AND profile = ?"
+            "   GROUP BY model"
+            " ) m ON r.model = m.model AND r.scan_timestamp = m.latest"
+            " WHERE r.filename = ? AND r.profile = ?"
+            " ORDER BY r.model, r.start_time",
+            (filename, profile, filename, profile),
+        ).fetchall()
+    result: dict[str, list] = {}
+    for row_id, model, s, e, sc, dis, os_, oe in rows:
+        result.setdefault(model, []).append(
+            (row_id, s, e, sc, bool(dis),
+             os_ if os_ is not None else s,
+             oe if oe is not None else e))
+    return result
+```
+
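+**Important:** Legacy rows (written before this change) have `scan_timestamp = ''`. The `MAX(scan_timestamp)` query handles this correctly: the empty string sorts before any real timestamp, so legacy rows are returned only when they are the only version for that model. The `get_scan_versions` query filters `scan_timestamp != ''`, so legacy rows don't appear as named versions. Pruning in `save_scan_results` does not filter `''`, so the legacy group counts toward `max_versions`. Being the oldest, it is the first to be deleted.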
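+
+The ordering claim can be checked against SQLite directly. The sketch below is a minimal in-memory example, reduced to the columns the latest-version join needs and filled with hypothetical data. One model has only a legacy row; the other has a legacy row plus a real scan:
+
+```python
+import sqlite3
+
+con = sqlite3.connect(":memory:")
+con.execute("CREATE TABLE scan_results (id INTEGER PRIMARY KEY, filename TEXT,"
+            " profile TEXT, model TEXT, start_time REAL,"
+            " scan_timestamp TEXT NOT NULL DEFAULT '')")
+con.executemany(
+    "INSERT INTO scan_results (filename, profile, model, start_time, scan_timestamp)"
+    " VALUES ('v.mp4', 'test', ?, ?, ?)",
+    [("MODEL_A", 1.0, ""),                        # legacy row, only version for A
+     ("MODEL_B", 2.0, ""),                        # legacy row for B ...
+     ("MODEL_B", 3.0, "20260419_143022_000000")],  # ... superseded by a real scan
+)
+# Same join shape as get_scan_results() when no scan_timestamp is given
+latest = con.execute(
+    "SELECT r.model, r.start_time FROM scan_results r"
+    " INNER JOIN (SELECT model, MAX(scan_timestamp) AS latest FROM scan_results"
+    "             WHERE filename = ? AND profile = ? GROUP BY model) m"
+    "   ON r.model = m.model AND r.scan_timestamp = m.latest"
+    " WHERE r.filename = ? AND r.profile = ? ORDER BY r.model, r.start_time",
+    ("v.mp4", "test", "v.mp4", "test"),
+).fetchall()
+```
+
+`MODEL_A` falls back to its legacy row, while the newer scan hides `MODEL_B`'s legacy row.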
+
+**Step 6: Run tests, commit**
+
+```bash
+pytest tests/ -v
+git add core/db.py tests/test_db.py
+git commit -m "feat: scan result history — keep N versions per (file, model)"
+```
+
+---
+
+### Task 3: Scan history UI — version selector in ScanResultsPanel
+
+**Files:**
+- Modify: `main.py` (ScanResultsPanel — add version combo per tab)
+- Modify: `main.py` (ScanResultsPanel.load_for_file — populate versions)
+
+**Step 1: Add version combo to tab UI**
+
+In `ScanResultsPanel._add_tab()`, add a small QComboBox above the table. When no history exists, hide it. When versions exist, populate with timestamps and connect to a slot that reloads the tab with that version.
+
+```python
+# In _add_tab, create a container widget with version combo + table
+container = QWidget()
+layout = QVBoxLayout(container)
+layout.setContentsMargins(0, 0, 0, 0)
+
+cmb_version = QComboBox()
+cmb_version.setMaximumWidth(200)
+cmb_version.setToolTip("Scan version history")
+cmb_version.hide()  # Hidden when only 1 version
+layout.addWidget(cmb_version)
+layout.addWidget(table)
+
+self._tabs.addTab(container, label)
+```
+
+Store the combo and table as properties on the container widget for later access.
+
+**Step 2: Populate versions in load_for_file**
+
+After creating each model tab, query `get_scan_versions()`. If > 1 version, show the combo with entries like `"2026-04-19 14:30 (12 regions, best: 0.95)"`. Connect `currentIndexChanged` to reload that version's results.
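+
+The label formatting can be sketched as a hypothetical module-level helper, `format_version_label`, that takes one entry from `get_scan_versions()`. It parses only the leading `YYYYMMDD_HHMMSS` part of the timestamp, so it works whether or not the stored value carries a sub-second suffix:
+
+```python
+from datetime import datetime
+
+
+def format_version_label(version: dict) -> str:
+    """Turn a get_scan_versions() entry into a combo-box label."""
+    # Only the first 15 characters (YYYYMMDD_HHMMSS) are parsed; any
+    # trailing sub-second suffix is ignored for display.
+    ts = datetime.strptime(version["timestamp"][:15], "%Y%m%d_%H%M%S")
+    return (f"{ts:%Y-%m-%d %H:%M} ({version['count']} regions,"
+            f" best: {version['max_score']:.2f})")
+```
+
+Storing the raw timestamp as the item's user data, for example with `addItem(label, ts)`, lets the switching slot read it back with `currentData()` instead of re-parsing the label.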
+
+**Step 3: Version switching slot**
+
+When user selects a different version from the combo:
+1. Call `db.get_scan_results(filename, profile, scan_timestamp=selected_ts)` and take that tab's model entry from the returned dict
+2. Repopulate the table with that version's rows
+3. Update timeline regions
+
+**Step 4: Test manually, commit**
+
+```bash
+git add main.py
+git commit -m "feat: scan version selector in results panel"
+```
+
+---
+
+### Task 4: Hard negatives — schema and training toggle
+
+**Files:**
+- Modify: `core/db.py:118-130` (hard_negatives schema — add source_model column)
+- Modify: `core/db.py:548-560` (add_hard_negatives — accept source_model)
+- Modify: `core/db.py:365-374` (get_training_data — use_hard_negatives parameter)
+- Modify: `main.py` (TrainDialog — add "Use hard negatives" checkbox)
+- Modify: `main.py` (_open_train_dialog — pass use_hard_negatives to get_training_data)
+- Test: `tests/test_db.py`
+
+**Step 1: Write failing test**
+
+```python
+def test_hard_negatives_source_model():
+    """Hard negatives should store source_model."""
+    with tempfile.NamedTemporaryFile(suffix=".db", delete=False) as f:
+        path = f.name
+    try:
+        db = ProcessedDB(path)
+        db.add_hard_negatives("a.mp4", "test", [10.0, 20.0],
+                              source_path="/a.mp4", source_model="HUBERT_XLARGE")
+        rows = db.get_hard_negatives("test")
+        assert len(rows) == 2
+        assert all(r["source_model"] == "HUBERT_XLARGE" for r in rows)
+    finally:
+        os.unlink(path)
+
+def test_training_data_skips_hard_negatives():
+    """get_training_data with use_hard_negatives=False should skip them."""
+    with tempfile.NamedTemporaryFile(suffix=".db", delete=False) as f:
+        path = f.name
+    try:
+        db = ProcessedDB(path)
+        db.add("a.mp4", 10.0, "/out/folder/g/clip.mp4", profile="test",
+               source_path="/videos/a.mp4")
+        db.add_hard_negatives("a.mp4", "test", [500.0], source_path="/videos/a.mp4")
+        # With hard negatives
+        data_with = db.get_training_data("test", "folder", use_hard_negatives=True)
+        # Without hard negatives
+        data_without = db.get_training_data("test", "folder", use_hard_negatives=False)
+        # Both should find the video
+        assert len(data_with) >= 1
+        assert len(data_without) >= 1
+        neg_with = sum(len(vi[3]) for vi in data_with)
+        neg_without = sum(len(vi[3]) for vi in data_without)
+        # Disabling hard negatives must never add negatives; >= rather than >
+        # because margin filtering may drop the 500 s negative
+        assert neg_with >= neg_without
+    finally:
+        os.unlink(path)
+```
+
+**Step 2: Add source_model column to hard_negatives**
+
+In CREATE TABLE (lines 119-125), add the column, with a comma after the preceding column definition:
+
+```sql
+  source_model TEXT NOT NULL DEFAULT ''
+```
+
+In the migration section, add this after the hard_negatives table creation:
+
+```python
+hn_cols = {
+    row[1]
+    for row in self._con.execute("PRAGMA table_info(hard_negatives)").fetchall()
+}
+if "source_model" not in hn_cols:
+    self._con.execute(
+        "ALTER TABLE hard_negatives ADD COLUMN source_model TEXT NOT NULL DEFAULT ''"
+    )
+```
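+
+If more columns need this check, it could be factored into a helper. The sketch below uses a hypothetical `ensure_column(con, table, column, decl)` function. It interpolates `table`, `column` and `decl` into the SQL, because identifiers cannot be bound as parameters, so it must only ever receive trusted literals:
+
+```python
+import sqlite3
+
+
+def ensure_column(con: sqlite3.Connection, table: str, column: str,
+                  decl: str) -> bool:
+    """Add `column` to `table` if it is missing; return True if it was added."""
+    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk)
+    existing = {row[1] for row in con.execute(f"PRAGMA table_info({table})")}
+    if column in existing:
+        return False
+    # Existing rows receive the DEFAULT value, which NOT NULL requires here
+    con.execute(f"ALTER TABLE {table} ADD COLUMN {column} {decl}")
+    return True
+```
+
+Usage would be `ensure_column(self._con, "hard_negatives", "source_model", "TEXT NOT NULL DEFAULT ''")`.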
+
+**Step 3: Update add_hard_negatives to accept source_model**
+
+```python
+def add_hard_negatives(self, filename: str, profile: str,
+                       times: list[float], source_path: str = "",
+                       source_model: str = "") -> None:
+    if not self._enabled or not times:
+        return
+    with self._lock:
+        # One row per marked start time; file, profile and source are shared
+        self._con.executemany(
+            "INSERT INTO hard_negatives"
+            " (filename, profile, start_time, source_path, source_model)"
+            " VALUES (?, ?, ?, ?, ?)",
+            [(filename, profile, t, source_path, source_model) for t in times],
+        )
+        self._con.commit()
+```
+
+**Step 4: Add get_hard_negatives (full rows for management dialog)**
+
+```python
+def get_hard_negatives(self, profile: str) -> list[dict]:
+    """Return all hard negatives for a profile with full details."""
+    if not self._enabled:
+        return []
+    rows = self._con.execute(
+        "SELECT id, filename, start_time, source_path, source_model"
+        " FROM hard_negatives WHERE profile = ?"
+        " ORDER BY filename, start_time",
+        (profile,),
+    ).fetchall()
+    return [{"id": r[0], "filename": r[1], "start_time": r[2],
+             "source_path": r[3], "source_model": r[4]} for r in rows]
+```
+
+**Step 5: Add delete_hard_negatives_by_ids**
+
+```python
+def delete_hard_negatives_by_ids(self, ids: list[int]) -> None:
+    """Delete hard negatives by row IDs."""
+    if not self._enabled or not ids:
+        return
+    with self._lock:
+        self._con.execute(
+            f"DELETE FROM hard_negatives WHERE id IN ({','.join('?' * len(ids))})",
+            ids,
+        )
+        self._con.commit()
+```
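+
+One caveat: SQLite builds older than 3.32.0 limit a statement to 999 bound parameters by default (`SQLITE_MAX_VARIABLE_NUMBER`). On those builds, "Clear All" on a large profile could fail with "too many SQL variables". The sketch below is a batched variant, written as a hypothetical `delete_ids_chunked` function on a bare connection. Inside `ProcessedDB` it would run under the existing `self._lock`:
+
+```python
+import sqlite3
+
+
+def delete_ids_chunked(con: sqlite3.Connection, ids: list[int],
+                       chunk_size: int = 500) -> int:
+    """Delete hard_negatives rows by id in batches; return rows deleted."""
+    deleted = 0
+    for start in range(0, len(ids), chunk_size):
+        chunk = ids[start:start + chunk_size]
+        placeholders = ",".join("?" * len(chunk))  # "?,?,...,?" for this batch
+        cur = con.execute(
+            f"DELETE FROM hard_negatives WHERE id IN ({placeholders})", chunk)
+        deleted += cur.rowcount
+    con.commit()
+    return deleted
+```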
+
+**Step 6: Add use_hard_negatives parameter to get_training_data**
+
+In `get_training_data()` (line 315), add parameter:
+
+```python
+def get_training_data(self, profile: str, positive_folder: str,
+                      negative_folder: str = "",
+                      fallback_video_dir: str = "",
+                      include_scan_exports: bool = False,
+                      use_hard_negatives: bool = True,
+                      ) -> list[tuple[str, list[float], list[float], list[float]]]:
+```
+
+Then wrap the hard negatives query (lines 365-374) in a conditional:
+
+```python
+    if use_hard_negatives:
+        hard_rows = self._con.execute(
+            "SELECT filename, start_time, source_path FROM hard_negatives"
+            " WHERE profile = ?",
+            (profile,),
+        ).fetchall()
+        for fn, st, sp in hard_rows:
+            neg_by_video.setdefault(fn, set()).add(st)
+            if sp:
+                source_by_filename.setdefault(fn, sp)
+```
+
+**Step 7: Pass source_model when marking negatives from scan panel**
+
+In `main.py`, `_on_scan_negatives()` needs to pass the current scan model. The scan panel knows which tab is active:
+
+```python
+def _on_scan_negatives(self, times: list) -> None:
+    if not self._file_path:
+        return
+    filename = os.path.basename(self._file_path)
+    # Get current model tab name for source_model
+    source_model = self._scan_panel.current_model_name()
+    self._db.add_hard_negatives(filename, self._profile, times,
+                                source_path=self._file_path,
+                                source_model=source_model)
+```
+
+Add `current_model_name()` to ScanResultsPanel:
+
+```python
+def current_model_name(self) -> str:
+    """Return the model name of the currently active tab."""
+    idx = self._tabs.currentIndex()
+    if idx >= 0:
+        return self._tabs.tabText(idx).split(" (")[0]  # strip count suffix
+    return ""
+```
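+
+Splitting on `" ("` breaks if a model name itself contains `" ("`. The sketch below is a slightly stricter alternative, a hypothetical `model_name_from_tab_label` helper. It strips only a trailing ` (N)` count and an optional ` [vN]` version tag, the label format given in the design:
+
+```python
+import re
+
+# Trailing " (12)" region count, optionally followed by " [v3]"
+_TAB_SUFFIX = re.compile(r"\s\(\d+\)(?:\s\[v\d+\])?$")
+
+
+def model_name_from_tab_label(label: str) -> str:
+    """Strip the count/version suffix from a scan tab label."""
+    return _TAB_SUFFIX.sub("", label)
+```
+
+Alternatively, storing the raw model name per tab with `QTabBar.setTabData()` (reached via `self._tabs.tabBar()`) would avoid parsing labels at all.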
+
+**Step 8: Add training toggle to TrainDialog**
+
+After the existing `_chk_scan_exports` checkbox:
+
+```python
+self._chk_hard_negatives = QCheckBox("Use hard negatives in training")
+self._chk_hard_negatives.setChecked(True)
+self._chk_hard_negatives.setToolTip(
+    "When unchecked, manually marked hard negatives are excluded from training.\n"
+    "Useful when training a new model type where old negatives may not apply.")
+self._chk_hard_negatives.stateChanged.connect(lambda: self._debounce.start())
+form.addRow("", self._chk_hard_negatives)
+```
+
+Add property:
+
+```python
+@property
+def use_hard_negatives(self) -> bool:
+    return self._chk_hard_negatives.isChecked()
+```
+
+**Step 9: Wire toggle through _open_train_dialog**
+
+In `_open_train_dialog()`, pass the flag:
+
+```python
+    video_infos = self._db.get_training_data(
+        self._profile, pos_folder, negative_folder=neg_folder,
+        fallback_video_dir=video_dir,
+        include_scan_exports=inc_scan,
+        use_hard_negatives=dlg.use_hard_negatives,
+    )
+```
+
+Also update `_update_stats()` in TrainDialog to pass it through for accurate counts:
+
+```python
+    # _update_stats may run during __init__, before the checkbox exists
+    use_neg = self._chk_hard_negatives.isChecked() if hasattr(self, '_chk_hard_negatives') else True
+    video_infos = self._db.get_training_data(
+        self._profile, folder, negative_folder=neg_folder,
+        fallback_video_dir=self._txt_video_dir.text(),
+        include_scan_exports=inc_scan,
+        use_hard_negatives=use_neg,
+    )
+```
+
+**Step 10: Run tests, commit**
+
+```bash
+pytest tests/ -v
+git add core/db.py main.py tests/test_db.py
+git commit -m "feat: hard negative source_model tracking, training toggle"
+```
+
+---
+
+### Task 5: Hard negatives management dialog
+
+**Files:**
+- Modify: `main.py` (add HardNegativesDialog class)
+- Modify: `main.py` (TrainDialog — add "Manage..." button)
+
+**Step 1: Create HardNegativesDialog**
+
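+Place before the TrainDialog class. It uses `QTableWidget`, `QTableWidgetItem`, `QHeaderView`, `QMessageBox`, `QComboBox` and the layout classes from `PyQt6.QtWidgets`; add any that `main.py` does not already import: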
+
+```python
+class HardNegativesDialog(QDialog):
+    """View and manage hard negative training examples."""
+
+    def __init__(self, db: ProcessedDB, profile: str, parent=None):
+        super().__init__(parent)
+        self.setWindowTitle("Hard Negatives")
+        self.setMinimumSize(600, 400)
+        self._db = db
+        self._profile = profile
+
+        layout = QVBoxLayout(self)
+
+        # Filter row
+        filter_row = QHBoxLayout()
+        filter_row.addWidget(QLabel("Filter model:"))
+        self._cmb_filter = QComboBox()
+        self._cmb_filter.addItem("(all)")
+        self._cmb_filter.currentIndexChanged.connect(self._apply_filter)
+        filter_row.addWidget(self._cmb_filter, 1)
+        layout.addLayout(filter_row)
+
+        # Summary
+        self._lbl_summary = QLabel()
+        layout.addWidget(self._lbl_summary)
+
+        # Table
+        self._table = QTableWidget(0, 4)
+        self._table.setHorizontalHeaderLabels(
+            ["File", "Time", "Source Model", "ID"])
+        self._table.horizontalHeader().setSectionResizeMode(
+            0, QHeaderView.ResizeMode.Stretch)
+        self._table.setEditTriggers(QTableWidget.EditTrigger.NoEditTriggers)
+        self._table.setSelectionBehavior(QTableWidget.SelectionBehavior.SelectRows)
+        self._table.setColumnHidden(3, True)  # hide ID column
+        layout.addWidget(self._table)
+
+        # Buttons
+        btn_row = QHBoxLayout()
+        btn_delete = QPushButton("Delete Selected")
+        btn_delete.clicked.connect(self._delete_selected)
+        btn_row.addWidget(btn_delete)
+        btn_clear = QPushButton("Clear All")
+        btn_clear.clicked.connect(self._clear_all)
+        btn_row.addWidget(btn_clear)
+        btn_row.addStretch()
+        btn_close = QPushButton("Close")
+        btn_close.clicked.connect(self.close)
+        btn_row.addWidget(btn_close)
+        layout.addLayout(btn_row)
+
+        self._load()
+
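+    def _load(self):
+        rows = self._db.get_hard_negatives(self._profile)
+        models = sorted({r["source_model"] for r in rows if r["source_model"]})
+        prev_filter = self._cmb_filter.currentText()
+        # Rebuild the model list without triggering _apply_filter mid-update
+        self._cmb_filter.blockSignals(True)
+        self._cmb_filter.clear()
+        self._cmb_filter.addItem("(all)")
+        self._cmb_filter.addItems(models)
+        # Keep the previous filter if that model still has negatives
+        self._cmb_filter.setCurrentIndex(max(self._cmb_filter.findText(prev_filter), 0))
+        self._cmb_filter.blockSignals(False)
+
+        self._table.setRowCount(len(rows))
+        for i, r in enumerate(rows):
+            self._table.setItem(i, 0, QTableWidgetItem(r["filename"]))
+            self._table.setItem(i, 1, QTableWidgetItem(f'{r["start_time"]:.1f}s'))
+            self._table.setItem(i, 2, QTableWidgetItem(r["source_model"]))
+            self._table.setItem(i, 3, QTableWidgetItem(str(r["id"])))  # hidden ID
+        self._lbl_summary.setText(f"<b>{len(rows)}</b> hard negatives")
+        # Rows hidden by an earlier filter keep that state across setRowCount,
+        # so re-apply the (possibly reset) filter to every row.
+        self._apply_filter()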
+
+    def _apply_filter(self):
+        model = self._cmb_filter.currentText()
+        for row in range(self._table.rowCount()):
+            if model == "(all)":
+                self._table.setRowHidden(row, False)
+            else:
+                src = self._table.item(row, 2).text()
+                self._table.setRowHidden(row, src != model)
+
+    def _delete_selected(self):
+        ids = []
+        for row in sorted(set(i.row() for i in self._table.selectedItems()), reverse=True):
+            if not self._table.isRowHidden(row):
+                ids.append(int(self._table.item(row, 3).text()))
+        if ids:
+            self._db.delete_hard_negatives_by_ids(ids)
+            self._load()
+
+    def _clear_all(self):
+        reply = QMessageBox.question(
+            self, "Clear All",
+            f"Delete all hard negatives for profile '{self._profile}'?",
+            QMessageBox.StandardButton.Yes | QMessageBox.StandardButton.No,
+        )
+        if reply == QMessageBox.StandardButton.Yes:
+            all_rows = self._db.get_hard_negatives(self._profile)
+            self._db.delete_hard_negatives_by_ids([r["id"] for r in all_rows])
+            self._load()
+```
+
+**Step 2: Add "Manage..." button to TrainDialog**
+
+After the hard negatives checkbox, add a button:
+
+```python
+neg_row = QHBoxLayout()
+neg_row.addWidget(self._chk_hard_negatives)
+btn_manage_neg = QPushButton("Manage…")
+btn_manage_neg.setFixedWidth(80)
+btn_manage_neg.clicked.connect(self._manage_negatives)
+neg_row.addWidget(btn_manage_neg)
+form.addRow("", neg_row)  # replaces the standalone checkbox addRow
+```
+
+Add handler:
+
+```python
+def _manage_negatives(self):
+    dlg = HardNegativesDialog(self._db, self._profile, parent=self)
+    dlg.exec()
+    self._debounce.start()  # refresh stats after potential deletions
+```
+
+**Step 3: Test manually, commit**
+
+```bash
+pytest tests/ -v
+git add main.py
+git commit -m "feat: hard negatives management dialog with filter and bulk delete"
+```
+
+---
+
+### Task 6: Final integration test and push
+
+**Step 1: Manual test checklist**
+
+- [ ] Open Train dialog — verify no ghost folders appear
+- [ ] Train with "Use hard negatives" unchecked — verify training works
+- [ ] Train with "Use hard negatives" checked — verify negatives are used
+- [ ] Open Manage dialog — verify negatives listed with source model
+- [ ] Delete selected negatives — verify they're removed
+- [ ] Scan a video — verify results saved with timestamp
+- [ ] Rescan same video — verify version history appears
+- [ ] Switch version in scan panel — verify correct results display
+- [ ] Mark negative from scan results — verify source_model stored
+
+**Step 2: Push**
+
+```bash
+git push
+```