From e7b791fbfa6101c1bd94e840feb8a1cb11c49b8c Mon Sep 17 00:00:00 2001
From: Ethanfel
Date: Sun, 19 Apr 2026 14:51:17 +0200
Subject: [PATCH] docs: add scan history & hard negative management design + plan

Covers scan result versioning per model, hard negative management dialog
with training toggle, and ghost folder fix.

Co-Authored-By: Claude Opus 4.6
---
 ...026-04-19-scan-history-negatives-design.md |  90 +++
 ...9-scan-history-negatives-implementation.md | 714 ++++++++++++++++++
 2 files changed, 804 insertions(+)
 create mode 100644 docs/plans/2026-04-19-scan-history-negatives-design.md
 create mode 100644 docs/plans/2026-04-19-scan-history-negatives-implementation.md

diff --git a/docs/plans/2026-04-19-scan-history-negatives-design.md b/docs/plans/2026-04-19-scan-history-negatives-design.md
new file mode 100644
index 0000000..5b7410b
--- /dev/null
+++ b/docs/plans/2026-04-19-scan-history-negatives-design.md
@@ -0,0 +1,90 @@
+# Scan History & Hard Negative Management Design
+
+Date: 2026-04-19
+
+## Goal
+
+1. Keep scan result history per `(file, model)` so users can track classifier improvement across training iterations
+2. Make hard negatives manageable — viewable, removable, and optionally disabled per training run
+3. Fix latent bug: `get_export_folders()` doesn't filter by `scan_export`
+
+## 1. Scan Result History
+
+### Current behavior
+
+`save_scan_results()` **replaces** all results for `(filename, profile, model)` on every scan. No history is preserved.
+
+### Change
+
+Keep the last N scan versions (one version = all regions written by one scan run) per `(filename, profile, model)`, each tagged with a timestamp. The most recent is the "active" version displayed in the panel; older versions are accessible for comparison.
+
+### Schema change
+
+Add column to `scan_results`:
+
+```sql
+ALTER TABLE scan_results ADD COLUMN scan_timestamp TEXT NOT NULL DEFAULT '';
+```
+
+All rows from the same scan share the same timestamp string (e.g. `"20260419_143022_123456"`). The format is fixed-width, so string order matches chronological order, and it includes microseconds so two scans of the same file within one second still get distinct versions.
+
+### save_scan_results changes
+
+Instead of `DELETE ... WHERE filename=? AND profile=? AND model=?`, the new flow:
+
+1. Insert new rows with current timestamp
+2. Count distinct timestamps for this `(filename, profile, model)`
+3. If count > N (default 5), delete rows belonging to the oldest timestamps
+
+### UI changes
+
+Add a small version dropdown/selector in `ScanResultsPanel` per model tab — shows timestamps of available scan versions. Selecting a version loads that version's results into the tab. The most recent is selected by default.
+
+The tab label shows the active version's region count, e.g. `HUBERT_XLARGE (12) [v3]`.
+
+### Cache interaction
+
+Embedding cache is per `(file, model)` and doesn't change across scans. Only the classifier output changes. History stores the classified regions (start, end, score), not embeddings.
+
+## 2. Hard Negative Management
+
+### Current behavior
+
+- Hard negatives stored in `hard_negatives` table: `(filename, profile, start_time, source_path)`
+- No model column — applied globally within a profile
+- Removable one-by-one via N toggle in scan panel, but no bulk management
+- Always used in training — no way to disable
+
+### Changes
+
+#### Schema
+
+Add `source_model TEXT NOT NULL DEFAULT ''` column to `hard_negatives`. Populated when marking negatives from scan results (we know which model tab is active).
+
+#### Training toggle
+
+New checkbox in `TrainDialog`: **"Use hard negatives"** (default checked). When unchecked, `get_training_data()` skips the `hard_negatives` query entirely. Non-destructive — negatives remain in DB.
+
+#### Management dialog
+
+New `HardNegativesDialog` accessible from Train dialog via "Manage..." button next to the checkbox. Shows:
+
+- Table: filename, start time, source model (a "date added" column would need a `created_at` column, which this plan does not add)
+- Filter by source model (dropdown)
+- Multi-select + Delete button
+- "Clear All" button with confirmation
+- Count summary at top
+
+### Training integration
+
+`get_training_data()` gets a new `use_hard_negatives: bool = True` parameter. When False, the hard negatives query (lines 365-374 of db.py) is skipped entirely.
+
+## 3. Ghost Folder Fix
+
+### Bug
+
+`get_export_folders()` queries all `output_path` rows without filtering `scan_export`. Folders that only contain scan-exported clips appear in training dropdowns with 0 clips.
+
+### Fix
+
+Add `include_scan_exports` parameter to `get_export_folders()`. When False (default), only query rows with `scan_export = 0`. Also drop folders with 0 clips from the dict returned by `get_training_stats()`.
diff --git a/docs/plans/2026-04-19-scan-history-negatives-implementation.md b/docs/plans/2026-04-19-scan-history-negatives-implementation.md
new file mode 100644
index 0000000..0c33946
--- /dev/null
+++ b/docs/plans/2026-04-19-scan-history-negatives-implementation.md
@@ -0,0 +1,714 @@
+# Scan History & Hard Negative Management Implementation Plan
+
+> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
+
+**Goal:** Add scan result versioning, hard negative management dialog with training toggle, and fix ghost folder bug.
+
+**Architecture:** DB schema changes in `core/db.py` (new columns, new queries). UI changes in `main.py` (version selector in ScanResultsPanel, management dialog, training toggle). No changes to `core/audio_scan.py`.
+
+**Tech Stack:** SQLite (existing), PyQt6 (existing)
+
+**Key design notes:**
+- Scan history stores N versions per `(filename, profile, model)` using a `scan_timestamp` column. All rows from one scan share the same timestamp.
+- Hard negatives gain a `source_model` column (informational) and training gains a `use_hard_negatives` toggle.
+- `get_export_folders()` must respect `scan_export` filter to prevent ghost folders.
+
+---
+
+### Task 1: Fix ghost folder bug in get_export_folders
+
+**Files:**
+- Modify: `core/db.py:294-313` (get_export_folders)
+- Modify: `core/db.py:410-443` (get_training_stats — filter out 0-clip folders)
+- Test: `tests/test_db.py`
+
+**Step 1: Write failing test**
+
+```python
+def test_export_folders_excludes_scan_exports():
+    """Scan-export-only folders should not appear when include_scan_exports=False."""
+    with tempfile.NamedTemporaryFile(suffix=".db", delete=False) as f:
+        path = f.name
+    try:
+        db = ProcessedDB(path)
+        # Manual export
+        db.add("a.mp4", 10.0, "/out/mp4_Intense/g1/clip.mp4", profile="test")
+        # Scan export to different folder
+        db.add("a.mp4", 20.0, "/out/mp4_ScanOnly/g1/clip.mp4", profile="test",
+               scan_export=True)
+        folders = db.get_export_folders("test")
+        assert "mp4_Intense" in folders
+        assert "mp4_ScanOnly" not in folders, "scan-only folder should be excluded"
+        # With include_scan_exports=True, both should appear
+        folders_all = db.get_export_folders("test", include_scan_exports=True)
+        assert {"mp4_Intense", "mp4_ScanOnly"} <= set(folders_all)
+    finally:
+        os.unlink(path)
+```
+
+**Step 2: Fix get_export_folders**
+
+Add `include_scan_exports` parameter:
+
+```python
+def get_export_folders(self, profile: str = "default",
+                       include_scan_exports: bool = False) -> list[str]:
+    if not self._enabled:
+        return []
+    if include_scan_exports:
+        rows = self._con.execute(
+            "SELECT DISTINCT output_path FROM processed WHERE profile = ?",
+            (profile,),
+        ).fetchall()
+    else:
+        rows = self._con.execute(
+            "SELECT DISTINCT output_path FROM processed"
+            " WHERE profile = ? AND scan_export = 0",
+            (profile,),
+        ).fetchall()
+    folder_names: set[str] = set()
+    for (op,) in rows:
+        grandparent = os.path.basename(os.path.dirname(os.path.dirname(op)))
+        if grandparent:
+            folder_names.add(grandparent)
+    return sorted(folder_names)
+```
+
+**Step 3: Pass include_scan_exports through in get_training_stats**
+
+```python
+    folders = self.get_export_folders(profile, include_scan_exports=include_scan_exports)
+```
+
+And drop folders with no clips from the returned dict:
+
+```python
+    return {k: v for k, v in stats.items() if v["clips"] > 0}
+```
+
+**Step 4: Run tests, commit**
+
+```bash
+pytest tests/ -v
+git add core/db.py tests/test_db.py
+git commit -m "fix: get_export_folders respects scan_export filter"
+```
+
+---
+
+### Task 2: Scan result history — schema and DB methods
+
+**Files:**
+- Modify: `core/db.py:86-98` (scan_results schema — add scan_timestamp column)
+- Modify: `core/db.py:100-113` (migration — add scan_timestamp to existing tables)
+- Modify: `core/db.py:447-468` (save_scan_results — version management)
+- Add: `core/db.py` (get_scan_versions; get_scan_results gains an optional scan_timestamp parameter)
+- Test: `tests/test_db.py`
+
+**Step 1: Write failing test**
+
+```python
+def test_scan_result_history():
+    """save_scan_results should keep multiple versions."""
+    with tempfile.NamedTemporaryFile(suffix=".db", delete=False) as f:
+        path = f.name
+    try:
+        db = ProcessedDB(path)
+        # Save three versions back to back (relies on sub-second scan timestamps)
+        db.save_scan_results("v.mp4", "test", "MODEL_A",
+                             [(0, 8, 0.9)])
+        db.save_scan_results("v.mp4", "test", "MODEL_A",
+                             [(0, 8, 0.8), (10, 18, 0.7)])
+        db.save_scan_results("v.mp4", "test", "MODEL_A",
+                             [(5, 13, 0.95)])
+        versions = db.get_scan_versions("v.mp4", "test", "MODEL_A")
+        assert len(versions) == 3
+        # Most recent first
+        assert versions[0]["count"] == 1  # latest: 1 region
+        assert versions[1]["count"] == 2  # middle: 2 regions
+        assert versions[2]["count"] == 1  # oldest: 1 region
+        # get_scan_results returns latest version by default
+        results = db.get_scan_results("v.mp4", "test")
+        assert len(results.get("MODEL_A", [])) == 1
+    finally:
+        os.unlink(path)
+```
+
+**Step 2: Add scan_timestamp column**
+
+In the CREATE TABLE (lines 87-98), add as the last column (with a comma after the previous column):
+
+```sql
+    scan_timestamp TEXT NOT NULL DEFAULT ''
+```
+
+In the migration block (lines 100-113), add:
+
+```python
+    ("scan_timestamp", "TEXT NOT NULL DEFAULT ''"),
+```
+
+**Step 3: Modify save_scan_results**
+
+Replace the current DELETE+INSERT with versioned insert + cleanup:
+
+```python
+def save_scan_results(self, filename: str, profile: str, model: str,
+                      regions: list[tuple[float, float, float]],
+                      max_versions: int = 5) -> None:
+    if not self._enabled:
+        return
+    from datetime import datetime
+    ts = datetime.now().strftime("%Y%m%d_%H%M%S_%f")  # microseconds: same-second rescans stay distinct
+    with self._lock:
+        self._con.executemany(
+            "INSERT INTO scan_results"
+            " (filename, profile, model, start_time, end_time, score,"
+            " orig_start_time, orig_end_time, scan_timestamp)"
+            " VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
+            [(filename, profile, model, s, e, sc, s, e, ts)
+             for s, e, sc in regions],
+        )
+        # Prune old versions beyond max_versions
+        versions = self._con.execute(
+            "SELECT DISTINCT scan_timestamp FROM scan_results"
+            " WHERE filename = ? AND profile = ? AND model = ?"
+            " ORDER BY scan_timestamp DESC",
+            (filename, profile, model),
+        ).fetchall()
+        if len(versions) > max_versions:
+            old_ts = [v[0] for v in versions[max_versions:]]
+            self._con.execute(
+                "DELETE FROM scan_results"
+                " WHERE filename = ? AND profile = ? AND model = ?"
+                f" AND scan_timestamp IN ({','.join('?' * len(old_ts))})",
+                (filename, profile, model, *old_ts),
+            )
+        self._con.commit()
+```
+
+**Step 4: Add get_scan_versions**
+
+```python
+def get_scan_versions(self, filename: str, profile: str, model: str
+                      ) -> list[dict]:
+    """Return list of scan versions for (filename, profile, model).
+
+    Returns [{timestamp, count, max_score}, ...] ordered newest first.
+    """
+    if not self._enabled:
+        return []
+    rows = self._con.execute(
+        "SELECT scan_timestamp, COUNT(*), MAX(score)"
+        " FROM scan_results"
+        " WHERE filename = ? AND profile = ? AND model = ?"
+        " AND scan_timestamp != ''"
+        " GROUP BY scan_timestamp"
+        " ORDER BY scan_timestamp DESC",
+        (filename, profile, model),
+    ).fetchall()
+    return [{"timestamp": ts, "count": cnt, "max_score": sc}
+            for ts, cnt, sc in rows]
+```
+
+**Step 5: Modify get_scan_results to support version selection**
+
+Add optional `scan_timestamp` parameter. When None (default), returns latest version:
+
+```python
+def get_scan_results(self, filename: str, profile: str,
+                     scan_timestamp: str | None = None
+                     ) -> dict[str, list[tuple]]:
+    if not self._enabled:
+        return {}
+    if scan_timestamp:
+        rows = self._con.execute(
+            "SELECT id, model, start_time, end_time, score, disabled,"
+            " orig_start_time, orig_end_time"
+            " FROM scan_results"
+            " WHERE filename = ? AND profile = ? AND scan_timestamp = ?"
+            " ORDER BY model, start_time",
+            (filename, profile, scan_timestamp),
+        ).fetchall()
+    else:
+        # For each model, get rows from the latest timestamp only
+        rows = self._con.execute(
+            "SELECT r.id, r.model, r.start_time, r.end_time, r.score,"
+            " r.disabled, r.orig_start_time, r.orig_end_time"
+            " FROM scan_results r"
+            " INNER JOIN ("
+            "   SELECT model, MAX(scan_timestamp) AS latest"
+            "   FROM scan_results"
+            "   WHERE filename = ? AND profile = ?"
+            "   GROUP BY model"
+            " ) m ON r.model = m.model AND r.scan_timestamp = m.latest"
+            " WHERE r.filename = ? AND r.profile = ?"
+            " ORDER BY r.model, r.start_time",
+            (filename, profile, filename, profile),
+        ).fetchall()
+    result: dict[str, list] = {}
+    for row_id, model, s, e, sc, dis, os_, oe in rows:
+        result.setdefault(model, []).append(
+            (row_id, s, e, sc, bool(dis),
+             os_ if os_ is not None else s,
+             oe if oe is not None else e))
+    return result
+```
+
+**Important:** Legacy rows (before this change) have `scan_timestamp = ''`. The `MAX(scan_timestamp)` query handles this correctly: an empty string sorts before any real timestamp, so legacy rows are returned only while they are the only version. The `get_scan_versions` query filters `scan_timestamp != ''` so legacy rows don't appear as named versions. The pruning in `save_scan_results` still counts `''` as the oldest version, so legacy rows are deleted once `max_versions` newer scans exist.
+
+**Step 6: Run tests, commit**
+
+```bash
+pytest tests/ -v
+git add core/db.py tests/test_db.py
+git commit -m "feat: scan result history — keep N versions per (file, model)"
+```
+
+---
+
+### Task 3: Scan history UI — version selector in ScanResultsPanel
+
+**Files:**
+- Modify: `main.py` (ScanResultsPanel — add version combo per tab)
+- Modify: `main.py` (ScanResultsPanel.load_for_file — populate versions)
+
+**Step 1: Add version combo to tab UI**
+
+In `ScanResultsPanel._add_tab()`, add a small QComboBox above the table. Keep it hidden while the model has at most one scan version; when more exist, populate it with the version timestamps and connect it to a slot that reloads the tab with the selected version.
+
+```python
+# In _add_tab, create a container widget with version combo + table
+container = QWidget()
+layout = QVBoxLayout(container)
+layout.setContentsMargins(0, 0, 0, 0)
+
+cmb_version = QComboBox()
+cmb_version.setMaximumWidth(200)
+cmb_version.setToolTip("Scan version history")
+cmb_version.hide()  # Hidden when only 1 version
+layout.addWidget(cmb_version)
+layout.addWidget(table)
+
+self._tabs.addTab(container, label)
+```
+
+Store the combo and table as Python attributes on the container widget so later slots can reach them from the tab.
+
+**Step 2: Populate versions in load_for_file**
+
+After creating each model tab, query `get_scan_versions()`. If there is more than one version, show the combo with entries like `"2026-04-19 14:30 (12 regions, best: 0.95)"`, storing each raw timestamp as the item's user data (`addItem(text, timestamp)`). Connect `currentIndexChanged` to reload that version's results.
+
+**Step 3: Version switching slot**
+
+When the user selects a different version from the combo:
+1. Call `db.get_scan_results(filename, profile, scan_timestamp=selected_ts)`
+2. Repopulate the table with that version's rows for the tab's model
+3. Update timeline regions
+
+**Step 4: Test manually, commit**
+
+```bash
+git add main.py
+git commit -m "feat: scan version selector in results panel"
+```
+
+---
+
+### Task 4: Hard negatives — schema and training toggle
+
+**Files:**
+- Modify: `core/db.py:118-130` (hard_negatives schema — add source_model column)
+- Modify: `core/db.py:548-560` (add_hard_negatives — accept source_model)
+- Modify: `core/db.py:365-374` (get_training_data — use_hard_negatives parameter)
+- Modify: `main.py` (TrainDialog — add "Use hard negatives" checkbox)
+- Modify: `main.py` (_open_train_dialog — pass use_hard_negatives to get_training_data)
+- Test: `tests/test_db.py`
+
+**Step 1: Write failing test**
+
+```python
+def test_hard_negatives_source_model():
+    """Hard negatives should store source_model."""
+    with tempfile.NamedTemporaryFile(suffix=".db", delete=False) as f:
+        path = f.name
+    try:
+        db = ProcessedDB(path)
+        db.add_hard_negatives("a.mp4", "test", [10.0, 20.0],
+                              source_path="/a.mp4", source_model="HUBERT_XLARGE")
+        rows = db.get_hard_negatives("test")
+        assert len(rows) == 2
+        assert all(r["source_model"] == "HUBERT_XLARGE" for r in rows)
+    finally:
+        os.unlink(path)
+
+def test_training_data_skips_hard_negatives():
+    """get_training_data with use_hard_negatives=False should skip them."""
+    with tempfile.NamedTemporaryFile(suffix=".db", delete=False) as f:
+        path = f.name
+    try:
+        db = ProcessedDB(path)
+        db.add("a.mp4", 10.0, "/out/folder/g/clip.mp4", profile="test",
+               source_path="/videos/a.mp4")
+        db.add_hard_negatives("a.mp4", "test", [500.0], source_path="/videos/a.mp4")
+        # With hard negatives
+        data_with = db.get_training_data("test", "folder", use_hard_negatives=True)
+        # Without hard negatives
+        data_without = db.get_training_data("test", "folder", use_hard_negatives=False)
+        # Both should find the video; only data_with may contain the 500 s hard negative
+        assert len(data_with) >= 1
+        neg_with = sum(len(vi[3]) for vi in data_with)
+        neg_without = sum(len(vi[3]) for vi in data_without)
+        assert neg_with >= neg_without and not any(500.0 in vi[3] for vi in data_without)
+    finally:
+        os.unlink(path)
+```
+
+**Step 2: Add source_model column to hard_negatives**
+
+In CREATE TABLE (lines 119-125), add as the last column:
+
+```sql
+    source_model TEXT NOT NULL DEFAULT ''
+```
+
+In migration section, add after the hard_negatives table creation:
+
+```python
+hn_cols = {
+    row[1]
+    for row in self._con.execute("PRAGMA table_info(hard_negatives)").fetchall()
+}
+if "source_model" not in hn_cols:
+    self._con.execute(
+        "ALTER TABLE hard_negatives ADD COLUMN source_model TEXT NOT NULL DEFAULT ''"
+    )
+```
+
+**Step 3: Update add_hard_negatives to accept source_model**
+
+```python
+def add_hard_negatives(self, filename: str, profile: str,
+                       times: list[float], source_path: str = "",
+                       source_model: str = "") -> None:
+    if not self._enabled or not times:
+        return
+    with self._lock:
+        for t in times:
+            self._con.execute(
+                "INSERT INTO hard_negatives"
+                " (filename, profile, start_time, source_path, source_model)"
+                " VALUES (?, ?, ?, ?, ?)",
+                (filename, profile, t, source_path, source_model),
+            )
+        self._con.commit()
+```
+
+**Step 4: Add get_hard_negatives (full rows for management dialog)**
+
+```python
+def get_hard_negatives(self, profile: str) -> list[dict]:
+    """Return all hard negatives for a profile with full details."""
+    if not self._enabled:
+        return []
+    rows = self._con.execute(
+        "SELECT id, filename, start_time, source_path, source_model"
+        " FROM hard_negatives WHERE profile = ?"
+        " ORDER BY filename, start_time",
+        (profile,),
+    ).fetchall()
+    return [{"id": r[0], "filename": r[1], "start_time": r[2],
+             "source_path": r[3], "source_model": r[4]} for r in rows]
+```
+
+**Step 5: Add delete_hard_negatives_by_ids**
+
+```python
+def delete_hard_negatives_by_ids(self, ids: list[int]) -> None:
+    """Delete hard negatives by row IDs."""
+    if not self._enabled or not ids:
+        return
+    with self._lock:
+        self._con.execute(
+            f"DELETE FROM hard_negatives WHERE id IN ({','.join('?' * len(ids))})",
+            ids,
+        )
+        self._con.commit()
+```
+
+**Step 6: Add use_hard_negatives parameter to get_training_data**
+
+In `get_training_data()` (line 315), add parameter:
+
+```python
+def get_training_data(self, profile: str, positive_folder: str,
+                      negative_folder: str = "",
+                      fallback_video_dir: str = "",
+                      include_scan_exports: bool = False,
+                      use_hard_negatives: bool = True,
+                      ) -> list[tuple[str, list[float], list[float], list[float]]]:
+```
+
+Then wrap the hard negatives query (lines 365-374) in a conditional:
+
+```python
+    if use_hard_negatives:
+        hard_rows = self._con.execute(
+            "SELECT filename, start_time, source_path FROM hard_negatives"
+            " WHERE profile = ?",
+            (profile,),
+        ).fetchall()
+        for fn, st, sp in hard_rows:
+            neg_by_video.setdefault(fn, set()).add(st)
+            if sp:
+                source_by_filename.setdefault(fn, sp)
+```
+
+**Step 7: Pass source_model when marking negatives from scan panel**
+
+In `main.py`, `_on_scan_negatives()` needs to pass the current scan model. The scan panel knows which tab is active:
+
+```python
+def _on_scan_negatives(self, times: list) -> None:
+    if not self._file_path:
+        return
+    filename = os.path.basename(self._file_path)
+    # Get current model tab name for source_model
+    source_model = self._scan_panel.current_model_name()
+    self._db.add_hard_negatives(filename, self._profile, times,
+                                source_path=self._file_path,
+                                source_model=source_model)
+```
+
+Add `current_model_name()` to ScanResultsPanel:
+
+```python
+def current_model_name(self) -> str:
+    """Return the model name of the currently active tab."""
+    idx = self._tabs.currentIndex()
+    if idx >= 0:
+        return self._tabs.tabText(idx).split(" (")[0]  # strip " (count) [vN]" suffix
+    return ""
+```
+
+**Step 8: Add training toggle to TrainDialog**
+
+After the existing `_chk_scan_exports` checkbox:
+
+```python
+self._chk_hard_negatives = QCheckBox("Use hard negatives in training")
+self._chk_hard_negatives.setChecked(True)
+self._chk_hard_negatives.setToolTip(
+    "When unchecked, manually marked hard negatives are excluded from training.\n"
+    "Useful when training a new model type where old negatives may not apply.")
+self._chk_hard_negatives.stateChanged.connect(lambda: self._debounce.start())
+form.addRow("", self._chk_hard_negatives)
+```
+
+Add property:
+
+```python
+@property
+def use_hard_negatives(self) -> bool:
+    return self._chk_hard_negatives.isChecked()
+```
+
+**Step 9: Wire toggle through _open_train_dialog**
+
+In `_open_train_dialog()`, pass the flag:
+
+```python
+    video_infos = self._db.get_training_data(
+        self._profile, pos_folder, negative_folder=neg_folder,
+        fallback_video_dir=video_dir,
+        include_scan_exports=inc_scan,
+        use_hard_negatives=dlg.use_hard_negatives,
+    )
+```
+
+Also update `_update_stats()` in TrainDialog to pass it through for accurate counts:
+
+```python
+    use_neg = self._chk_hard_negatives.isChecked() if hasattr(self, '_chk_hard_negatives') else True
+    video_infos = self._db.get_training_data(
+        self._profile, folder, negative_folder=neg_folder,
+        fallback_video_dir=self._txt_video_dir.text(),
+        include_scan_exports=inc_scan,
+        use_hard_negatives=use_neg,
+    )
+```
+
+**Step 10: Run tests, commit**
+
+```bash
+pytest tests/ -v
+git add core/db.py main.py tests/test_db.py
+git commit -m "feat: hard negative source_model tracking, training toggle"
+```
+
+---
+
+### Task 5: Hard negatives management dialog
+
+**Files:**
+- Modify: `main.py` (add HardNegativesDialog class)
+- Modify: `main.py` (TrainDialog — add "Manage..." button)
+
+**Step 1: Create HardNegativesDialog**
+
+Place before TrainDialog class:
+
+```python
+class HardNegativesDialog(QDialog):
+    """View and manage hard negative training examples."""
+
+    def __init__(self, db: ProcessedDB, profile: str, parent=None):
+        super().__init__(parent)
+        self.setWindowTitle("Hard Negatives")
+        self.setMinimumSize(600, 400)
+        self._db = db
+        self._profile = profile
+
+        layout = QVBoxLayout(self)
+
+        # Filter row
+        filter_row = QHBoxLayout()
+        filter_row.addWidget(QLabel("Filter model:"))
+        self._cmb_filter = QComboBox()
+        self._cmb_filter.addItem("(all)")
+        self._cmb_filter.currentIndexChanged.connect(self._apply_filter)
+        filter_row.addWidget(self._cmb_filter, 1)
+        layout.addLayout(filter_row)
+
+        # Summary
+        self._lbl_summary = QLabel()
+        layout.addWidget(self._lbl_summary)
+
+        # Table
+        self._table = QTableWidget(0, 4)
+        self._table.setHorizontalHeaderLabels(
+            ["File", "Time", "Source Model", "ID"])
+        self._table.horizontalHeader().setSectionResizeMode(
+            0, QHeaderView.ResizeMode.Stretch)
+        self._table.setEditTriggers(QTableWidget.EditTrigger.NoEditTriggers)
+        self._table.setSelectionBehavior(QTableWidget.SelectionBehavior.SelectRows)
+        self._table.setColumnHidden(3, True)  # hide ID column
+        layout.addWidget(self._table)
+
+        # Buttons
+        btn_row = QHBoxLayout()
+        btn_delete = QPushButton("Delete Selected")
+        btn_delete.clicked.connect(self._delete_selected)
+        btn_row.addWidget(btn_delete)
+        btn_clear = QPushButton("Clear All")
+        btn_clear.clicked.connect(self._clear_all)
+        btn_row.addWidget(btn_clear)
+        btn_row.addStretch()
+        btn_close = QPushButton("Close")
+        btn_close.clicked.connect(self.close)
+        btn_row.addWidget(btn_close)
+        layout.addLayout(btn_row)
+
+        self._load()
+
+    def _load(self):
+        rows = self._db.get_hard_negatives(self._profile)
+        models = sorted(set(r["source_model"] for r in rows if r["source_model"]))
+        current = self._cmb_filter.currentText()  # keep the active filter across reloads
+        self._cmb_filter.blockSignals(True)
+        self._cmb_filter.clear()
+        self._cmb_filter.addItems(["(all)"] + models)
+        self._cmb_filter.setCurrentIndex(max(self._cmb_filter.findText(current), 0))
+        self._cmb_filter.blockSignals(False)
+
+        self._table.setRowCount(len(rows))
+        for i, r in enumerate(rows):
+            self._table.setItem(i, 0, QTableWidgetItem(r["filename"]))
+            self._table.setItem(i, 1, QTableWidgetItem(f'{r["start_time"]:.1f}s'))
+            self._table.setItem(i, 2, QTableWidgetItem(r["source_model"]))
+            self._table.setItem(i, 3, QTableWidgetItem(str(r["id"])))
+        self._lbl_summary.setText(f"{len(rows)} hard negatives")
+        self._apply_filter()  # re-sync hidden rows, which persist across setRowCount
+
+    def _apply_filter(self):
+        model = self._cmb_filter.currentText()
+        for row in range(self._table.rowCount()):
+            if model == "(all)":
+                self._table.setRowHidden(row, False)
+            else:
+                src = self._table.item(row, 2).text()
+                self._table.setRowHidden(row, src != model)
+
+    def _delete_selected(self):
+        ids = []
+        for row in sorted(set(i.row() for i in self._table.selectedItems()), reverse=True):
+            if not self._table.isRowHidden(row):
+                ids.append(int(self._table.item(row, 3).text()))
+        if ids:
+            self._db.delete_hard_negatives_by_ids(ids)
+            self._load()
+
+    def _clear_all(self):
+        reply = QMessageBox.question(
+            self, "Clear All",
+            f"Delete all hard negatives for profile '{self._profile}'?",
+            QMessageBox.StandardButton.Yes | QMessageBox.StandardButton.No,
+        )
+        if reply == QMessageBox.StandardButton.Yes:
+            all_rows = self._db.get_hard_negatives(self._profile)
+            self._db.delete_hard_negatives_by_ids([r["id"] for r in all_rows])
+            self._load()
+```
+
+**Step 2: Add "Manage..." button to TrainDialog**
+
+After the hard negatives checkbox, add a button:
+
+```python
+neg_row = QHBoxLayout()
+neg_row.addWidget(self._chk_hard_negatives)
+btn_manage_neg = QPushButton("Manage…")
+btn_manage_neg.setFixedWidth(80)
+btn_manage_neg.clicked.connect(self._manage_negatives)
+neg_row.addWidget(btn_manage_neg)
+form.addRow("", neg_row)  # replaces the standalone checkbox addRow
+```
+
+Add handler:
+
+```python
+def _manage_negatives(self):
+    dlg = HardNegativesDialog(self._db, self._profile, parent=self)
+    dlg.exec()
+    self._debounce.start()  # refresh stats after potential deletions
+```
+
+**Step 3: Test manually, commit**
+
+```bash
+pytest tests/ -v
+git add main.py
+git commit -m "feat: hard negatives management dialog with filter and bulk delete"
+```
+
+---
+
+### Task 6: Final integration test and push
+
+**Step 1: Manual test checklist**
+
+- [ ] Open Train dialog — verify no ghost folders appear
+- [ ] Train with "Use hard negatives" unchecked — verify training works
+- [ ] Train with "Use hard negatives" checked — verify negatives are used
+- [ ] Open Manage dialog — verify negatives listed with source model
+- [ ] Delete selected negatives — verify they're removed
+- [ ] Scan a video — verify results saved with timestamp
+- [ ] Rescan same video — verify version history appears
+- [ ] Switch version in scan panel — verify correct results display
+- [ ] Mark negative from scan results — verify source_model stored
+
+**Step 2: Push**
+
+```bash
+git push
+```
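
The versioning SQL from Task 2 can be sanity-checked on its own before `core/db.py` is touched. This is a minimal sketch, not the `ProcessedDB` code. It assumes a stripped-down `scan_results` table that keeps only the columns these queries use. `SCHEMA`, `save_version`, `latest_results` and the `ts` override are hypothetical names for this harness, and the override exists only so the checks can be deterministic.

```python
import sqlite3
from datetime import datetime

# Hypothetical minimal schema: only the scan_results columns that the
# versioning queries touch (the real table also has disabled, orig_*, ...).
SCHEMA = """
CREATE TABLE scan_results (
    id INTEGER PRIMARY KEY,
    filename TEXT, profile TEXT, model TEXT,
    start_time REAL, end_time REAL, score REAL,
    scan_timestamp TEXT NOT NULL DEFAULT ''
)
"""


def save_version(con, filename, profile, model, regions, ts=None, max_versions=5):
    """Insert one scan version, then prune versions beyond max_versions."""
    # Fixed-width, microsecond-resolution stamp: lexical order equals
    # chronological order, and two scans in the same second stay distinct.
    ts = ts or datetime.now().strftime("%Y%m%d_%H%M%S_%f")
    con.executemany(
        "INSERT INTO scan_results (filename, profile, model, start_time,"
        " end_time, score, scan_timestamp) VALUES (?, ?, ?, ?, ?, ?, ?)",
        [(filename, profile, model, s, e, sc, ts) for s, e, sc in regions],
    )
    versions = [r[0] for r in con.execute(
        "SELECT DISTINCT scan_timestamp FROM scan_results"
        " WHERE filename = ? AND profile = ? AND model = ?"
        " ORDER BY scan_timestamp DESC",
        (filename, profile, model),
    )]
    old = versions[max_versions:]  # legacy '' sorts last, so it is pruned first
    if old:
        con.execute(
            "DELETE FROM scan_results WHERE filename = ? AND profile = ?"
            f" AND model = ? AND scan_timestamp IN ({','.join('?' * len(old))})",
            (filename, profile, model, *old),
        )
    con.commit()
    return ts


def latest_results(con, filename, profile):
    """Per model, return (start, end, score) rows from the newest version only."""
    rows = con.execute(
        "SELECT r.model, r.start_time, r.end_time, r.score FROM scan_results r"
        " INNER JOIN (SELECT model, MAX(scan_timestamp) AS latest"
        "             FROM scan_results WHERE filename = ? AND profile = ?"
        "             GROUP BY model) m"
        " ON r.model = m.model AND r.scan_timestamp = m.latest"
        " WHERE r.filename = ? AND r.profile = ?"
        " ORDER BY r.model, r.start_time",
        (filename, profile, filename, profile),
    ).fetchall()
    out = {}
    for model, s, e, sc in rows:
        out.setdefault(model, []).append((s, e, sc))
    return out
```

Run against `sqlite3.connect(":memory:")`, a legacy row with `scan_timestamp = ''` is served by `latest_results` until newer scans exist. It is pruned as soon as five newer versions are stored, which matches the **Important** note in Task 2.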