Mask Generation Design

Overview

Add per-frame PNG mask generation to the 8-cut export pipeline for SELVA dataset creation. Two methods are supported: Depth Anything V2 (fast, depth-based foreground) and SAM2 (accurate, propagated segmentation). Both run as isolated subprocesses inside a dedicated ML venv.

UI Changes

Settings dialog

A "Settings…" button added to the main window top bar opens a SettingsDialog (QDialog). It contains a "ML Tools" section with:

Status label: "Not installed" / "Installed"
Install button (becomes Reinstall once venv exists)
Read-only QPlainTextEdit streaming pip install output line-by-line

Mask generation row

A new row added to the main window below the export row:

QComboBox: "Depth Anything" / "SAM"
Generate Masks button — disabled until venv is installed
Operates on self._last_export_path (set after each successful export)
Output folder: <clip_stem>_masks/ next to the video
Frames named frame_0000.png, frame_0001.png, …
Progress streamed to status bar; button disabled during run

Venv

Path: ~/.8cut/venv/

Install command (run as subprocess, output streamed to dialog):

~/.8cut/venv/bin/pip install torch torchvision transformers opencv-python Pillow segment-anything-2

Setup steps:

python -m venv ~/.8cut/venv
~/.8cut/venv/bin/pip install --upgrade pip
~/.8cut/venv/bin/pip install torch torchvision transformers opencv-python Pillow segment-anything-2

Venv existence check: Path("~/.8cut/venv/bin/python").expanduser().exists().

tools/depth_masks.py

CLI: python tools/depth_masks.py --input <video.mp4> --output <dir>

Extract frames with OpenCV (cv2.VideoCapture)
Load transformers depth-estimation pipeline: depth-anything/Depth-Anything-V2-Large-hf, device=cuda if available else cpu
For each frame:
- Run depth estimation → float depth array
- Normalise to 0–255
- Apply Otsu threshold (cv2.threshold(..., cv2.THRESH_OTSU)) → binary mask
- Save as frame_NNNN.png
- Print frame N/total to stdout
Exit 0 on success, non-zero on error

tools/sam_masks.py

CLI: python tools/sam_masks.py --input <video.mp4> --output <dir>

Extract all frames to a temp directory with OpenCV
Load SAM2 video predictor (sam2.build_sam.build_sam2_video_predictor), checkpoint auto-downloaded from HuggingFace
Init predictor on the frame directory
Add point prompt: center pixel of frame 0 as positive point [[w//2, h//2]], label [1]
Propagate masks across all frames (propagate_in_video)
For each frame: extract binary mask, save as frame_NNNN.png
Print frame N/total to stdout

MaskWorker

MaskWorker(QThread) mirrors ExportWorker:

__init__(script: str, input_path: str, output_dir: str, venv_python: str)
run(): calls subprocess.Popen([venv_python, script, "--input", input_path, "--output", output_dir]), reads stdout line-by-line, emits progress = pyqtSignal(str), emits finished = pyqtSignal() or error = pyqtSignal(str)

MainWindow wiring

self._last_export_path: str = "" — set in _on_export_done
_on_generate_masks(): builds output dir path, creates it, instantiates MaskWorker with the selected script
Venv installed check on startup: if venv exists, enable the Generate Masks button

3.3 KiB Raw Blame History Unescape Escape