Mask Generation Design
Overview
Add per-frame PNG mask generation to the 8-cut export pipeline for SELVA dataset creation. Two methods are supported: Depth Anything V2 (fast, depth-based foreground) and SAM2 (accurate, propagated segmentation). Both run as isolated subprocesses inside a dedicated ML venv.
UI Changes
Settings dialog
A "Settings…" button added to the main window's top bar opens a SettingsDialog (QDialog). It contains an "ML Tools" section with:
- Status label: "Not installed" / "Installed"
- Install button (becomes Reinstall once the venv exists)
- Read-only QPlainTextEdit streaming pip install output line by line
Mask generation row
A new row added to the main window below the export row:
- QComboBox: "Depth Anything" / "SAM"
- Generate Masks button, disabled until the venv is installed
- Operates on self._last_export_path (set after each successful export)
- Output folder: <clip_stem>_masks/ next to the video
- Frames named frame_0000.png, frame_0001.png, …
- Progress streamed to the status bar; button disabled during a run
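The output layout above comes down to two path rules. Here is a minimal sketch. The helper names mask_output_dir and frame_filename are hypothetical, and it assumes <clip_stem> is the exported file's name without its extension and that frame indices start at 0:

```python
from pathlib import Path


def mask_output_dir(video_path: str) -> Path:
    """<clip_stem>_masks/ placed next to the exported video (hypothetical helper)."""
    video = Path(video_path)
    return video.with_name(f"{video.stem}_masks")


def frame_filename(index: int) -> str:
    """Zero-padded mask name for a 0-based frame index: frame_0000.png, frame_0001.png, ..."""
    return f"frame_{index:04d}.png"


if __name__ == "__main__":
    out = mask_output_dir("/data/clips/take_03.mp4")
    print(out / frame_filename(0))
```

Four-digit padding keeps the names sorting correctly up to frame 9999, which is about 5.5 minutes at 30 fps. Longer clips would need a wider field.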
Venv
Path: ~/.8cut/venv/
Install command (run as a subprocess, with output streamed to the Settings dialog):
~/.8cut/venv/bin/pip install torch torchvision transformers opencv-python Pillow segment-anything-2
Setup steps:
- python -m venv ~/.8cut/venv
- ~/.8cut/venv/bin/pip install --upgrade pip
- ~/.8cut/venv/bin/pip install torch torchvision transformers opencv-python Pillow segment-anything-2
Venv existence check: Path("~/.8cut/venv/bin/python").expanduser().exists().
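The venv location, the existence check and the setup steps can be grouped as a few constants and functions. This is a sketch under stated assumptions. The names VENV_DIR, venv_python, venv_installed and setup_commands are hypothetical. It uses sys.executable (the interpreter running 8-cut) in place of the bare python in the first step. The bin/ layout is POSIX-only, since a Windows venv uses Scripts instead of bin.

```python
import sys
from pathlib import Path
from typing import List

# Root of the dedicated ML venv; expanded once because no shell will expand "~".
VENV_DIR = Path("~/.8cut/venv").expanduser()

# Packages from the install command above.
ML_PACKAGES = [
    "torch", "torchvision", "transformers",
    "opencv-python", "Pillow", "segment-anything-2",
]


def venv_python(venv_dir: Path = VENV_DIR) -> Path:
    """Interpreter inside the venv (POSIX bin/ layout assumed)."""
    return venv_dir / "bin" / "python"


def venv_installed(venv_dir: Path = VENV_DIR) -> bool:
    """The existence check used to set the Installed / Not installed status."""
    return venv_python(venv_dir).exists()


def setup_commands(venv_dir: Path = VENV_DIR) -> List[List[str]]:
    """The three setup steps, each as an argv list for subprocess.Popen."""
    pip = str(venv_dir / "bin" / "pip")
    return [
        [sys.executable, "-m", "venv", str(venv_dir)],
        [pip, "install", "--upgrade", "pip"],
        [pip, "install", *ML_PACKAGES],
    ]
```

Passing argv lists instead of shell strings avoids quoting issues. It also means the path must already be expanded, which is why expanduser() is applied up front. Each list can be fed to the same line-by-line streaming loop that fills the dialog's QPlainTextEdit.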
tools/depth_masks.py
CLI: python tools/depth_masks.py --input <video.mp4> --output <dir>
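Both scripts use the same command-line contract: two required flags, progress lines on stdout, and an exit status. That contract can be sketched once with argparse. The sketch assumes a hypothetical process_video(input, output, report) callable that holds the per-frame work. Printing with flush=True matters because under MaskWorker stdout is a pipe, and a piped stdout is block-buffered. Without flushing, progress lines would arrive in bursts.

```python
import argparse
import sys
from typing import Callable, Optional, Sequence

# Hypothetical per-frame worker: process_video(input_path, output_dir, report).
ProcessFn = Callable[[str, str, Callable[[int, int], None]], None]


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Write per-frame PNG masks for a video.")
    parser.add_argument("--input", required=True, help="input video, e.g. clip.mp4")
    parser.add_argument("--output", required=True, help="directory for frame_NNNN.png files")
    return parser.parse_args(argv)


def report_progress(n: int, total: int) -> None:
    # Flush each line: under MaskWorker stdout is a pipe, hence block-buffered.
    print(f"frame {n}/{total}", flush=True)


def main(process_video: ProcessFn, argv: Optional[Sequence[str]] = None) -> int:
    """Run the script body and translate the outcome into an exit status."""
    args = parse_args(argv)
    try:
        process_video(args.input, args.output, report_progress)
    except Exception as exc:  # any failure -> non-zero exit, message on stderr
        print(f"error: {exc}", file=sys.stderr, flush=True)
        return 1
    return 0
```

Each script would end with sys.exit(main(process_video)). If a flag is missing, argparse exits with status 2, which MaskWorker also sees as a failure.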
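- Extract frames with OpenCV (cv2.VideoCapture)
- Load the transformers depth-estimation pipeline with depth-anything/Depth-Anything-V2-Large-hf, on device cuda if available, else cpu
- For each frame:
  - Convert the BGR frame from OpenCV to an RGB PIL image and run depth estimation → float depth array
  - Normalise to 0–255 as uint8
  - Apply Otsu threshold (cv2.threshold(..., cv2.THRESH_OTSU)) → binary mask; Depth Anything predicts relative inverse depth, so the pixels above the threshold are the nearer ones and are kept as foreground
  - Save as frame_NNNN.png
  - Print frame N/total to stdout
- Exit 0 on success, non-zero on error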
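For reference, the normalise and Otsu steps can be written out in plain Python. This illustrates what cv2.threshold computes with THRESH_OTSU on an 8-bit image: it picks the threshold t that maximises the between-class variance and sets pixels > t to 255. It is not meant to replace the OpenCV call. For brevity it flattens the depth map to a 1-D list. It also assumes that larger depth values mean nearer (inverse depth), so the pixels above t become foreground.

```python
from typing import List, Sequence, Tuple


def normalise_to_u8(values: Sequence[float]) -> List[int]:
    """Min-max scale floats to integers in 0..255 (a flat input maps to all zeros)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    scale = 255.0 / (hi - lo)
    return [round((v - lo) * scale) for v in values]


def otsu_threshold(pixels: Sequence[int]) -> int:
    """Return t maximising between-class variance for classes [0..t] and [t+1..255]."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # pixel count in class 0 (values <= t)
    sum0 = 0  # intensity sum of class 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class empty: variance undefined
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binary_mask(depth: Sequence[float]) -> Tuple[List[int], int]:
    """Normalise, pick the Otsu threshold, and map to 0/255 like THRESH_BINARY."""
    u8 = normalise_to_u8(depth)
    t = otsu_threshold(u8)
    return [255 if p > t else 0 for p in u8], t
```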
tools/sam_masks.py
CLI: python tools/sam_masks.py --input <video.mp4> --output <dir>
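- Extract all frames to a temp directory with OpenCV, as JPEGs named by frame index (00000.jpg, 00001.jpg, …), which is the layout SAM2 expects for a frame directory
- Load the SAM2 video predictor with sam2.build_sam.build_sam2_video_predictor_hf, which auto-downloads the checkpoint from HuggingFace (the plain build_sam2_video_predictor expects a local checkpoint path)
- Initialise the predictor on the frame directory
- Add a point prompt on frame 0: the center pixel as a positive point, points [[w//2, h//2]], labels [1]
- Propagate masks across all frames (propagate_in_video)
- For each frame: extract a binary mask (mask logits > 0.0) and save it as frame_NNNN.png
- Print frame N/total to stdout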
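The prompt and mask-extraction steps above reduce to a few small conversions. This sketch keeps them free of any framework. center_point_prompt, sam_frame_name and logits_to_mask are hypothetical helpers, and the real script would pass NumPy arrays to the predictor instead of lists. Thresholding logits at 0.0 (equivalent to probability 0.5) follows the convention in SAM2's example notebooks. Points are (x, y), which is why width comes first.

```python
from typing import List, Sequence, Tuple


def center_point_prompt(width: int, height: int) -> Tuple[List[List[int]], List[int]]:
    """Single positive click at the frame centre: points [[x, y]], labels [1]."""
    return [[width // 2, height // 2]], [1]


def sam_frame_name(index: int) -> str:
    """Integer-named JPEG for the temp frame directory, e.g. 00000.jpg."""
    return f"{index:05d}.jpg"


def logits_to_mask(logits: Sequence[Sequence[float]]) -> List[List[int]]:
    """Binarise a 2-D logit map: logit > 0.0 becomes 255, everything else 0."""
    return [[255 if v > 0.0 else 0 for v in row] for row in logits]
```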
MaskWorker
MaskWorker(QThread) mirrors ExportWorker:
- __init__(script: str, input_path: str, output_dir: str, venv_python: str)
- run(): calls subprocess.Popen([venv_python, script, "--input", input_path, "--output", output_dir]), reads stdout line by line and emits progress = pyqtSignal(str) for each line, then emits finished = pyqtSignal() if the script exits with 0, or error = pyqtSignal(str) otherwise
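Without Qt, the body of MaskWorker.run() is a loop that reads lines from a child process. In this sketch, plain callbacks stand in for the progress, finished and error signals so that it runs without PyQt. The function name run_mask_script is hypothetical. Merging stderr into stdout, so that a traceback ends up in the error message, is a design assumption. Inside the QThread, the callbacks would become self.progress.emit(line) and so on.

```python
import subprocess
import sys
from collections import deque
from typing import Callable, Deque, List


def run_mask_script(
    cmd: List[str],
    on_progress: Callable[[str], None],
    on_finished: Callable[[], None],
    on_error: Callable[[str], None],
) -> None:
    """Stream cmd's output line by line, then report success or failure by exit code."""
    tail: Deque[str] = deque(maxlen=20)  # recent lines, reused as the error message
    try:
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # assumption: tracebacks share the progress stream
            text=True,
            bufsize=1,  # line-buffered reads on the parent side
        )
    except OSError as exc:  # e.g. the venv interpreter does not exist
        on_error(str(exc))
        return

    with proc.stdout:
        for raw in proc.stdout:  # one line at a time until the child closes stdout
            line = raw.rstrip("\n")
            tail.append(line)
            on_progress(line)

    if proc.wait() == 0:
        on_finished()
    else:
        on_error("\n".join(tail) or f"exit code {proc.returncode}")


if __name__ == "__main__":
    # Demo child that prints two progress lines; -u disables its output buffering.
    demo = [sys.executable, "-u", "-c", "print('frame 1/2'); print('frame 2/2')"]
    run_mask_script(demo, print, lambda: print("done"), lambda msg: print("error:", msg))
```

Progress only streams smoothly if the child flushes each line. The scripts can do this with print(..., flush=True), or they can be launched as [venv_python, "-u", script, ...].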
MainWindow wiring
- self._last_export_path: str = "", set in _on_export_done
- _on_generate_masks(): builds the output dir path, creates it, and instantiates MaskWorker with the selected script
- Venv installed check on startup: if the venv exists, enable the Generate Masks button
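The decision logic in _on_generate_masks() can be kept free of Qt, which makes it easy to test. This sketch assumes the combo box texts map to the two scripts as shown below and that the caller supplies the tools/ directory. SCRIPT_FOR_METHOD, MaskJob and plan_mask_job are hypothetical names. MaskJob lists its fields in MaskWorker.__init__ order.

```python
from pathlib import Path
from typing import NamedTuple, Optional

# Combo box text -> script under tools/ (assumed mapping).
SCRIPT_FOR_METHOD = {"Depth Anything": "depth_masks.py", "SAM": "sam_masks.py"}


class MaskJob(NamedTuple):
    """Arguments in MaskWorker.__init__ order."""
    script: str
    input_path: str
    output_dir: str
    venv_python: str


def plan_mask_job(method: str, last_export_path: str, tools_dir: Path,
                  venv_python: Path) -> Optional[MaskJob]:
    """Create <clip_stem>_masks/ and return MaskWorker arguments; None if not runnable."""
    if not last_export_path or not venv_python.exists():
        return None  # nothing exported yet, or ML venv missing
    video = Path(last_export_path)
    out_dir = video.with_name(f"{video.stem}_masks")
    out_dir.mkdir(parents=True, exist_ok=True)
    return MaskJob(str(tools_dir / SCRIPT_FOR_METHOD[method]), str(video),
                   str(out_dir), str(venv_python))
```

In the slot, a call such as job = plan_mask_job(self.method_combo.currentText(), ...) (where method_combo is a hypothetical attribute name) would be followed by self._mask_worker = MaskWorker(*job) and self._mask_worker.start(). Keeping the worker on self matters: if the QThread object is garbage-collected while still running, Qt aborts with "QThread: Destroyed while thread is still running".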