T

Ethanfel eab5c690c7 feat: audio area length — remove the upper cap + step by 1s

The audio extract length is meant for visualizing/grabbing sequences that can
run minutes long, but the control capped it and stepped in fiddly 0.10s
increments. Raise the range to effectively unlimited (24h; ffmpeg stops cleanly
at end-of-file if the source is shorter) and make the arrows step 1s — typing
still allows sub-second precision. Widen the field for the larger values.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-07-02 01:11:57 +02:00

.github/workflows

ci: disable auto Docker build, manual dispatch only

2026-04-16 15:23:32 +02:00

assets

feat: logo-based icon set + accent aligned to brand palette

2026-06-13 16:56:54 +02:00

core

fix: audio extract honored a silent length clamp — 30s near the end became 3s

2026-07-02 00:07:35 +02:00

docs/plans

docs: LTX-2 per-tab export mode implementation plan

2026-06-18 14:55:05 +02:00

tests

feat: audio area length — remove the upper cap + step by 1s

2026-07-02 01:11:57 +02:00

tools

remove: mask generation, venv setup, and settings dialog

2026-04-12 15:53:31 +02:00

.gitignore

feat: scan results panel, model switching, batch scan, and training improvements

2026-04-18 16:12:52 +02:00

8cut_calibrate.py

feat: integrate training UI, BEATs model, and clean up legacy code

2026-04-18 11:52:27 +02:00

8cut_train.py

feat: integrate training UI, BEATs model, and clean up legacy code

2026-04-18 11:52:27 +02:00

8cut.bat

feat: prefetch audio during Scan All, fix file-switch interruption, fix Windows setup

2026-04-18 21:50:33 +02:00

8cut.sh

feat: speech detection, format export buttons, subcategory controls, crop overlay during playback

2026-05-14 18:23:43 +02:00

8cut.spec

feat: logo-based icon set + accent aligned to brand palette

2026-06-13 16:56:54 +02:00

conftest.py

test: add MainWindow structure smoke test (skips headless)

2026-06-13 11:22:49 +02:00

LICENSE

docs: add GPL v3 license and GitHub source link

2026-04-07 12:34:13 +02:00

main.py

feat: audio area length — remove the upper cap + step by 1s

2026-07-02 01:11:57 +02:00

README.md

docs: changelog + README for LTX-2 mode + tab features (v1.2)

2026-06-18 15:16:18 +02:00

requirements.txt

fix: pin transformers<5.0 for EAT model compatibility

2026-04-19 16:11:18 +02:00

setup_env.sh

fix: ensure setup scripts use correct PyTorch index for transitive deps

2026-04-19 16:21:36 +02:00

setup-windows.ps1

fix: ensure setup scripts use correct PyTorch index for transitive deps

2026-04-19 16:21:36 +02:00

README.md

8-cut

A desktop tool for cutting 8-second clips from video files, designed for building foley datasets. Includes audio classification for automated scanning and batch export.

Overview

8-cut lets you scrub through a video, mark a cut point, and export a batch of overlapping 8-second clips with one keypress. It tracks every export in a local SQLite database so you can resume a session without duplicating work.

All clips are exactly 8 seconds — the standard length for foley sound datasets.

Features

Clip export

Frame-accurate scrubbing — click or drag the timeline; arrow keys and J/L for frame-by-frame, Shift for 1-second steps
Batch export — export multiple overlapping clips per cut point with configurable count and spread offset
Two export formats — H.264 MP4 with lossless PCM audio, or WebP image sequence (frames + .wav)
Portrait crop — crop to 9:16, 4:5, or 1:1 before export; click the video or crop bar to reposition
Random portrait/square — optionally apply a random crop to a subset of each batch
Resize — scale short side to a fixed pixel size (e.g. 512)
Hardware encoding — GPU-accelerated export via NVENC, VAAPI, QSV, AMF, or VideoToolbox
Subject tracking — auto-adjust crop center using YOLOv8 detection (optional)

Audio scanning

Embedding models — WAV2VEC2 (base/large), HuBERT (base/large/xlarge), BEATs
Train classifier — train a gradient boosting classifier on your exported clips to find similar audio
Scan video — detect regions matching your trained model with configurable threshold
Scan All — batch scan every video in the playlist
Region fusion — merge overlapping detections into contiguous regions
Hard negatives — mark false positives to refine training
Model versioning — timestamped backups with rollback support
Scan export — batch export from scan results with spread and minimum duration filtering

Scan results panel

Tabbed results — one tab per model, showing start/end/score per region
Disable regions — Delete/Backspace toggles regions off (greyed out, excluded from export) without removing them
Resize regions — double-click Time or End cells to edit, or drag region edges directly on the timeline
Grey ghost — trimmed portions of resized regions shown as grey overlay on timeline
Undo — Ctrl+Z reverts the last disable, resize, drag, or negative toggle

Organization

Sound annotation — label and category fields saved to the clip database and dataset.json
Export history — timeline markers show previously exported clips; double-click to overwrite; right-click to delete
Playlist — drag-and-drop video queue with progress tracking
Profiles — switch between independent marker sets (e.g. "landscape" vs "portrait")
Subprofiles — lightweight export folder variants for multiple output targets
Review mode — clean timeline view for navigating scan results without export clutter

Interface

Menu bar — File / Edit / Scan / View / Help hold the occasional actions (open files, train, scan all, profiles); the profile selector and shortcuts (?) sit in the top-right corner
Control deck — a compact tabbed panel under the video groups the settings into Export (label, name, folder, format, resize, duration/clips/spread, workers), Crop & Track, and Scan (model, threshold, fuse, scan/auto/speech/review)
Side-by-side panels — pin deck panels to view them as resizable columns: right-click a deck tab → Show side-by-side, or toggle them under View ▸ Side-by-side panels; drag the dividers to reallocate space, and the layout persists between sessions
Per-tab export folder — each file-list tab remembers its own output folder; switching tabs follows that tab's folder, and a guardrail warns when the loaded video doesn't match the destination
Duplicate tab — right-click a file-list tab → Duplicate tab to clone its files into a new tab with its own export folder
LTX-2 export mode — per-tab Foley | LTX-2 toggle (right-click a tab, shown with an [LTX2] badge): LTX-2 clips are frame-exact (frames % 8 == 1), forced to 25 fps, and center-cropped so width & height are divisible by 32 — for LTX-2 video-to-audio datasets; applies to manual, re-export, and auto-export
Status bar — export/scan progress and messages, with the current file · profile · worker count always shown

Keyboard shortcuts

Key	Action
`Left` / `J`	Step back 1 frame
`Right` / `L`	Step forward 1 frame
`Shift+Left` / `Shift+J`	Step back 1 second
`Shift+Right` / `Shift+L`	Step forward 1 second
`Space` / `P`	Toggle play/pause
`K`	Pause and snap to cursor
`E`	Export
`M`	Jump to next marker (wraps)
`N`	Next file in playlist
`G`	Toggle cursor lock
`Delete` / `Backspace`	Toggle disable on selected scan regions
`Ctrl+Z`	Undo last scan panel action
`?` / `F1`	Show keyboard shortcuts

Shortcuts are suppressed when a text field has focus.

Installation

Prerequisites

Python 3.11+ — python.org/downloads
ffmpeg — video encoding
libmpv — video playback

Quick start (all platforms)

The setup script creates a virtual environment and installs everything including PyTorch with CUDA support:

# Linux / macOS
./setup_env.sh

# Windows (PowerShell)
powershell -ExecutionPolicy Bypass -File setup-windows.ps1

Then run:

# Linux / macOS
./8cut.sh

# Windows
8cut.bat

The launch scripts auto-detect your venv or conda environment.

Manual installation

1. Install system dependencies

Linux (Arch):

pacman -S python mpv ffmpeg

Linux (Debian/Ubuntu):

apt install python3 python3-venv libmpv-dev ffmpeg

Windows:

# ffmpeg
winget install ffmpeg

# libmpv — download mpv-2.dll and place next to main.py
# https://sourceforge.net/projects/mpv-player-windows/files/libmpv/

macOS:

brew install python mpv ffmpeg

2. Create a virtual environment

A virtual environment keeps 8-cut's dependencies isolated from your system Python. This is strongly recommended — PyTorch alone is several GB and can conflict with other projects.

Using venv (recommended):

# Create the venv in the project directory
python3 -m venv .venv

# Activate it
# Linux / macOS:
source .venv/bin/activate
# Windows (cmd):
.venv\Scripts\activate.bat
# Windows (PowerShell):
.venv\Scripts\Activate.ps1

Using conda / miniforge:

conda create -n 8cut python=3.12
conda activate 8cut

You must activate the environment every time you open a new terminal before running 8-cut. The 8cut.sh launcher does this automatically.

3. Install PyTorch

PyTorch must be installed separately with the correct CUDA version for GPU acceleration. Without CUDA, audio scanning will fall back to CPU (much slower).

With NVIDIA GPU (CUDA 12.8):

pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu128

CPU only (no GPU):

pip install torch torchaudio --index-url https://download.pytorch.org/whl/cpu

Check available CUDA versions at pytorch.org/get-started.

4. Install project dependencies

pip install -r requirements.txt

5. Verify

python -c "import torch; print('PyTorch', torch.__version__, 'CUDA', torch.version.cuda)"
python -c "import librosa, torchaudio, sklearn; print('All imports OK')"

Running

# With venv activated:
python main.py

# Or use the launcher (auto-activates venv/conda):
./8cut.sh        # Linux / macOS
8cut.bat         # Windows

GPU encoding

Hardware encoders are auto-detected from ffmpeg. Available encoders by platform:

Platform	Encoders
Linux	`h264_nvenc` (NVIDIA), `h264_vaapi` (AMD/Intel), `h264_qsv` (Intel)
Windows	`h264_nvenc` (NVIDIA), `h264_qsv` (Intel), `h264_amf` (AMD)
macOS	`h264_videotoolbox`

Enable the HW checkbox in the export controls to use GPU encoding.

Optional: audio scanning

Audio scanning requires PyTorch (installed above). Embedding models are downloaded on first use and cached in cache/downloads/. A CUDA-capable GPU is strongly recommended for training and scanning speed.

Usage

Drop videos onto the queue or click + Open Files. Scrub to your cut point, then press Export (or E).

Export layout

Each export creates a group subfolder containing the overlapping sub-clips:

output/
  clip_001/
    clip_001_0.mp4      # starts at cursor
    clip_001_1.mp4      # starts at cursor + spread
    clip_001_2.mp4      # starts at cursor + 2 * spread
  clip_002/
    ...

With WebP sequence format, each sub-clip becomes a directory of frames plus a .wav:

output/
  clip_001/
    clip_001_0/
      frame_0001.webp
      frame_0002.webp
      ...
    clip_001_0.wav

Scan export layout

Scan exports create one group folder per detected area:

output/
  clip_037/
    clip_037_a1_0.mp4   # area 1, clip 0
    clip_037_a1_1.mp4   # area 1, clip 1
  clip_038/
    clip_038_a2_0.mp4   # area 2, clip 0
    ...

Sound annotation

Set a Label (e.g. "dog barking") and Category (Human / Animal / Vehicle / Tool / Music / Nature / Sport / Other) before exporting. These are saved to:

dataset.json in the export folder — one entry per clip with path and label
The SQLite database — label and category, for recall when you revisit a marker

Labels persist between exports so you can cut many clips of the same class without retyping.

Overwrite and delete

Double-click a timeline marker to enter overwrite mode — the next export re-encodes all clips in that group to their original paths
Right-click a marker to delete it from the database
The Delete button removes all clips in a group from disk, database, and dataset.json

Audio scan workflow

1. Build a dataset

Export clips manually from several videos. Clips from the same export folder (e.g. mp4_Intense) become your positive training class.

Minimum dataset: ~20 clips from 2–3 different videos. This is enough for the classifier to learn a basic boundary, but expect noisy results — you'll need to mark hard negatives and retrain.

Ideal dataset: 50–100+ clips from 5+ videos covering the full range of variation in your target sound (different recording conditions, distances, intensities). More variety in your positives makes the model generalize better to unseen footage. Negatives are sampled automatically from regions far from your markers, but adding explicit negatives of confusable sounds (e.g. thunder when training for explosions) significantly reduces false positives.

The classifier improves iteratively: export a small initial set → train → scan → mark false positives as hard negatives → retrain. Each cycle sharpens the decision boundary without needing a large upfront dataset.

2. Train a classifier

Click Train to open the training dialog:

Positive class — select the export folder containing your target sounds
Negative class — optional explicit negatives, or leave as "(auto only)" for automatic sampling
Model — embedding model to use (HuBERT XLARGE recommended)
Auto-neg margin — distance from markers to sample automatic negatives (30s default)
Include scan-exported clips — whether to include previously scan-exported clips in training

The classifier trains a HistGradientBoostingClassifier on audio embeddings and saves to models/.

3. Scan videos

Select a trained model from the dropdown and click Scan. Adjust the threshold slider to control sensitivity. Detected regions appear as colored bands on the timeline and as rows in the results panel.

Audio embeddings are computed once per video and cached to disk (cache/w2v/). Subsequent scans with the same embedding model skip the GPU entirely and only re-run the classifier, which takes milliseconds. This makes the retrain → rescan loop nearly free after the first pass.

4. Review and refine

Toggle Review mode for a clean timeline focused on scan results
Disable false positive regions (Delete key) — they stay in the list but are excluded from export
Resize regions by dragging edges on the timeline or editing times in the table
Mark as negative — add false positives to the hard negative set for retraining
Ctrl+Z to undo any of the above

5. Export results

Click Export Scan Results to batch export all enabled regions. The button shows the estimated clip count based on spread and minimum duration settings.

6. Retrain with feedback

Train again — hard negatives are automatically included. Each training run saves with a timestamp. Click the ⏲ button next to the model dropdown to restore a previous version if results degrade — restoring automatically rescans with the selected version.

Database

Export history is stored in ~/.8cut.db (SQLite). Tables:

Table	Purpose
`processed`	Every exported clip with full encoding settings
`scan_results`	Audio scan detections per video/model
`hard_negatives`	Timestamps marked as false positives for training
`hidden_files`	Playlist files hidden by the user

The database auto-migrates when new columns are added.

File locations

Path	Contents
`~/.8cut.db`	SQLite database
`models/`	Trained classifier models (`.joblib`)
`cache/w2v/`	Embedding cache (`.npz`, keyed by video hash)
`cache/downloads/`	Downloaded pretrained models

Testing

pytest tests/ -v

46 unit tests covering path builders, ffmpeg command generation, time formatting, database operations, group queries, profile isolation, and annotation handling.

License

GNU General Public License v3.0

README.md Unescape Escape

8-cut

Overview

Features

Clip export

Audio scanning

Scan results panel

Organization

Interface

Keyboard shortcuts

Installation

Prerequisites

Quick start (all platforms)

Manual installation

1. Install system dependencies

2. Create a virtual environment

3. Install PyTorch

4. Install project dependencies

5. Verify

Running

GPU encoding

Optional: audio scanning

Usage

Export layout

Scan export layout

Sound annotation

Overwrite and delete

Audio scan workflow

1. Build a dataset

2. Train a classifier

3. Scan videos

4. Review and refine

5. Export results

6. Retrain with feedback

Database

File locations

Testing

License

README.md