Files
8-cut/docs/plans/2026-06-18-ltx2-preset-design.md
Ethanfel 755f7e5131 docs: LTX-2 per-tab export mode design
Per-tab foley|ltx2 pipeline mode + "Duplicate as LTX-2". LTX-2: frame-exact
length (F%8==1), force 25fps, center-crop to ÷32. Soft preset, builds on the
per-tab export folder feature. core/ffmpeg gains optional target_fps/snap32.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-18 14:55:05 +02:00

4.9 KiB
Raw Permalink Blame History

LTX-2 per-tab export mode — Design

Goal: Add an export pipeline mode to each file-list tab — Foley (current behavior) or LTX-2 — so the same source videos can feed both a Foley dataset (8 s clips) and an LTX-2 V2A dataset (frame-exact, ÷32, 25 fps) without the two ever mixing.

Depends on: the per-tab export folder feature (branch tab-export-folder) — this design extends that per-tab state. Implementation branch ltx2-preset is based on it.

Scope: soft preset (no hard enforcement — defaults are LTX-2-legal but every control stays editable). core/ gains optional pipeline params; Foley path is byte-for-byte unchanged.


LTX-2 constraints (why this exists)

LTX-2 (32× spatial VAE, 8× temporal + 1) requires, for a clip:

  • W and H each divisible by 32.
  • Frame count F such that F % 8 == 1 → 9, 17, 25, … 201, … (transformer seq-len ∝ (W/32)·(H/32)·((F1)/8+1)).
  • fps only sets real duration F/fps; for V2A it fixes the paired-audio length and audio↔motion sync, so it must be consistent across the dataset and equal to the inference frame_rate. Target: 25 fps.
  • V2A video is frozen conditioning → low spatial res (384512) is fine and cheaper.

Note: 8 s @ 25 fps = 200 frames, and 200 % 8 == 08 s is not legal. Nearest legal: F=193 (7.72 s) or F=201 (8.04 s).


Model: per-tab mode

Each tab (PlaylistWidget) gains _mode ∈ {"foley","ltx2"}, persisted alongside _dest_folder/_pinned/_tab_folder in _save_playlist_tabs/_load_playlist_tabs. Default "foley" → existing tabs load unchanged. The active tab's mode drives the export pipeline and the length control.

Tab context menu (_DeckTabBar/_PlaylistTabBar)

  • Duplicate as LTX-2 — headline action: clone the tab's file list + separators into a new tab; set mode="ltx2"; derive a separate export folder "<dest_folder>_ltx2"; load LTX-2 default geometry. Lets you spin an LTX-2 dataset off a Foley working set.
  • Duplicate tab — clone keeping the same mode.
  • LTX-2 mode — checkable, flips an existing tab between foley/ltx2.
  • Tab label shows a small [LTX2] badge when mode=="ltx2".

What ltx2 mode changes (soft — still editable)

Aspect Foley LTX-2
Clip length Duration spinbox (seconds) Frame-count F control stepping the legal series (9, 17, …, 201, …); shows = F/25 s
Output fps inherits source forced 25 fps (resample; preserves duration/sync)
Output W×H short-side resize → even long side center-cropped to ÷32 on both axes (no aspect distortion; loses ≤31 px/side); resize default 512
Frame exactness duration-based exactly F frames (-frames:v F)

Defaults loaded on convert: resize 512, F = 201 (≈8.04 s, mirrors the 8 s Foley clips), ratio as set. All editable afterward.

Pipeline (core/ffmpeg.build_ffmpeg_command)

Add optional params; Foley calls pass none → identical output to today:

  • target_fps: float | None — when set, append fps={target_fps} filter and -r {target_fps}.
  • snap32: bool — when true, after the scale append a centered crop to the nearest lower multiple of 32 on each axis: crop=trunc(iw/32)*32:trunc(ih/32)*32.
  • Frame-exact length: caller computes duration = F/target_fps and passes -frames:v F on the video output so the clip has exactly F frames; audio extract uses the same F/target_fps duration so V2A pairing stays aligned.

Filter order: portrait-crop (aspect) → scale (short side, ÷32 default) → snap32 crop → fps. The snap32 center-crop runs after scaling so the ÷32 trim is on final pixels.

UI wiring (MainWindow)

  • The length spinbox area swaps with the active tab's mode: Foley shows Duration (s); LTX-2 shows Frames (F) with a live = s @25fps readout. Switching tabs (or toggling mode) reconfigures it; uses the existing _sync_folder_field_to_tab-style sync hook on tab change.
  • _on_export / _start_export_batch: when the active tab is ltx2, pass target_fps=25, snap32=True, and frame-exact length to the ffmpeg builder; otherwise unchanged.
  • The mismatch guardrail (just added) and per-tab folder continue to apply.

Persistence & migration

_mode added to each tab's saved JSON (default "foley" when absent). No DB changes. Existing sessions load every tab as Foley → zero behavior change until a tab is converted.

What this does NOT do

  • No hard enforcement: you can set an illegal F or non-÷32 resize manually; the pipeline still crops to ÷32 and uses whatever F you pick (the control defaults/steps keep you legal, but nothing blocks you).
  • No motion interpolation on fps resample (frame drop/dup only); keep sources native 25 fps where possible.
  • No change to Foley exports, the scan pipeline, or the DB schema.
  • No automatic re-export of existing clips into LTX-2 — you cut LTX-2 clips in the converted tab.