Commit Graph

7 Commits

Author SHA1 Message Date
Ethanfel e49f760b77 fix: feature extractor CUDA detection, cache correctness, and short-video crash
- Detect CUDA version at venv creation time and install matching jax[cuda12/13]
  instead of hardcoded jax[cuda13] — was broken on CUDA 12.x (most systems)
- Include fps in cache hash: same video+caption at different fps previously
  returned stale cached features with wrong frame sampling
- Guard frame index lists with max(1,...)/max(8,...) to prevent torch.stack([])
  crash on very short input clips; sync minimum is 8 to match Synchformer's
  segment size requirement
- Remove mediapy from managed venv packages — not imported anywhere
- Warn when caption_cot is empty (produces degenerate text features)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 16:00:05 +01:00
Ethanfel 707ccb463e perf: replace MP4 encode/decode with lossless .npy frame transfer
Saves frames as uint8 .npy instead of H.264 MP4, eliminating the
lossy codec roundtrip. extract_features.py loads .npy directly and
skips decord when given a numpy file. Passes --source_fps for
correct temporal sampling.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 10:50:35 +01:00
Ethanfel ca87c41a2e feat: add per-step timing to feature extraction logs
Each step now prints elapsed seconds on completion.
Total time printed at the end to identify bottlenecks.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 21:13:42 +01:00
Ethanfel 20fb766ad2 fix: cast tensors to float32 before numpy() in feature save
T5-Gemma outputs BFloat16 which numpy does not support.
Cast all feature tensors with .float() before .numpy().

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 20:56:52 +01:00
Ethanfel 829f398ed0 feat: verbose step-by-step logging in feature extraction
- extract_features.py: 6 numbered steps with shapes, fps, frame counts
- feature_extractor.py: stream subprocess output live (capture_output=False)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 20:19:38 +01:00
Ethanfel 878025450a feat: add data_utils package with FeaturesUtils implementation
Creates data_utils/v2a_utils/feature_utils_288.py with FeaturesUtils:
- T5-Gemma text encoding via transformers
- VideoPrism video encoding via JAX videoprism package
- Synchformer visual encoder loading from checkpoint

Also fixes extract_features.py to add plugin root to sys.path so
data_utils is importable in the subprocess venv.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 20:14:34 +01:00
Ethanfel 7c54ee8482 feat: PrismAudioFeatureExtractor node with subprocess bridge and conda env
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 18:06:10 +01:00