Commit Graph

3 Commits

Author SHA1 Message Date
Ethanfel 63bd999dfa fix: switch to VideoPrism large (1024-dim) and fix Synchformer output shape
prismaudio.json conditioner config requires:
- video_features: dim=1024 → switch videoprism_public_v1_base → large (ViT-L)
- sync_features: dim=768, length divisible by 8 → expand [num_seg,768] to
  [num_seg*8,768] (per-frame) so Sync_MLP can reshape by groups of 8

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 21:07:17 +01:00
Ethanfel b1a2ee594e fix: correct VideoPrism import (videoprism.models, not videoprism); add flax dep
videoprism/__init__.py is empty — API lives in videoprism.models.
Fix: from videoprism import models as vp (not import videoprism as vp).
Also add flax to managed venv packages (required by videoprism Flax model).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 20:38:00 +01:00
Ethanfel 878025450a feat: add data_utils package with FeaturesUtils implementation
Creates data_utils/v2a_utils/feature_utils_288.py with FeaturesUtils:
- T5-Gemma text encoding via transformers
- VideoPrism video encoding via JAX videoprism package
- Synchformer visual encoder loading from checkpoint

Also fixes extract_features.py to add plugin root to sys.path so
data_utils is importable in the subprocess venv.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 20:14:34 +01:00