- selva_feature_extractor: cache hash now includes the resolved duration,
so the same video with a different duration override no longer returns
stale features
- selva_sampler: MPS-safe noise generation (draw noise from a CPU
torch.Generator, then move it to the device; same pattern as
PrismAudioSampler)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
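
A minimal sketch of the CPU-generator pattern named in the selva_sampler
entry above. The function name `seeded_noise` and its parameters are
hypothetical and are not taken from selva_sampler or PrismAudioSampler; the
sketch only assumes that noise should be reproducible from a seed regardless
of the target device.

```python
import torch


def seeded_noise(shape, seed, device):
    """Draw Gaussian noise on the CPU, then move it to `device`.

    Hypothetical helper: seeding a CPU generator keeps the random stream
    identical across backends (CPU, CUDA, MPS) and avoids relying on
    device-side generator support.
    """
    gen = torch.Generator(device="cpu")
    gen.manual_seed(seed)
    # Sample on the CPU with the seeded generator.
    noise = torch.randn(shape, generator=gen, device="cpu", dtype=torch.float32)
    # Transfer only after sampling; a no-op when device is already "cpu".
    return noise.to(device)


if __name__ == "__main__":
    target = "mps" if torch.backends.mps.is_available() else "cpu"
    x = seeded_noise((1, 4, 8), seed=0, device=target)
    print(x.device, tuple(x.shape))
```

Sampling on the CPU costs one extra host-to-device copy per call, which is
usually negligible next to the sampling loop itself.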
CLIP frames at 8fps→384px (normalized inside FeaturesUtils).
Sync frames at 25fps→224px, normalized to [-1,1] externally.
T5 text encoded via FeaturesUtils, sup tokens prepended, then text-conditioned
sync features extracted via TextSynch.encode_video_with_sync(). Results cached
as .npz keyed by hash(frames[:1MB] + prompt + fps + variant).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
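
A rough sketch of how a cache key like the one described above could be
built. The function name `feature_cache_key`, the use of SHA-256, and the
length-prefixed field encoding are assumptions for illustration, not the
extractor's actual implementation. The optional `duration` field is likewise
hypothetical and mirrors the duration fix noted in the other entry.

```python
import hashlib
from typing import Optional

ONE_MB = 1024 * 1024


def _update(h, data: bytes) -> None:
    # Length-prefix each field so adjacent fields cannot blur together.
    h.update(len(data).to_bytes(8, "big"))
    h.update(data)


def feature_cache_key(
    frame_bytes: bytes,
    prompt: str,
    fps: float,
    variant: str,
    duration: Optional[float] = None,
) -> str:
    """Hash the first 1 MB of frame data plus the request parameters."""
    h = hashlib.sha256()
    _update(h, frame_bytes[:ONE_MB])
    for field in (prompt, repr(fps), variant, repr(duration)):
        _update(h, field.encode("utf-8"))
    return h.hexdigest()


if __name__ == "__main__":
    frames = b"\x00" * (2 * ONE_MB)
    # The key could name the cached file, e.g. f"{key}.npz".
    print(feature_cache_key(frames, "rain on a tin roof", 25.0, "large"))
```

Hashing only a prefix of the frame data keeps key computation cheap, at the
cost that two videos sharing their first 1 MB would collide unless the other
fields differ.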