Users can now wire the prompt output directly to SelvaSampler's prompt input,
making the data flow explicit instead of relying on the implicit features fallback.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
SelvaFeatureExtractor now stores the prompt in SELVA_FEATURES (both in the
returned dict and the .npz cache). SelvaSampler's prompt is now optional;
when left empty it falls back to the prompt stored in features. A non-empty
override can still be passed when CLIP text should differ from the sync text.
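
A minimal sketch of the fallback described above, not the node's actual code. The helper name resolve_prompt and the "prompt" key inside the features dict are assumptions made for illustration; the commit only states that the prompt is stored in SELVA_FEATURES and used when the sampler's own prompt is empty.

```python
def resolve_prompt(prompt, features):
    """Return the sampler prompt, falling back to the one stored in features.

    `prompt` is the sampler's optional text input; `features` stands in for
    the SELVA_FEATURES dict. The "prompt" key is a hypothetical name.
    """
    override = (prompt or "").strip()
    if override:
        # A non-empty override wins, e.g. when the CLIP text should
        # differ from the text used for the sync features.
        return override
    # Empty or missing input: reuse the prompt saved by the extractor.
    return features.get("prompt", "")
```

For example, resolve_prompt("", {"prompt": "rain on a tin roof"}) returns the stored prompt, while resolve_prompt("thunder", {"prompt": "rain on a tin roof"}) returns the override.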
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- selva_feature_extractor: cache hash now includes resolved duration;
same video + different duration override no longer returns stale features
- selva_sampler: MPS-safe noise generation (torch.Generator on CPU then
move to device, same pattern as PrismAudioSampler)
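
The MPS-safe noise pattern from the second bullet can be sketched roughly as follows. The function name seeded_noise and its parameters are hypothetical; only the idea (seed a CPU torch.Generator, sample on CPU, then move the tensor to the target device) comes from the commit.

```python
import torch


def seeded_noise(shape, seed, device):
    """Sample Gaussian noise reproducibly, independent of the target device."""
    # Create and seed the generator on CPU so the same seed yields the
    # same values whether the sampler later runs on CPU, CUDA, or MPS.
    gen = torch.Generator(device="cpu")
    gen.manual_seed(seed)
    noise = torch.randn(shape, generator=gen, device="cpu")
    # Move the finished tensor to wherever the model lives.
    return noise.to(device)
```

Sampling on CPU costs one extra host-to-device copy, which is usually small next to the sampling loop itself.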
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
CLIP frames at 8fps→384px (normalized inside FeaturesUtils).
Sync frames at 25fps→224px, normalized to [-1,1] externally.
T5 text encoded via FeaturesUtils, sup tokens prepended, then text-conditioned
sync features extracted via TextSynch.encode_video_with_sync(). Results cached
as .npz keyed by hash(frames[:1MB] + prompt + fps + variant).
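
One way the cache key above might be computed, as a sketch under stated assumptions: the commit does not name the hash function or how the parts are joined, so SHA-256, the null-byte separators, and the helper name cache_key are all illustrative choices.

```python
import hashlib

ONE_MB = 1024 * 1024


def cache_key(frame_bytes: bytes, prompt: str, fps: float, variant: str) -> str:
    """Build a hex key from the first 1 MB of frame data plus the settings."""
    h = hashlib.sha256()
    # Only the leading 1 MB of raw frame bytes is hashed, which keeps
    # key computation cheap for long videos.
    h.update(frame_bytes[:ONE_MB])
    for part in (prompt, repr(fps), variant):
        # Separator (an assumption) so adjacent fields cannot run together.
        h.update(b"\x00")
        h.update(part.encode("utf-8"))
    return h.hexdigest()
```

Because only the first 1 MB is hashed, two inputs that differ solely after that point produce the same key.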
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>