- nodes/__init__.py: fix [PrismAudio] leftover label in error print
- selva_feature_extractor: hash beginning, middle and end of video tensor
instead of just first 1MB, avoiding collisions on videos with same opening frames
- selva_sampler: derive SequenceConfig from model template via dataclasses.replace
instead of hardcoding sampling_rate/spectrogram_frame_rate per mode
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Users can now wire the prompt output directly to SelvaSampler's prompt input,
making the data flow explicit instead of relying on the implicit features fallback.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
SelvaFeatureExtractor now stores the prompt in SELVA_FEATURES (both in the
returned dict and the .npz cache). SelvaSampler's prompt is now optional —
when left empty it falls back to the prompt stored in features. A non-empty
override can still be passed when CLIP text should differ from the sync text.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- selva_feature_extractor: cache hash now includes resolved duration;
same video + different duration override no longer returns stale features
- selva_sampler: MPS-safe noise generation (torch.Generator on CPU then
move to device, same pattern as PrismAudioSampler)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
CLIP frames at 8fps→384px (normalize inside FeaturesUtils).
Sync frames at 25fps→224px, normalized to [-1,1] externally.
T5 text encoded via FeaturesUtils, sup tokens prepended, then text-conditioned
sync features extracted via TextSynch.encode_video_with_sync(). Results cached
as .npz keyed by hash(frames[:1MB] + prompt + fps + variant).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>