ComfyUI-SelVA

Author	SHA1	Message	Date
Ethanfel	83b1da9520	chore: remove all PrismAudio code from main branch - Delete prismaudio_core/, data_utils/, scripts/, docs/plans/ - Delete PrismAudio nodes (feature_extractor, feature_loader, model_loader, sampler, text_only) - Delete PrismAudio workflows (video_to_audio, text_to_audio) - Clean nodes/utils.py: rename PRISMAUDIO_CATEGORY → SELVA_CATEGORY, remove unused helpers - Strip PrismAudio-only deps from requirements.txt Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-04 17:58:31 +02:00
Ethanfel	ab8e1e5b7b	feat: SelvaFeatureExtractor outputs prompt as STRING Users can now wire the prompt output directly to SelvaSampler's prompt input, making the data flow explicit instead of relying on the implicit features fallback. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-04 16:27:49 +02:00
Ethanfel	27b4424e1a	feat: prompt entered once in SelvaFeatureExtractor, reused by SelvaSampler SelvaFeatureExtractor now stores the prompt in SELVA_FEATURES (both in the returned dict and the .npz cache). SelvaSampler's prompt is now optional — when left empty it falls back to the prompt stored in features. A non-empty override can still be passed when CLIP text should differ from the sync text. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-04 16:22:59 +02:00
Ethanfel	6474e2816c	fix: two bugs in SelVA nodes - selva_feature_extractor: cache hash now includes resolved duration; same video + different duration override no longer returns stale features - selva_sampler: MPS-safe noise generation (torch.Generator on CPU then move to device, same pattern as PrismAudioSampler) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-04 15:39:57 +02:00
Ethanfel	578b501d38	feat: SelvaFeatureExtractor — inline CLIP + TextSynchformer feature extraction CLIP frames at 8fps→384px (normalize inside FeaturesUtils). Sync frames at 25fps→224px, normalized to [-1,1] externally. T5 text encoded via FeaturesUtils, sup tokens prepended, then text-conditioned sync features extracted via TextSynch.encode_video_with_sync(). Results cached as .npz keyed by hash(frames[:1MB] + prompt + fps + variant). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-04 15:23:40 +02:00
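
The caching described in 578b501d38 and fixed in 6474e2816c can be illustrated with a minimal sketch. The commits say only that the `.npz` cache is keyed by a hash of the first 1 MB of frames plus prompt, fps and variant, and that the resolved duration was added later. The function name `feature_cache_key`, the choice of SHA-256, and the length-prefixed field encoding are assumptions made for illustration, not the repository's actual code.

```python
import hashlib

ONE_MB = 1024 * 1024  # the commit message mentions hashing frames[:1MB]


def feature_cache_key(frame_bytes: bytes, prompt: str, fps: float,
                      variant: str, duration: float) -> str:
    """Hypothetical cache key: digest over a frame prefix plus all settings.

    Assumption: SHA-256 and this field layout are illustrative only.
    """
    h = hashlib.sha256()

    def add(chunk: bytes) -> None:
        # Length-prefix each field so adjacent fields cannot run together
        # and produce the same byte stream for different inputs.
        h.update(len(chunk).to_bytes(8, "little"))
        h.update(chunk)

    add(frame_bytes[:ONE_MB])                    # only the first 1 MB of frame data
    add(prompt.encode("utf-8"))
    add(repr(float(fps)).encode("utf-8"))
    add(variant.encode("utf-8"))
    add(repr(float(duration)).encode("utf-8"))   # resolved duration, per 6474e2816c
    return h.hexdigest()
```

Because the duration is part of the digest, the same video with a different duration override maps to a different cache file instead of returning stale features, which is the behaviour the fix commit describes.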

5 Commits