Commit Graph

11 Commits

Author SHA1 Message Date
Ethanfel 056a7b973d fix: enable VAE encoder in model loader — required for DITTO reference encoding
need_vae_encoder=False was deleting the encoder to save a small amount of VRAM.
DITTO now needs it to encode reference clips to latent space for style loss.
The spectrogram VAE encoder is small enough that the overhead is negligible.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 18:15:27 +02:00
Ethanfel bd53744e2d feat: comprehensive node improvements
Model Loader:
- bf16 support check — auto-falls back to fp16 on unsupported GPUs
- DESCRIPTION and OUTPUT_TOOLTIPS

Feature Extractor:
- Store variant in features dict and .npz cache
- Progress bar (3 steps: CLIP encode, T5 encode, sync encode)
- Expand cache hash to 32 hex chars
- DESCRIPTION and OUTPUT_TOOLTIPS

Sampler:
- Variant mismatch validation against extracted features
- Cancellation support via throw_exception_if_processing_interrupted()
- OOM catch with actionable error message
- normalize toggle (optional BOOLEAN, default true) for peak normalization
- Remove empty optional: {} block
- DESCRIPTION and OUTPUT_TOOLTIPS

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 18:16:03 +02:00
Ethanfel 429810db5b docs: improve tooltips on all three SelVA nodes
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 18:10:05 +02:00
Ethanfel 83b1da9520 chore: remove all PrismAudio code from main branch
- Delete prismaudio_core/, data_utils/, scripts/, docs/plans/
- Delete PrismAudio nodes (feature_extractor, feature_loader, model_loader, sampler, text_only)
- Delete PrismAudio workflows (video_to_audio, text_to_audio)
- Clean nodes/utils.py: rename PRISMAUDIO_CATEGORY → SELVA_CATEGORY, remove unused helpers
- Strip PrismAudio-only deps from requirements.txt

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 17:58:31 +02:00
Ethanfel 2c9d521565 fix: 44k generator HF paths use 44khz suffix (not 44k)
Actual filenames in jnwnlee/SelVA: generator_*_44khz_sup_5.pth.
download_utils.py had the wrong names so those MD5s are unverified — set to
None to skip MD5 check for 44k generators. All other files verified/unchanged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 16:46:20 +02:00
Ethanfel 28229d62ce fix: MD5 validation on existing files — re-download if corrupt
Previously _ensure() trusted any existing file. Files downloaded by the
broken requests-based code (HTML error pages) would be silently reused.
Now checks MD5 on every load; deletes and re-downloads on mismatch.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 16:42:38 +02:00
Ethanfel 92593189f0 fix: use huggingface_hub for downloads instead of raw requests
download_utils.py used requests without auth — jnwnlee/SelVA returned an
HTML error page which torch then failed to unpickle ('E' / opcode 69).
huggingface_hub.hf_hub_download() handles HF_TOKEN auth automatically,
validates downloads, and retries. Files are still copied to models/selva/.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 16:41:29 +02:00
Ethanfel 614a2e02aa fix: weights_only=False for SelVA checkpoints (PyTorch 2.6 compat)
PyTorch 2.6 changed the default to weights_only=True. SelVA checkpoints
contain non-tensor types (numpy scalars etc.) that fail strict unpickling.
All weights come from trusted sources (jnwnlee/selva HF repo).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 16:38:31 +02:00
Ethanfel 40388ba6de fix: negative_prompt inline (multiline:false) + VAE filename v1-44.pth not v1-44k.pth
- SelvaSampler: multiline:false puts negative_prompt inline above sliders
- SelvaModelLoader: VAE filenames in download_utils are v1-16.pth/v1-44.pth,
  not v1-{mode}.pth (mode includes the 'k' suffix)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 16:35:17 +02:00
Ethanfel 9a985499e7 feat: auto-download SelVA weights on first use
Uses selva_core/utils/download_utils.py (already has URLs + MD5s for all
weights). Models download to models/selva/ on first load. Synchformer reuses
models/prismaudio/synchformer_state_dict.pth if already present (no duplicate
download for PrismAudio users), otherwise downloads to models/selva/.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 16:25:36 +02:00
Ethanfel fe94438356 feat: SelvaModelLoader node — loads TextSynch + MMAudio + FeaturesUtils
Resolves weights from models/selva/. Reuses synchformer_state_dict.pth from
models/prismaudio/ (no duplicate download). Supports four variants:
small_16k / small_44k / medium_44k / large_44k.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 15:21:03 +02:00