Commit Graph

6 Commits

Author SHA1 Message Date
Ethanfel 437c62b28f feat: LoRA fine-tuning for SelVA generator
Teaches the model new/partial sound classes from custom video+audio pairs.
Only ~10 MB of adapter weights are trained vs ~4.4 GB for the full model.

selva_core/model/lora.py
  LoRALinear: wraps nn.Linear with frozen base + trainable A/B matrices.
  B initialised to zero → zero adapter contribution at init.
  apply_lora(): walks named_modules, replaces matching nn.Linear in-place.
  Default target: "attn.qkv" (all 21 SelfAttention QKV projections in
  large_44k). Add "linear1" to also wrap post-attention output projections.
  get_lora_state_dict() / load_lora() for ~10 MB save/load.

train_lora.py (standalone script, no ComfyUI dependency)
  Data format: directory of video files + optional prompts.txt
  ("filename: description"). Falls back to directory name as prompt.
  Pre-extracts features for all clips into RAM, then trains from those.
  Training loop: encode audio→latent (need_vae_encoder=True), flow
  matching MSE loss on velocity prediction, backward on LoRA params only.
  Saves adapter_stepNNNNN.pt checkpoints + adapter_final.pt with metadata.
  Key verified interfaces used:
    encode_audio() → DiagonalGaussianDistribution; .mode().clone() required
    normalize() is in-place
    forward(latent, clip_f, sync_f, text_f, t) takes raw tensors

nodes/selva_lora_loader.py (SelVA LoRA Loader ComfyUI node)
  Loads .pt adapter, deep-copies the generator, applies LoRA, loads weights.
  strength param scales lora_B to adjust adapter contribution at inference.
  Reads rank/alpha/target from embedded metadata if present.
  Returns a patched SELVA_MODEL bundle for use with the existing Sampler.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 14:38:46 +02:00
Ethanfel b4124f58b3 fix: BigVGANv2._from_pretrained() compat with newer huggingface_hub
Newer hf_hub stopped passing proxies/resume_download/local_files_only/token
to _from_pretrained(). Give them defaults so the call doesn't fail when
these kwargs are omitted.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 16:51:48 +02:00
Ethanfel 614a2e02aa fix: weights_only=False for SelVA checkpoints (PyTorch 2.6 compat)
PyTorch 2.6 changed the default to weights_only=True. SelVA checkpoints
contain non-tensor types (numpy scalars etc.) that fail strict unpickling.
All weights come from trusted sources (jnwnlee/selva HF repo).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 16:38:31 +02:00
Ethanfel 4da4858e4a fix: inline prune helpers when removed from both transformers locations
find_pruneable_heads_and_indices and prune_linear_layer were removed from
both pytorch_utils and modeling_utils in some transformers builds. Provide
minimal inline implementations as final fallback — prune_heads() is never
called at inference time so correctness is only needed for completeness.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 16:30:58 +02:00
Ethanfel 0e417f4078 fix: transformers compat — find_pruneable_heads_and_indices import
Some transformers builds removed these from pytorch_utils. Fall back to
modeling_utils which exposes them in all known versions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 16:21:26 +02:00
Ethanfel 6bc3fd6443 chore: vendor selva_core from jnwnlee/selva@d7d40a9
Pure PyTorch SelVA source for SelvaModelLoader/FeatureExtractor/Sampler nodes.
Imports rewritten from selva.* to selva_core.*. mel_converter.py: replaced
librosa.filters.mel with pure-numpy implementation to avoid librosa→numba→NumPy
version incompatibility in some ComfyUI environments.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 15:18:09 +02:00