Commit Graph

5 Commits

Author SHA1 Message Date
Ethanfel 437c62b28f feat: LoRA fine-tuning for SelVA generator
Teaches the model new/partial sound classes from custom video+audio pairs.
Only ~10 MB of adapter weights are trained vs ~4.4 GB for the full model.

selva_core/model/lora.py
  LoRALinear: wraps nn.Linear with frozen base + trainable A/B matrices.
  B initialised to zero → zero adapter contribution at init.
  apply_lora(): walks named_modules, replaces matching nn.Linear in-place.
  Default target: "attn.qkv" (all 21 SelfAttention QKV projections in
  large_44k). Add "linear1" to also wrap post-attention output projections.
  get_lora_state_dict() / load_lora() for ~10 MB save/load.

train_lora.py (standalone script, no ComfyUI dependency)
  Data format: directory of video files + optional prompts.txt
  ("filename: description"). Falls back to directory name as prompt.
  Pre-extracts features for all clips into RAM, then trains from those.
  Training loop: encode audio→latent (need_vae_encoder=True), flow
  matching MSE loss on velocity prediction, backward on LoRA params only.
  Saves adapter_stepNNNNN.pt checkpoints + adapter_final.pt with metadata.
  Key verified interfaces used:
    encode_audio() → DiagonalGaussianDistribution; .mode().clone() required
    normalize() is in-place
    forward(latent, clip_f, sync_f, text_f, t) takes raw tensors

nodes/selva_lora_loader.py (SelVA LoRA Loader ComfyUI node)
  Loads .pt adapter, deep-copies the generator, applies LoRA, loads weights.
  strength param scales lora_B to adjust adapter contribution at inference.
  Reads rank/alpha/target from embedded metadata if present.
  Returns a patched SELVA_MODEL bundle for use with the existing Sampler.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 14:38:46 +02:00
Ethanfel ff26d0b87d fix: bug sweep and improvements
- nodes/__init__.py: fix [PrismAudio] leftover label in error print
- selva_feature_extractor: hash beginning, middle and end of video tensor
  instead of just first 1MB, avoiding collisions on videos with same opening frames
- selva_sampler: derive SequenceConfig from model template via dataclasses.replace
  instead of hardcoding sampling_rate/spectrogram_frame_rate per mode

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 18:04:35 +02:00
Ethanfel 982d66e078 chore: remove PrismAudio nodes from selva-integration branch
This branch registers only the three SelVA nodes. PrismAudio nodes stay
on master/feature/lora-trainer.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 17:01:21 +02:00
Ethanfel fe94438356 feat: SelvaModelLoader node — loads TextSynch + MMAudio + FeaturesUtils
Resolves weights from models/selva/. Reuses synchformer_state_dict.pth from
models/prismaudio/ (no duplicate download). Supports four variants:
small_16k / small_44k / medium_44k / large_44k.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 15:21:03 +02:00
Ethanfel baa80de194 feat: project scaffolding with shared utils and node registration
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 16:59:21 +01:00