ComfyUI-SelVA

Author	SHA1	Message	Date
Ethanfel	8fade1b0e3	fix: initialize LoRA params on same device as wrapped linear. apply_lora() is called after generator.to(device), so lora_A/lora_B were being created on CPU while the rest of the model was on CUDA. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 22:17:29 +02:00
Ethanfel	cde280049b	fix: correct LoRALinear dtype and remove unused import. LoRALinear now creates lora_A/lora_B with dtype matching the base linear's weight, preventing a float32/bf16 mismatch at forward time when the generator is loaded in bf16 or fp16. Also removes the unused `import math` from train_lora.py. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 14:57:09 +02:00
Ethanfel	437c62b28f	feat: LoRA fine-tuning for SelVA generator. Teaches the model new/partial sound classes from custom video+audio pairs. Only ~10 MB of adapter weights are trained vs ~4.4 GB for the full model. selva_core/model/lora.py: LoRALinear wraps nn.Linear with frozen base + trainable A/B matrices; B initialised to zero → zero adapter contribution at init. apply_lora() walks named_modules and replaces matching nn.Linear in-place. Default target: "attn.qkv" (all 21 SelfAttention QKV projections in large_44k); add "linear1" to also wrap post-attention output projections. get_lora_state_dict() / load_lora() for ~10 MB save/load. train_lora.py (standalone script, no ComfyUI dependency): data format is a directory of video files + optional prompts.txt ("filename: description"); falls back to directory name as prompt. Pre-extracts features for all clips into RAM, then trains from those. Training loop: encode audio→latent (need_vae_encoder=True), flow matching MSE loss on velocity prediction, backward on LoRA params only. Saves adapter_stepNNNNN.pt checkpoints + adapter_final.pt with metadata. Key verified interfaces used: encode_audio() → DiagonalGaussianDistribution; .mode().clone() required. normalize() is in-place. forward(latent, clip_f, sync_f, text_f, t) takes raw tensors. nodes/selva_lora_loader.py (SelVA LoRA Loader ComfyUI node): loads .pt adapter, deep-copies the generator, applies LoRA, loads weights. strength param scales lora_B to adjust adapter contribution at inference. Reads rank/alpha/target from embedded metadata if present. Returns a patched SELVA_MODEL bundle for use with the existing Sampler. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 14:38:46 +02:00
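The LoRA arithmetic described in this commit can be illustrated with a small NumPy stand-in. This is a sketch of the math only, not the code in selva_core/model/lora.py: the class name `LoRALinearSketch`, the small random initialisation of A, and the `alpha / rank` scaling are assumptions for illustration. It shows two properties the message states: a zero-initialised B contributes nothing at init, and scaling B scales the adapter contribution linearly.

```python
import numpy as np


class LoRALinearSketch:
    """NumPy stand-in for a LoRA-wrapped linear layer (hypothetical, for illustration)."""

    def __init__(self, weight, bias=None, rank=4, alpha=4.0, seed=0):
        out_features, in_features = weight.shape
        rng = np.random.default_rng(seed)
        self.weight = weight  # frozen base weight, never updated
        self.bias = bias
        # Assumption: A starts small and random; the commit only specifies B = 0.
        self.lora_A = (0.01 * rng.standard_normal((rank, in_features))).astype(weight.dtype)
        # B starts at zero, so the adapter adds nothing until it is trained.
        self.lora_B = np.zeros((out_features, rank), dtype=weight.dtype)
        self.scale = alpha / rank  # assumed scaling convention

    def __call__(self, x):
        base = x @ self.weight.T
        if self.bias is not None:
            base = base + self.bias
        # Low-rank update: x -> A -> B, added on top of the frozen base output.
        return base + self.scale * ((x @ self.lora_A.T) @ self.lora_B.T)


rng = np.random.default_rng(1)
W = rng.standard_normal((6, 4)).astype(np.float32)
x = rng.standard_normal((2, 4)).astype(np.float32)

layer = LoRALinearSketch(W, rank=2, alpha=2.0)
base_out = x @ W.T
init_out = layer(x)  # identical to base_out because lora_B is all zeros

layer.lora_B[:] = 1.0            # pretend training moved B away from zero
delta_full = layer(x) - base_out
layer.lora_B *= 0.5              # a "strength" of 0.5 applied by scaling B
delta_half = layer(x) - base_out
```

Because the adapter term is linear in B, halving B halves the difference from the base output, which is why a strength parameter can be applied by scaling lora_B alone.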
Ethanfel	b4124f58b3	fix: BigVGANv2._from_pretrained() compat with newer huggingface_hub. Newer hf_hub stopped passing proxies/resume_download/local_files_only/token to _from_pretrained(). Give them defaults so the call doesn't fail when these kwargs are omitted. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-04 16:51:48 +02:00
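The shape of this fix can be sketched in plain Python. `VocoderSketch` is a hypothetical class, not BigVGANv2 itself, and the returned dictionary is only there to make the resolved options visible. The point is that keyword-only parameters with defaults accept both callers that pass the four kwargs and callers that omit them.

```python
class VocoderSketch:
    """Hypothetical stand-in for a class exposing a _from_pretrained() hook."""

    @classmethod
    def _from_pretrained(
        cls,
        *,
        model_id,
        revision=None,
        cache_dir=None,
        force_download=False,
        # The four kwargs named in the commit now have defaults, so a caller
        # that no longer passes them does not raise a TypeError.
        proxies=None,
        resume_download=False,
        local_files_only=False,
        token=None,
        **model_kwargs,
    ):
        # A real implementation would download and load weights here; this
        # sketch just reports the options it resolved.
        return {
            "model_id": model_id,
            "proxies": proxies,
            "resume_download": resume_download,
            "local_files_only": local_files_only,
            "token": token,
        }


# Caller that omits the four kwargs (the newer calling convention).
new_style = VocoderSketch._from_pretrained(model_id="some/repo")

# Caller that still passes them explicitly (the older calling convention).
old_style = VocoderSketch._from_pretrained(
    model_id="some/repo",
    proxies=None,
    resume_download=False,
    local_files_only=True,
    token=None,
)
```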
Ethanfel	614a2e02aa	fix: weights_only=False for SelVA checkpoints (PyTorch 2.6 compat). PyTorch 2.6 changed the torch.load default to weights_only=True. SelVA checkpoints contain non-tensor types (numpy scalars etc.) that fail strict unpickling. All weights come from trusted sources (jnwnlee/selva HF repo). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-04 16:38:31 +02:00
Ethanfel	4da4858e4a	fix: inline prune helpers when removed from both transformers locations. find_pruneable_heads_and_indices and prune_linear_layer were removed from both pytorch_utils and modeling_utils in some transformers builds. Provide minimal inline implementations as final fallback; prune_heads() is never called at inference time so correctness is only needed for completeness. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-04 16:30:58 +02:00
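The fallback chain described here (first import location, second import location, then an inline implementation) can be sketched with a generic helper. `import_first` is a hypothetical name and not how the repository necessarily structures it; the actual fix may simply use nested try/except import statements. The demonstration resolves names from the standard library so it runs without transformers installed.

```python
import importlib
import math


def import_first(name, module_paths, fallback):
    """Return `name` from the first module that provides it, else `fallback`."""
    for path in module_paths:
        try:
            module = importlib.import_module(path)
        except ImportError:
            continue  # module missing in this environment: try the next location
        if hasattr(module, name):
            return getattr(module, name)
    return fallback  # no location had it: use the inline implementation


# Intended use, shown as a comment because it needs transformers installed and
# a local inline implementation (hypothetical name `_inline_impl`):
#
# find_pruneable_heads_and_indices = import_first(
#     "find_pruneable_heads_and_indices",
#     ["transformers.pytorch_utils", "transformers.modeling_utils"],
#     fallback=_inline_impl,
# )

# Standard-library demonstration of the same lookup order.
resolved = import_first("sqrt", ["no_such_module_xyz", "math"], fallback=None)
missing = import_first("no_such_attr", ["math"], fallback="inline")
```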
Ethanfel	0e417f4078	fix: transformers compat - find_pruneable_heads_and_indices import. Some transformers builds removed these from pytorch_utils. Fall back to modeling_utils which exposes them in all known versions. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-04 16:21:26 +02:00
Ethanfel	6bc3fd6443	chore: vendor selva_core from jnwnlee/selva@d7d40a9. Pure PyTorch SelVA source for SelvaModelLoader/FeatureExtractor/Sampler nodes. Imports rewritten from selva.* to selva_core.*. mel_converter.py: replaced librosa.filters.mel with pure-numpy implementation to avoid librosa→numba→NumPy version incompatibility in some ComfyUI environments. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-04 15:18:09 +02:00
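A triangular mel filterbank needs nothing beyond NumPy, which is what makes the librosa replacement feasible. The following is a minimal sketch, not the code in mel_converter.py: it uses the HTK mel formula and applies no area normalization, so it is not a drop-in match for librosa's defaults, and the function names and example parameters are assumptions chosen for illustration.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale (an assumption; librosa defaults to the Slaney scale).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=np.float64) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=np.float64) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels, fmin=0.0, fmax=None):
    """Return an (n_mels, n_fft // 2 + 1) matrix of triangular filters."""
    fmax = sr / 2.0 if fmax is None else fmax
    # Centre frequency of each FFT bin.
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    # n_mels triangles need n_mels + 2 edge points, evenly spaced in mel.
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    fdiff = np.diff(hz_pts)
    ramps = hz_pts[:, None] - fft_freqs[None, :]
    lower = -ramps[:-2] / fdiff[:-1, None]  # rising edge of each triangle
    upper = ramps[2:] / fdiff[1:, None]     # falling edge of each triangle
    return np.maximum(0.0, np.minimum(lower, upper))


# Example parameters are illustrative only, not the values SelVA uses.
fb = mel_filterbank(sr=16000, n_fft=1024, n_mels=40)
```

Multiplying this matrix by a magnitude or power spectrogram of shape (n_fft // 2 + 1, frames) yields the mel spectrogram.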

8 Commits