LoRA quality improvements addressing the intruder-dimension problem:
1. PiSSA initialization (arXiv:2404.02948): init A and B from the top-r
SVD of the pretrained weight. Starts on-manifold, so there are no
intruder dimensions at init. Base weight stores the residual
W_res = W - scale * (B @ A).
2. rsLoRA scaling (arXiv:2312.03732): alpha/sqrt(rank) instead of
alpha/rank. Prevents gradient collapse at high ranks (128+).
3. Post-training Spectral Surgery (arXiv:2603.03995): SVD of trained
LoRA update, gradient-sensitivity reweighting to suppress remaining
intruder dimensions. Runs automatically after training completes.
4. alpha default changed to 2*rank (was 1*rank). Produces fewer intruder
dimensions per arXiv:2410.21228.
5. weight_decay reduced from 1e-2 to 0.0 (a common choice for LoRA; avoids
decaying the learned style weights back toward zero).
6. random.choices replaced with random.sample when batch_size <= dataset
size (eliminates duplicate samples per batch).
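
A minimal sketch of the sampling change in item 6, assuming batches are drawn
from a list of clip indices. The name draw_batch and its parameters are
hypothetical and not taken from train_lora.py.

```python
import random


def draw_batch(indices, batch_size, rng=random):
    """Pick batch_size dataset indices, avoiding duplicates when possible."""
    if batch_size <= len(indices):
        # Sampling without replacement: each index appears at most once.
        return rng.sample(indices, batch_size)
    # The dataset is smaller than the batch, so duplicates are unavoidable;
    # fall back to sampling with replacement.
    return rng.choices(indices, k=batch_size)
```

Passing a seeded random.Random instance as rng makes the draw reproducible.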
PiSSA checkpoints include base weights (residual). Loader/evaluator
updated to handle both standard and PiSSA checkpoint formats.
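
A minimal sketch of how items 1 and 2 fit together, assuming PyTorch. The
function names are hypothetical, and dividing the singular values by the
scale (so that scale * B @ A equals the top-r approximation) is an assumption
about the implementation, not something stated above.

```python
import math

import torch


def rslora_scale(alpha: float, rank: int) -> float:
    # rsLoRA scaling: alpha / sqrt(rank) instead of alpha / rank.
    return alpha / math.sqrt(rank)


def pissa_init(weight: torch.Tensor, rank: int, alpha: float):
    """Return (A, B, W_res) such that W_res + scale * B @ A == weight."""
    scale = rslora_scale(alpha, rank)
    w = weight.float()
    U, S, Vh = torch.linalg.svd(w, full_matrices=False)
    # Assumption: fold 1/scale into the singular values and split them
    # evenly between the two factors.
    sqrt_s = (S[:rank] / scale).sqrt()
    B = U[:, :rank] * sqrt_s          # (out_features, rank)
    A = sqrt_s[:, None] * Vh[:rank]   # (rank, in_features)
    # The frozen base keeps whatever the adapter does not represent.
    W_res = w - scale * (B @ A)
    return A, B, W_res
```

Because the base weight is replaced by W_res, a checkpoint has to carry the
residual alongside A and B for the layer to be reconstructed.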
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Teaches the model new/partial sound classes from custom video+audio pairs.
Only ~10 MB of adapter weights are trained vs ~4.4 GB for the full model.
selva_core/model/lora.py
LoRALinear: wraps nn.Linear with frozen base + trainable A/B matrices.
B initialised to zero → zero adapter contribution at init.
apply_lora(): walks named_modules, replaces matching nn.Linear in-place.
Default target: "attn.qkv" (all 21 SelfAttention QKV projections in
large_44k). Add "linear1" to also wrap post-attention output projections.
get_lora_state_dict() / load_lora() for ~10 MB save/load.
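
A minimal sketch of the wrapper and the in-place replacement described above,
assuming PyTorch. The class and function names carry a Sketch suffix to mark
them as illustrative. The alpha / rank scaling, the Kaiming init for A, and
the suffix-based name matching are assumptions; only the zero-initialised B
and the frozen base come from the description.

```python
import math

import torch
from torch import nn


class LoRALinearSketch(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 16, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # frozen base weights
        self.scale = alpha / rank    # assumption: conventional LoRA scaling
        self.lora_A = nn.Parameter(torch.empty(rank, base.in_features))
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        nn.init.kaiming_uniform_(self.lora_A, a=math.sqrt(5))

    def forward(self, x):
        # B is zero at init, so the second term vanishes until training.
        return self.base(x) + self.scale * ((x @ self.lora_A.T) @ self.lora_B.T)


def apply_lora_sketch(model, targets=("attn.qkv",), rank=16, alpha=16.0):
    """Replace matching nn.Linear modules in place; return wrapped names."""
    # Collect first so the module tree is not mutated while iterating.
    matches = [
        (name, module)
        for name, module in model.named_modules()
        if isinstance(module, nn.Linear)
        and any(name.endswith(t) for t in targets)
    ]
    for name, module in matches:
        parent_name, _, child = name.rpartition(".")
        parent = model.get_submodule(parent_name) if parent_name else model
        setattr(parent, child, LoRALinearSketch(module, rank, alpha))
    return [name for name, _ in matches]


def lora_state_dict_sketch(model):
    # Only the adapter tensors are saved, which keeps the file small.
    return {k: v for k, v in model.state_dict().items() if "lora_" in k}
```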
train_lora.py (standalone script, no ComfyUI dependency)
Data format: directory of video files + optional prompts.txt
("filename: description"). Falls back to directory name as prompt.
Pre-extracts features for all clips into RAM, then trains from those.
Training loop: encode audio→latent (need_vae_encoder=True), flow
matching MSE loss on velocity prediction, backward on LoRA params only.
Saves adapter_stepNNNNN.pt checkpoints + adapter_final.pt with metadata.
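
A minimal sketch of a flow matching MSE loss on velocity prediction, assuming
PyTorch and a linear path from noise (t=0) to the data latent (t=1). The path
convention and the velocity_fn callable are assumptions for illustration; in
the real script the velocity comes from the generator's forward() with the
conditioning features.

```python
import torch
import torch.nn.functional as F


def flow_matching_loss(velocity_fn, x1):
    """MSE between predicted and target velocity on a linear noise-data path.

    velocity_fn(x_t, t) -> tensor shaped like x_t (hypothetical callable).
    x1: batch of clean latents, shape (batch, ...).
    """
    x0 = torch.randn_like(x1)                      # noise endpoint
    t = torch.rand(x1.shape[0], device=x1.device)  # one timestep per sample
    t_b = t.view(-1, *([1] * (x1.dim() - 1)))      # broadcast over latent dims
    x_t = (1 - t_b) * x0 + t_b * x1                # point on the straight path
    target = x1 - x0                               # constant velocity of that path
    return F.mse_loss(velocity_fn(x_t, t), target)
```

Calling backward() on the returned loss only updates parameters that still
require gradients, which is how the optimizer step stays limited to the LoRA
matrices when the base weights are frozen.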
Key verified interfaces used:
encode_audio() → DiagonalGaussianDistribution; .mode().clone() required
normalize() is in-place
forward(latent, clip_f, sync_f, text_f, t) takes raw tensors
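
A minimal sketch of reading the data format described for train_lora.py (a
directory of video files plus an optional prompts.txt). The function name,
the extension list, and matching prompts by either full filename or stem are
assumptions, not the script's actual behaviour.

```python
from pathlib import Path

# Assumption: which extensions count as video files.
VIDEO_EXTS = {".mp4", ".mov", ".mkv", ".webm"}


def collect_clips(data_dir):
    """Return sorted (video_path, prompt) pairs for a dataset directory."""
    data_dir = Path(data_dir).resolve()
    prompts = {}
    prompts_file = data_dir / "prompts.txt"
    if prompts_file.is_file():
        for line in prompts_file.read_text(encoding="utf-8").splitlines():
            # Each usable line looks like "filename: description".
            name, sep, desc = line.partition(":")
            if sep and name.strip() and desc.strip():
                prompts[name.strip()] = desc.strip()
    fallback = data_dir.name  # directory name doubles as the default prompt
    clips = []
    for path in sorted(data_dir.iterdir()):
        if path.suffix.lower() in VIDEO_EXTS:
            prompt = prompts.get(path.name) or prompts.get(path.stem) or fallback
            clips.append((path, prompt))
    return clips
```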
nodes/selva_lora_loader.py (SelVA LoRA Loader ComfyUI node)
Loads .pt adapter, deep-copies the generator, applies LoRA, loads weights.
strength param scales lora_B to adjust adapter contribution at inference.
Reads rank/alpha/target from embedded metadata if present.
Returns a patched SELVA_MODEL bundle for use with the existing Sampler.
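
A minimal sketch of the strength parameter, assuming PyTorch and adapter keys
that end in lora_A / lora_B. The function name is hypothetical. Since the
adapter contribution is proportional to B @ A, scaling B alone scales the
whole contribution linearly.

```python
import torch


def scale_adapter(lora_state, strength):
    """Return a copy of the adapter with every lora_B tensor scaled."""
    return {
        key: (tensor * strength if key.endswith("lora_B") else tensor.clone())
        for key, tensor in lora_state.items()
    }
```

A strength of 0.0 leaves the base model's behaviour unchanged and 1.0 applies
the adapter as trained.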
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>