ComfyUI-SelVA

Author	SHA1	Message	Date
Ethanfel	2f4641247a	feat: add resume support to train_lora.py. Step checkpoints now save optimizer state, scheduler state, and step number alongside the LoRA weights. Pass --resume path/to/adapter_stepXXXXX.pt to continue training from that checkpoint. --steps always means total steps, so resuming from 1000 with --steps 2000 trains 1000 more steps. adapter_final.pt format is unchanged (state_dict + meta only) so SelvaLoraLoader remains compatible. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 16:59:30 +02:00
Ethanfel	c88e27742c	fix: sanitize name field and remove double load_npz call - _resolve_named_path: replace / \ and null in name to prevent path traversal outside cache_dir (would cause a confusing FileNotFoundError at np.savez time instead of at path resolution). - train_lora: load_npz was called twice per clip when prompt was in prompts.txt; consolidate to a single call before prompt resolution. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 15:30:25 +02:00
Ethanfel	1eb82d8050	refactor: train_lora accepts .npz + audio pairs instead of raw video - Input is now pre-extracted .npz files (from SelvaFeatureExtractor) paired with clean audio files (same stem). Visual features no longer re-extracted during training. - FeaturesUtils loaded with enable_conditions=False (VAE only) — Synchformer and T5 are no longer loaded, saving ~3-4 GB VRAM. - CLIP text encoder loaded separately via patch_clip so text prompt can differ from the one used during feature extraction. - Prompt priority: prompts.txt override > embedded in .npz > directory name. - Removed: torchvision video loading, frame sampling/resizing, net_video_enc, synchformer path check. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 15:14:26 +02:00
Ethanfel	cde280049b	fix: correct LoRALinear dtype and remove unused import - LoRALinear now creates lora_A/lora_B with dtype matching the base linear's weight, preventing a float32/bf16 mismatch at forward time when the generator is loaded in bf16 or fp16. - Remove unused `import math` from train_lora.py. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 14:57:09 +02:00
Ethanfel	437c62b28f	feat: LoRA fine-tuning for SelVA generator. Teaches the model new/partial sound classes from custom video+audio pairs. Only ~10 MB of adapter weights are trained vs ~4.4 GB for the full model. selva_core/model/lora.py: - LoRALinear: wraps nn.Linear with frozen base + trainable A/B matrices. B initialised to zero → zero adapter contribution at init. - apply_lora(): walks named_modules, replaces matching nn.Linear in-place. Default target: "attn.qkv" (all 21 SelfAttention QKV projections in large_44k). Add "linear1" to also wrap post-attention output projections. - get_lora_state_dict() / load_lora() for ~10 MB save/load. train_lora.py (standalone script, no ComfyUI dependency): - Data format: directory of video files + optional prompts.txt ("filename: description"). Falls back to directory name as prompt. - Pre-extracts features for all clips into RAM, then trains from those. - Training loop: encode audio→latent (need_vae_encoder=True), flow matching MSE loss on velocity prediction, backward on LoRA params only. - Saves adapter_stepNNNNN.pt checkpoints + adapter_final.pt with metadata. - Key verified interfaces used: encode_audio() → DiagonalGaussianDistribution (.mode().clone() required); normalize() is in-place; forward(latent, clip_f, sync_f, text_f, t) takes raw tensors. nodes/selva_lora_loader.py (SelVA LoRA Loader ComfyUI node): - Loads .pt adapter, deep-copies the generator, applies LoRA, loads weights. - strength param scales lora_B to adjust adapter contribution at inference. - Reads rank/alpha/target from embedded metadata if present. - Returns a patched SELVA_MODEL bundle for use with the existing Sampler. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 14:38:46 +02:00
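
Commit c88e27742c says the name field is sanitized by replacing `/`, `\` and null characters so that a resolved path cannot land outside cache_dir. The following is a minimal sketch of that idea only, not the repository's code: the function names `sanitize_name` and `resolve_named_path`, the `_` replacement character, and the `.npz` suffix default are all assumptions made for illustration.

```python
from pathlib import Path


def sanitize_name(name: str, replacement: str = "_") -> str:
    """Replace path separators and null bytes in a user-supplied name.

    The replacement character is an assumption; the commit message only
    says that these three characters are replaced.
    """
    for ch in ("/", "\\", "\0"):
        name = name.replace(ch, replacement)
    return name


def resolve_named_path(cache_dir: str, name: str, suffix: str = ".npz") -> Path:
    """Hypothetical helper: build a file path that stays directly under cache_dir."""
    return Path(cache_dir) / (sanitize_name(name) + suffix)
```

With separators removed, the name can only ever contribute a single path component, so a value such as `../../secret` resolves to a file directly inside the cache directory rather than two levels above it.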

5 Commits
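
Commit 2f4641247a describes two behaviours: `--steps` always means total steps, and step checkpoints carry optimizer state, scheduler state, and the step number while adapter_final.pt keeps only state_dict + meta. A rough sketch of both follows, under stated assumptions: the function names and the dictionary keys (`state_dict`, `meta`, `optimizer`, `scheduler`, `step`) are hypothetical and are not taken from train_lora.py.

```python
from typing import Any, Dict, Optional


def remaining_steps(total_steps: int, resumed_step: int = 0) -> int:
    """Number of steps still to train when --steps means the total."""
    if total_steps < 0 or resumed_step < 0:
        raise ValueError("step counts must be non-negative")
    # Resuming at or past the total leaves nothing to train.
    return max(total_steps - resumed_step, 0)


def build_checkpoint(
    state_dict: Dict[str, Any],
    meta: Dict[str, Any],
    *,
    step: Optional[int] = None,
    optimizer_state: Optional[Dict[str, Any]] = None,
    scheduler_state: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble a checkpoint payload (key names are assumptions).

    Without `step`, the payload has the final-adapter shape
    (weights + metadata only). With `step`, the resume fields are added.
    """
    ckpt: Dict[str, Any] = {"state_dict": state_dict, "meta": meta}
    if step is not None:
        ckpt["step"] = step
        ckpt["optimizer"] = optimizer_state
        ckpt["scheduler"] = scheduler_state
    return ckpt
```

Keeping the final payload free of the resume fields is what lets a loader that only expects weights and metadata keep working, which matches the compatibility note in the commit message.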