ComfyUI-SelVA

Author	SHA1	Message	Date
Ethanfel	58e1985af2	feat: SelVA Skip Experiment node + save partial scalars on skip - New node: SelVA Skip Experiment — writes skip_current.flag from UI, queue in a second workflow tab while scheduler is running - SkipExperiment now attaches partial loss/grad/spectral data to the exception so the scheduler saves all collected scalars in the summary Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 13:10:43 +02:00
Ethanfel	675644189d	feat: add SelVA Dataset Browser node. Companion node for inspecting dataset.json entries by integer index. Outputs video (.mp4), audio (.wav/.flac), features (.npz), frames dir, mask dir, label, and max_index for constraining the index widget range. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-07 14:55:27 +02:00
Ethanfel	3ec380a27e	feat: add SelVA LoRA Scheduler node for automated experiment sweeps - Extract _prepare_dataset() from SelvaLoraTrainer.train() as a module-level function so the dataset can be encoded once and reused across experiments - Change _train_inner() return value from tuple to dict (adds loss_history, meta, completed; train() unpacks for ComfyUI — no change to node outputs) - New SelvaLoraScheduler node: reads a JSON sweep file, runs N experiments sequentially, writes experiment_summary.json (updated after each run) and loss_comparison.png with all smoothed curves overlaid on the same axes - Register SelvaLoraScheduler in nodes/__init__.py Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-06 13:03:21 +02:00
Ethanfel	f206a1b38c	feat: add SelVA LoRA Trainer ComfyUI node. Runs the full training loop inside ComfyUI. Reuses the already-loaded CLIP model from the inference model for text encoding; loads only a minimal VAE encoder separately (freed after dataset pre-loading). Outputs: - SELVA_MODEL with LoRA applied (ready to connect directly to Sampler) - adapter_path STRING (for SelVA LoRA Loader in future sessions) - loss_curve IMAGE (PIL-rendered line chart of training loss per 50 steps). Progress is shown via ComfyUI ProgressBar (two phases: dataset loading, then training steps). Resume is supported via resume_path input. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 17:07:38 +02:00
Ethanfel	437c62b28f	feat: LoRA fine-tuning for SelVA generator. Teaches the model new/partial sound classes from custom video+audio pairs. Only ~10 MB of adapter weights are trained vs ~4.4 GB for the full model. selva_core/model/lora.py: LoRALinear: wraps nn.Linear with frozen base + trainable A/B matrices. B initialised to zero → zero adapter contribution at init. apply_lora(): walks named_modules, replaces matching nn.Linear in-place. Default target: "attn.qkv" (all 21 SelfAttention QKV projections in large_44k). Add "linear1" to also wrap post-attention output projections. get_lora_state_dict() / load_lora() for ~10 MB save/load. train_lora.py (standalone script, no ComfyUI dependency): Data format: directory of video files + optional prompts.txt ("filename: description"). Falls back to directory name as prompt. Pre-extracts features for all clips into RAM, then trains from those. Training loop: encode audio→latent (need_vae_encoder=True), flow matching MSE loss on velocity prediction, backward on LoRA params only. Saves adapter_stepNNNNN.pt checkpoints + adapter_final.pt with metadata. Key verified interfaces used: encode_audio() → DiagonalGaussianDistribution; .mode().clone() required; normalize() is in-place; forward(latent, clip_f, sync_f, text_f, t) takes raw tensors. nodes/selva_lora_loader.py (SelVA LoRA Loader ComfyUI node): Loads .pt adapter, deep-copies the generator, applies LoRA, loads weights. strength param scales lora_B to adjust adapter contribution at inference. Reads rank/alpha/target from embedded metadata if present. Returns a patched SELVA_MODEL bundle for use with the existing Sampler. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 14:38:46 +02:00
Ethanfel	ff26d0b87d	fix: bug sweep and improvements - nodes/__init__.py: fix [PrismAudio] leftover label in error print - selva_feature_extractor: hash beginning, middle and end of video tensor instead of just first 1MB, avoiding collisions on videos with same opening frames - selva_sampler: derive SequenceConfig from model template via dataclasses.replace instead of hardcoding sampling_rate/spectrogram_frame_rate per mode Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-04 18:04:35 +02:00
Ethanfel	982d66e078	chore: remove PrismAudio nodes from selva-integration branch. This branch registers only the three SelVA nodes. PrismAudio nodes stay on master/feature/lora-trainer. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-04 17:01:21 +02:00
Ethanfel	fe94438356	feat: SelvaModelLoader node — loads TextSynch + MMAudio + FeaturesUtils Resolves weights from models/selva/. Reuses synchformer_state_dict.pth from models/prismaudio/ (no duplicate download). Supports four variants: small_16k / small_44k / medium_44k / large_44k. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-04 15:21:03 +02:00
Ethanfel	baa80de194	feat: project scaffolding with shared utils and node registration Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-27 16:59:21 +01:00
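
Note on commit 437c62b28f: the message describes LoRALinear as a frozen base layer plus trainable A/B matrices, with B initialised to zero so the adapter contributes nothing at init. The following is a minimal, framework-free sketch of that idea, not the code in selva_core/model/lora.py. The class name LoRALinearSketch, the helper matvec, the alpha/rank scaling, and the Gaussian init for A are assumptions chosen for illustration; the real module wraps nn.Linear in PyTorch.

```python
import random


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinearSketch:
    """Hypothetical stand-in for a LoRA-wrapped linear layer (no bias)."""

    def __init__(self, weight, rank=2, alpha=4.0, seed=0):
        rng = random.Random(seed)
        out_features, in_features = len(weight), len(weight[0])
        self.weight = weight  # frozen base weight, never updated
        # A gets a small random init (assumed); B starts at zero as in the commit.
        self.lora_a = [[rng.gauss(0.0, 0.02) for _ in range(in_features)]
                       for _ in range(rank)]
        self.lora_b = [[0.0] * rank for _ in range(out_features)]
        self.scale = alpha / rank  # assumed scaling convention

    def forward(self, x):
        base = matvec(self.weight, x)
        delta = matvec(self.lora_b, matvec(self.lora_a, x))
        return [b + self.scale * d for b, d in zip(base, delta)]


layer = LoRALinearSketch([[1.0, 2.0], [3.0, 4.0]])
x = [1.0, -1.0]
# With B all zeros the adapter path adds exactly 0, so this equals W @ x.
print(layer.forward(x))
```

Because B is zero, a freshly wrapped layer reproduces the base model exactly, and training only has to move A and B away from that starting point. Scaling B at load time, as the loader's strength parameter is described as doing, scales the adapter term linearly.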

9 Commits
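
Note on commit ff26d0b87d: the fix hashes the beginning, middle and end of the video data instead of only the first 1 MB, so clips that share opening frames no longer collide. A minimal sketch of that sampling scheme is shown here, assuming raw bytes as input. The function name sampled_digest, the use of SHA-256, and mixing the total length into the hash are assumptions for illustration; the actual selva_feature_extractor operates on a video tensor and may differ.

```python
import hashlib


def sampled_digest(data: bytes, chunk: int = 1 << 20) -> str:
    """Hash the first, middle and last `chunk` bytes of `data`."""
    hasher = hashlib.sha256()
    size = len(data)
    # Mixing in the length separates inputs whose sampled chunks coincide
    # but whose sizes differ (an added assumption, not from the commit).
    hasher.update(size.to_bytes(8, "little"))
    if size <= 3 * chunk:
        # Small input: hashing everything is no more expensive than sampling.
        hasher.update(data)
    else:
        middle = (size - chunk) // 2
        for start in (0, middle, size - chunk):
            hasher.update(data[start:start + chunk])
    return hasher.hexdigest()


# Two inputs with identical openings but different later content.
clip_a = b"A" * 64 + b"B" * 64
clip_b = b"A" * 64 + b"C" * 64
print(sampled_digest(clip_a, chunk=16) == sampled_digest(clip_b, chunk=16))
```

A head-only hash of these two inputs would be identical, while the sampled digest differs because the middle and tail windows see the changed bytes. Content that differs only outside the three windows can still collide, which is the trade-off for not hashing the full data.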