ComfyUI-SelVA

Author	SHA1	Message	Date
Ethanfel	ce62bccc1f	feat: add post-generation audio enhancement nodes Three new nodes for post-generation quality improvement: - SelvaHarmonicExciter: multi-band exciter (HPF → tanh saturation → mix) restores harmonic richness lost in BigVGAN HF reconstruction - SelvaFlashSR: audio super-resolution via FlashSR basic model (haoheliu/versatile_audio_super_resolution, requires pip install audiosr) predicts missing HF content above vocoder reconstruction ceiling - SelvaOutputNormalizer: BS.1770-4 LUFS normalization + true peak limiting for consistent loudness on generated outputs (pyloudnorm) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-09 16:27:39 +02:00
Ethanfel	15fc5f0793	feat: add SelvaDatasetCompressor node for parallel compression Mild 2:1-3:1 parallel compression via pedalboard.Compressor to reduce within-clip loudness variance after LUFS normalization. Blend ratio keeps transients intact while tightening dynamics. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-09 15:36:27 +02:00
Ethanfel	48493a3f0d	feat: add SelvaDatasetSaver node with NPZ sidecar copy Saves all clips in an AUDIO_DATASET to FLAC. When npz_source_dir is provided, copies the matching .npz for each clip so FLAC/NPZ pairs stay in sync after the inspector filters out bad clips. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-09 15:27:48 +02:00
Ethanfel	8a85819f97	feat: register audio dataset pipeline nodes in __init__.py	2026-04-09 14:25:57 +02:00
Ethanfel	1e9551152e	feat: add DITTO optimizer, upgrade BigVGAN trainer, document all nodes BigVGAN trainer (selva_bigvgan_trainer.py): - Add snake_alpha_only train mode: tunes only ~27K per-channel α params (0.024% of 112M) — physically cannot cause harmonic smearing - Add lambda_l2sp: L2-SP anchor regularization toward pretrained weights - Add optional discriminator_path: frozen MPD+MRD feature matching loss replaces mel L1 when a BigVGAN discriminator checkpoint is provided - Inline MPD + MRD discriminator implementations (no extra dependencies) DITTO optimizer (selva_ditto_optimizer.py): - New node: inference-time noise optimization (arXiv:2401.12179) - Optimizes x₀ via mel Gram matrix style loss against BJ reference clips - All model weights frozen — zero quality degradation risk - Truncated BPTT through last n_grad_steps of the ODE (configurable) - Gradient checkpointing on each differentiated step Docs: - README: document all 20 nodes (was 3), add workflow diagrams - STYLE_TRANSFER.md: new guide — DITTO, vocoder fine-tuning tiers, why LoRA/TI fail, combined approach, dataset prep Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-09 12:04:05 +02:00
Ethanfel	9c784b4bdb	feat: add BigVGAN vocoder fine-tuner and loader nodes Spectral-loss-only fine-tuning of the BigVGAN vocoder (mel→waveform) on BJ audio clips. DiT and VAE are completely frozen. Losses: mel L1 reconstruction + multi-resolution STFT magnitude L1 (same three resolutions as the BigVGAN discriminator config). Saves in {'generator': state_dict} format compatible with the original BigVGAN checkpoint. Loader replaces vocoder weights in the loaded SELVA_MODEL in-place so no full model reload is needed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-09 01:26:12 +02:00
Ethanfel	95923cdf42	feat: add activation steering pipeline (extractor, loader, sampler injection) Implements per-block DiT activation steering as an alternative to textual inversion. Extractor runs frozen generator on dataset with BJ vs empty conditions, records mean hidden-state delta per block, saves [hidden_dim] vectors (seq-averaged so they broadcast to any inference duration). Loader reads the bundle. Sampler registers forward hooks during the ODE that add strength × vec to each block output, cleaned up in a finally block. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-09 00:38:26 +02:00
Ethanfel	e37bfe1b1c	feat: add SelVA TI Scheduler for sweep-based textual inversion experiments - SelvaTiScheduler: runs a JSON-defined sweep of TI training experiments, loading the dataset once and reusing it across runs - Collects per-experiment loss history, final/min loss, stability metric (loss_std_last_quarter), and duration, written to experiment_summary.json after each completed run so partial sweeps survive interruption - Resume-aware: skips experiments already marked completed in an existing summary file - Outputs smoothed loss comparison chart (same axes, one curve per experiment) - SelvaTextualInversionTrainer._train_inner now returns a dict {embeddings_path, loss_history} so the scheduler can read results; train() extracts just the path for ComfyUI. JSON format: name, description, data_dir, output_root, base config, experiments list with id + param overrides. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 23:13:04 +02:00
Ethanfel	e56ece9c1c	feat: add SelVA Textual Inversion Trainer and Loader nodes Learns K CLIP token embeddings ([K, 1024]) with all model weights frozen, keeping generated latents on the decoder's natural manifold — avoids the quality degradation that affects LoRA on BJ's audio dataset. - selva_textual_inversion_trainer.py: trains learned_tokens via AdamW, injects into last K positions of 77-token CLIP embedding, checkpoints with eval audio + spectral metrics - selva_textual_inversion_loader.py: loads .pt bundle, returns TEXTUAL_INVERSION dict for sampler - selva_sampler.py: optional textual_inversion input; injects into both text_clip and neg_text_clip before preprocess_conditions - __init__.py: registers both new nodes Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 23:01:44 +02:00
Ethanfel	eed7eefeac	feat: add SelVA HF Smoother and Spectral Matcher preprocessing nodes Two ComfyUI nodes to reduce domain mismatch between custom training audio and the MMAudio VAE's expected spectral distribution: SelvaHfSmoother: blends a low-pass filtered copy (biquad) with the original at a configurable cutoff and blend ratio. Attenuates extreme HF content that BigVGANv2 handles poorly. RMS-preserving. SelvaSpectralMatcher: computes the log-mel energy profile of the clip, compares it per-band to the VAE's normalization means (DATA_MEAN_80D/128D), and applies a smooth STFT-domain gain correction to match the codec's training distribution. Configurable strength and max_gain_db clamp. RMS-preserving. Recommended workflow: SpectralMatcher → HfSmoother → feature extraction. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 20:28:16 +02:00
Ethanfel	8195c3114a	feat: add SelVA VAE Roundtrip node Encodes audio through the VAE then decodes straight back, bypassing the diffusion model entirely. Use this to isolate whether saturation artifacts are introduced by the codec reconstruction (VAE/DAC) or by the LoRA. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 19:15:20 +02:00
Ethanfel	d2e1ea7b80	feat: add SelVA LoRA Evaluator node Generates audio samples from a list of adapters against a fixed reference clip, collects spectral metrics for each, and outputs a comparison bar chart + eval_summary.json. Useful for comparing sweep candidates before committing to a next round of training. JSON format: name, data_dir, output_dir, steps, seed, adapters[{id, path}]. Empty path = baseline (no LoRA). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 17:26:50 +02:00
Ethanfel	58e1985af2	feat: SelVA Skip Experiment node + save partial scalars on skip - New node: SelVA Skip Experiment — writes skip_current.flag from UI, queue in a second workflow tab while scheduler is running - SkipExperiment now attaches partial loss/grad/spectral data to the exception so the scheduler saves all collected scalars in the summary Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 13:10:43 +02:00
Ethanfel	675644189d	feat: add SelVA Dataset Browser node Companion node for inspecting dataset.json entries by integer index. Outputs video (.mp4), audio (.wav/.flac), features (.npz), frames dir, mask dir, label, and max_index for constraining the index widget range. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-07 14:55:27 +02:00
Ethanfel	3ec380a27e	feat: add SelVA LoRA Scheduler node for automated experiment sweeps - Extract _prepare_dataset() from SelvaLoraTrainer.train() as a module-level function so the dataset can be encoded once and reused across experiments - Change _train_inner() return value from tuple to dict (adds loss_history, meta, completed; train() unpacks for ComfyUI — no change to node outputs) - New SelvaLoraScheduler node: reads a JSON sweep file, runs N experiments sequentially, writes experiment_summary.json (updated after each run) and loss_comparison.png with all smoothed curves overlaid on the same axes - Register SelvaLoraScheduler in nodes/__init__.py Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-06 13:03:21 +02:00
Ethanfel	f206a1b38c	feat: add SelVA LoRA Trainer ComfyUI node Runs the full training loop inside ComfyUI. Reuses the already-loaded CLIP model from the inference model for text encoding; loads only a minimal VAE encoder separately (freed after dataset pre-loading). Outputs: - SELVA_MODEL with LoRA applied (ready to connect directly to Sampler) - adapter_path STRING (for SelVA LoRA Loader in future sessions) - loss_curve IMAGE (PIL-rendered line chart of training loss per 50 steps) Progress is shown via ComfyUI ProgressBar (two phases: dataset loading, then training steps). Resume is supported via resume_path input. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 17:07:38 +02:00
Ethanfel	437c62b28f	feat: LoRA fine-tuning for SelVA generator Teaches the model new/partial sound classes from custom video+audio pairs. Only ~10 MB of adapter weights are trained vs ~4.4 GB for the full model. selva_core/model/lora.py: LoRALinear wraps nn.Linear with frozen base + trainable A/B matrices. B initialised to zero → zero adapter contribution at init. apply_lora(): walks named_modules, replaces matching nn.Linear in-place. Default target: "attn.qkv" (all 21 SelfAttention QKV projections in large_44k). Add "linear1" to also wrap post-attention output projections. get_lora_state_dict() / load_lora() for ~10 MB save/load. train_lora.py (standalone script, no ComfyUI dependency): Data format: directory of video files + optional prompts.txt ("filename: description"). Falls back to directory name as prompt. Pre-extracts features for all clips into RAM, then trains from those. Training loop: encode audio→latent (need_vae_encoder=True), flow matching MSE loss on velocity prediction, backward on LoRA params only. Saves adapter_stepNNNNN.pt checkpoints + adapter_final.pt with metadata. Key verified interfaces used: encode_audio() → DiagonalGaussianDistribution; .mode().clone() required. normalize() is in-place. forward(latent, clip_f, sync_f, text_f, t) takes raw tensors. nodes/selva_lora_loader.py (SelVA LoRA Loader ComfyUI node): Loads .pt adapter, deep-copies the generator, applies LoRA, loads weights. strength param scales lora_B to adjust adapter contribution at inference. Reads rank/alpha/target from embedded metadata if present. Returns a patched SELVA_MODEL bundle for use with the existing Sampler. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 14:38:46 +02:00
Ethanfel	ff26d0b87d	fix: bug sweep and improvements - nodes/__init__.py: fix [PrismAudio] leftover label in error print - selva_feature_extractor: hash beginning, middle and end of video tensor instead of just first 1MB, avoiding collisions on videos with same opening frames - selva_sampler: derive SequenceConfig from model template via dataclasses.replace instead of hardcoding sampling_rate/spectrogram_frame_rate per mode Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-04 18:04:35 +02:00
Ethanfel	982d66e078	chore: remove PrismAudio nodes from selva-integration branch This branch registers only the three SelVA nodes. PrismAudio nodes stay on master/feature/lora-trainer. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-04 17:01:21 +02:00
Ethanfel	fe94438356	feat: SelvaModelLoader node — loads TextSynch + MMAudio + FeaturesUtils Resolves weights from models/selva/. Reuses synchformer_state_dict.pth from models/prismaudio/ (no duplicate download). Supports four variants: small_16k / small_44k / medium_44k / large_44k. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-04 15:21:03 +02:00
Ethanfel	baa80de194	feat: project scaffolding with shared utils and node registration Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-27 16:59:21 +01:00
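
Commit ff26d0b87d describes hashing the beginning, middle and end of the video tensor instead of only the first 1 MB, so that videos with identical opening frames no longer collide. The log does not show the repository's implementation. What follows is only a minimal sketch of the idea on a raw byte buffer, assuming SHA-256, a 1 MiB default chunk, and a hypothetical helper name `sampled_digest`.

```python
import hashlib


def sampled_digest(data: bytes, chunk_size: int = 1 << 20) -> str:
    """Hash the length plus the first, middle and last chunk of `data`.

    Hypothetical helper for illustration; not the repository's function.
    """
    h = hashlib.sha256()
    n = len(data)
    # Mixing in the length separates buffers whose sampled chunks happen to match.
    h.update(n.to_bytes(8, "little"))
    if n <= 3 * chunk_size:
        # Small input: the three windows would overlap, so hash everything.
        h.update(data)
    else:
        mid = (n - chunk_size) // 2
        for start in (0, mid, n - chunk_size):
            h.update(data[start:start + chunk_size])
    return h.hexdigest()


if __name__ == "__main__":
    # Same opening bytes, different endings: a first-chunk-only hash would collide.
    a = b"\x00" * 4096 + b"\x01" * 4096 + b"A" * 4096
    b = b"\x00" * 4096 + b"\x01" * 4096 + b"B" * 4096
    print(sampled_digest(a, 1024) != sampled_digest(b, 1024))
```

Sampling three windows keeps the cost bounded for large tensors, but it can still miss differences that fall between the windows.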

21 Commits
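
The scheduler commits (3ec380a27e, e37bfe1b1c) describe a sweep that rewrites experiment_summary.json after each run and skips experiments already marked completed when restarted. The sketch below illustrates that resume pattern only. The function name `run_sweep`, the `run_one` callback and the `completed` field layout are assumptions made for the example, not the repository's actual schema.

```python
import json
import tempfile
from pathlib import Path


def run_sweep(experiments, summary_path, run_one):
    """Run each experiment once, persisting the summary after every run."""
    summary_path = Path(summary_path)
    # Reload earlier results so an interrupted sweep resumes where it stopped.
    summary = json.loads(summary_path.read_text()) if summary_path.exists() else {}
    for exp in experiments:
        if summary.get(exp["id"], {}).get("completed"):
            continue  # finished in a previous session
        result = run_one(exp)  # hypothetical training callback returning a dict
        summary[exp["id"]] = {**result, "completed": True}
        # Write after every run so partial sweeps survive interruption.
        summary_path.write_text(json.dumps(summary, indent=2))
    return summary


if __name__ == "__main__":
    ran = []

    def fake_train(exp):
        ran.append(exp["id"])
        return {"final_loss": 0.0}

    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "experiment_summary.json"
        sweep = [{"id": "a"}, {"id": "b"}]
        run_sweep(sweep, path, fake_train)
        run_sweep(sweep, path, fake_train)  # second pass skips both entries
    print(ran)
```

Writing the file inside the loop, not once at the end, is what lets a crashed or skipped sweep keep the results of the runs that did finish.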