ComfyUI-SelVA

Ethanfel/ComfyUI-SelVA

Commit Graph

deprecated/lora-trainer

deprecated/prismaudio

experiment/crop-to-mask

feature/lora-timestep-sampling

feature/lora-training

main

40d29bcaf8 feat: add experiment configs for logit+cosine combo and BigVGAN decoder fine-tuning (branch: feature/lora-timestep-sampling) Ethanfel 2026-04-10 16:48:21 +02:00
65dc549494 feat: add reference audio comparison metrics to LoRA trainer eval Ethanfel 2026-04-10 15:04:07 +02:00
f745e241c4 chore: sanitize tooltips/comments + add experiment configs Ethanfel 2026-04-10 13:44:37 +02:00
082a2da438 fix: restore dtype after float32 STFT in discriminator spectrogram Ethanfel 2026-04-10 12:13:55 +02:00
c28e090196 fix: cast discriminator inputs to match bfloat16 dtype in BigVGAN FM loss Ethanfel 2026-04-10 11:36:02 +02:00
af6c225f53 feat: add dataset pipeline nodes + latent augmentation for LoRA trainer Ethanfel 2026-04-10 11:32:00 +02:00
30127c13ca feat: add BigVGAN vocoder sweep scheduler node Ethanfel 2026-04-10 02:39:56 +02:00
4226297735 chore: remove debug VRAM logging Ethanfel 2026-04-10 01:50:08 +02:00
4297715a08 debug: add driver-level VRAM reporting + offload video_enc Ethanfel 2026-04-10 01:48:04 +02:00
9af4bbdd91 fix: force torch.cuda.empty_cache() after pre-generation and CLIP encoding Ethanfel 2026-04-10 01:42:45 +02:00
89d6fccd28 debug: add per-operation VRAM logging in first training step Ethanfel 2026-04-10 01:35:54 +02:00
bd84242fa1 debug: add VRAM logging at offload and training checkpoints Ethanfel 2026-04-10 01:28:31 +02:00
5a2c003fb2 fix: move baseline sample after inference flag stripping Ethanfel 2026-04-10 01:26:11 +02:00
f8d4d77b0d fix: pre-compute text CLIP embeddings in main thread to avoid inference tensor crash Ethanfel 2026-04-10 01:19:44 +02:00
32e5344ea2 fix: wrap CLIP encoding in inference_mode during pre-generation Ethanfel 2026-04-10 01:10:58 +02:00
10a71b0c4f fix: offload entire model to CPU in main thread before worker starts Ethanfel 2026-04-10 00:56:13 +02:00
37a27160aa fix: match mel dtype to vocoder in baseline sample generation Ethanfel 2026-04-10 00:45:31 +02:00
cb9a1eef01 fix: stop loading full feature_utils to GPU before training Ethanfel 2026-04-10 00:44:38 +02:00
d70c611bf7 fix: offload CLIP, synchformer, T5, generator, VAE to CPU before training Ethanfel 2026-04-10 00:33:07 +02:00
4e6cc4d519 feat: cache pre-generated LoRA mels to disk for reuse Ethanfel 2026-04-10 00:30:20 +02:00
0854bd2638 fix: cast discriminators to model dtype to match vocoder output Ethanfel 2026-04-10 00:25:04 +02:00
187b2e3169 fix: cast GAFilter to model dtype after injection Ethanfel 2026-04-10 00:24:11 +02:00
608746ce7b fix: cast input mel to model dtype before vocoder forward pass Ethanfel 2026-04-10 00:18:05 +02:00
bba5aec7a5 fix: add CFG to LoRA mel pre-generation to match inference conditions Ethanfel 2026-04-10 00:17:16 +02:00
d06936802b fix: cast mel_converter buffers to float32 to match STFT input dtype Ethanfel 2026-04-10 00:10:52 +02:00
bee518a855 fix: cast all STFT inputs to float32 to prevent cuFFT bfloat16 crash Ethanfel 2026-04-09 23:53:36 +02:00
48b72c0be0 feat: add LoRA mel pre-generation to BigVGAN vocoder trainer Ethanfel 2026-04-09 23:26:36 +02:00
e16480b4c9 feat: add PiSSA/rsLoRA support to scheduler and PiSSA sweep experiment Ethanfel 2026-04-09 22:07:27 +02:00
784fb2753f feat: PiSSA init, rsLoRA scaling, Spectral Surgery, and training fixes Ethanfel 2026-04-09 21:54:36 +02:00
ecf828b007 fix: move vocoder to correct device after GAFilter injection Ethanfel 2026-04-09 20:28:55 +02:00
793368af18 fix: strip inference flag before unnormalize in LoRA trainer eval Ethanfel 2026-04-09 20:01:53 +02:00
1d1ae61409 fix: move only VAE+vocoder to GPU during eval to prevent device mismatch Ethanfel 2026-04-09 19:36:02 +02:00
8fa2699551 fix: correct DITTO reference latent space mismatch Ethanfel 2026-04-09 18:57:08 +02:00
14fabf01f9 fix: reduce opt_lr step to 0.001 to allow finer lr control in DITTO Ethanfel 2026-04-09 18:40:21 +02:00
445da1e69b fix: replace std clamp with anchor regularization to prevent OOD noise Ethanfel 2026-04-09 18:30:05 +02:00
fa6c4fa834 fix: clamp x0 std after each optimizer step to prevent OOD noise Ethanfel 2026-04-09 18:23:39 +02:00
286681edff fix: cast mel to model dtype before VAE encode in DITTO reference loading Ethanfel 2026-04-09 18:18:41 +02:00
056a7b973d fix: enable VAE encoder in model loader — required for DITTO reference encoding Ethanfel 2026-04-09 18:15:27 +02:00
633fe36fbb fix: compute DITTO style loss in latent space to eliminate VAE decoder noise Ethanfel 2026-04-09 18:12:31 +02:00
8862089fd0 fix: remove 32-clip cap on DITTO reference loading — use all available clips Ethanfel 2026-04-09 18:10:10 +02:00
608e7df04b feat: add gram_weight param to DITTO, reduce default style_weight to 0.1 Ethanfel 2026-04-09 18:03:32 +02:00
101b1bdb41 fix: _do_optimize returns dict not tuple — prevent double-wrapping AUDIO output Ethanfel 2026-04-09 17:56:59 +02:00
732df151b0 fix: cast ref_mean/ref_gram to model dtype before loss computation Ethanfel 2026-04-09 17:48:41 +02:00
817b75df49 fix: bypass @torch.inference_mode() on decode to preserve gradient chain Ethanfel 2026-04-09 17:44:35 +02:00
1f02d73a3e fix: remove checkpoint wrapper on decode — direct call preserves grad chain Ethanfel 2026-04-09 17:40:00 +02:00
fb255edaf0 fix: strip inference-mode tensor flags in DITTO before conditions computation Ethanfel 2026-04-09 17:35:15 +02:00
8ccc2438e4 fix: remove FlashSR (audiosr incompatible with Python 3.12), add training loss CSV Ethanfel 2026-04-09 17:18:34 +02:00
8371466e44 fix: guarantee length preservation in _ActivationWithGAFilter Ethanfel 2026-04-09 16:39:03 +02:00
ba0499b77c fix: FlashSR device handling and remove unused tmp_out Ethanfel 2026-04-09 16:32:02 +02:00
ce62bccc1f feat: add post-generation audio enhancement nodes Ethanfel 2026-04-09 16:27:39 +02:00
45fced55bc fix: exclude GAFilter params from L2-SP regularization Ethanfel 2026-04-09 16:19:52 +02:00
db112394e8 feat: add AF-Vocoder GAFilter to BigVGAN trainer and loader Ethanfel 2026-04-09 16:15:14 +02:00
c53ea5517c feat: add FA-GAN phase-aware STFT loss to BigVGAN trainer Ethanfel 2026-04-09 16:09:31 +02:00
82e449681c fix: cast mel_converter and wav to float32 before cuFFT in DITTO Ethanfel 2026-04-09 15:59:55 +02:00
15fc5f0793 feat: add SelvaDatasetCompressor node for parallel compression Ethanfel 2026-04-09 15:36:27 +02:00
48493a3f0d feat: add SelvaDatasetSaver node with NPZ sidecar copy Ethanfel 2026-04-09 15:27:48 +02:00
becb38c27e fix: use soundfile for WAV/FLAC/OGG to bypass torchcodec/FFmpeg dependency Ethanfel 2026-04-09 15:16:22 +02:00
b9f95cfd7e fix: detect silent discriminator load failure and fall back explicitly Ethanfel 2026-04-09 14:39:55 +02:00
f50afa9796 fix: guard _estimate_snr against short clips, fix freqs device in _check_hf_shelf Ethanfel 2026-04-09 14:28:36 +02:00
8a85819f97 feat: register audio dataset pipeline nodes in __init__.py Ethanfel 2026-04-09 14:25:57 +02:00
f1c4654bab feat: add SelvaDatasetItemExtractor node Ethanfel 2026-04-09 14:24:58 +02:00
2d06cb2f52 fix: pass device to hann_window in _check_hf_shelf to avoid GPU mismatch Ethanfel 2026-04-09 14:22:13 +02:00
0731addea9 feat: add SelvaDatasetInspector node (codec artifacts, SNR, clipping) Ethanfel 2026-04-09 14:20:03 +02:00
7eb9bd5745 feat: add SelvaDatasetLUFSNormalizer node (pyloudnorm BS.1770-4) Ethanfel 2026-04-09 14:17:44 +02:00
057bfb813d feat: add SelvaDatasetResampler node (soxr VHQ) Ethanfel 2026-04-09 14:13:45 +02:00
2c71d4c184 feat: add SelvaDatasetLoader node Ethanfel 2026-04-09 14:09:43 +02:00
d25df10aa5 feat: add audio dataset pipeline skeleton Ethanfel 2026-04-09 14:05:31 +02:00
d70a4d2123 docs: add audio dataset pipeline implementation plan Ethanfel 2026-04-09 14:02:46 +02:00
2b10205657 fix: raise segment_seconds max from 4s to 30s Ethanfel 2026-04-09 13:49:50 +02:00
8166c56552 perf: gradient checkpointing on vocoder forward to reduce activation memory Ethanfel 2026-04-09 13:45:24 +02:00
eece79ccae fix: correct MRD channel width to 128 and unload models before training Ethanfel 2026-04-09 13:40:01 +02:00
357b875e5e fix: strip inference tensor flags in DITTO optimizer Ethanfel 2026-04-09 12:18:20 +02:00
211494a91c fix: DITTO gradient never reached x0, remove unused imports and dead code Ethanfel 2026-04-09 12:10:02 +02:00
1e9551152e feat: add DITTO optimizer, upgrade BigVGAN trainer, document all nodes Ethanfel 2026-04-09 12:04:05 +02:00
f17f6f0863 feat: save ground truth spectrogram once for direct comparison Ethanfel 2026-04-09 03:05:47 +02:00
304d9d01bf feat: save mel spectrogram PNG alongside each eval sample Ethanfel 2026-04-09 03:03:28 +02:00
0128a81cc2 fix: use full first clip for eval samples instead of 1s segment Ethanfel 2026-04-09 03:01:52 +02:00
710261f5be fix: add soundfile fallback for torchaudio.save in sample writing Ethanfel 2026-04-09 02:58:07 +02:00
5df2abd6dd fix: handle all three inference-tensor sources in vocoder sanitization Ethanfel 2026-04-09 02:54:41 +02:00
b243908873 debug: inspect conv_pre parametrizations and _parameters keys Ethanfel 2026-04-09 02:46:16 +02:00
9df855ee0e debug: print is_inference() status before failing conv_pre call Ethanfel 2026-04-09 02:41:51 +02:00
78f8aa98ad fix: clone inference tensors at thread entry to strip the inference flag Ethanfel 2026-04-09 02:35:48 +02:00
e870446b0f fix: run BigVGAN training in a fresh thread to escape inference_mode Ethanfel 2026-04-09 02:30:53 +02:00
df63b147e9 fix: sanitize all submodule buffers of mel_converter + guarantee target_mel output Ethanfel 2026-04-09 02:14:12 +02:00
51ac099073 fix: sanitize target_flat — clips are inference tensors from outer inference_mode Ethanfel 2026-04-09 02:09:26 +02:00
b7565ec458 fix: sanitize inference tensors in BigVGAN trainer via zeros+copy_ pattern Ethanfel 2026-04-09 02:05:36 +02:00
0fcb6d3106 fix(bigvgan-trainer): replace parameter objects to fully strip inference tensor flag Ethanfel 2026-04-09 01:58:57 +02:00
c86306bde8 fix(bigvgan-trainer): clone vocoder parameters to strip inference tensor flag Ethanfel 2026-04-09 01:55:16 +02:00
f04d59fe63 fix(bigvgan-trainer): clone mel outputs to strip inference tensor flag from buffers Ethanfel 2026-04-09 01:51:28 +02:00
daa36a5f7b fix(bigvgan-trainer): clone target tensor to exit inference mode before backward Ethanfel 2026-04-09 01:47:47 +02:00
16e20b30ce fix(bigvgan-trainer): cast audio to model dtype to match bf16 mel_converter buffers Ethanfel 2026-04-09 01:46:01 +02:00
ea7dfed27a fix(bigvgan-trainer): fallback to soundfile when torchaudio ffmpeg backend fails Ethanfel 2026-04-09 01:41:59 +02:00
81ff0d46c9 fix(bigvgan-trainer): resolve device mismatch in _save_sample after offload Ethanfel 2026-04-09 01:35:07 +02:00
9fdeb65182 feat(bigvgan-trainer): add eval samples at checkpoints and end Ethanfel 2026-04-09 01:30:34 +02:00
790a53e3df fix(bigvgan): add 44k/BigVGANv2 support to trainer and loader Ethanfel 2026-04-09 01:28:32 +02:00
9c784b4bdb feat: add BigVGAN vocoder fine-tuner and loader nodes Ethanfel 2026-04-09 01:26:12 +02:00
115a0c3718 feat(steering): conditional-only injection + per-position vectors Ethanfel 2026-04-09 01:02:51 +02:00
95923cdf42 feat: add activation steering pipeline (extractor, loader, sampler injection) Ethanfel 2026-04-09 00:38:26 +02:00
28ee3db337 feat(sampler): add ti_strength blend for TI injection Ethanfel 2026-04-09 00:07:57 +02:00
b89167cfae fix(ti-trainer): clamp token norm to CLIP manifold to prevent buzz artifacts Ethanfel 2026-04-08 23:54:23 +02:00
