-
40d29bcaf8
feat: add experiment configs for logit+cosine combo and BigVGAN decoder fine-tuning
feature/lora-timestep-sampling
Ethanfel
2026-04-10 16:48:21 +02:00
-
65dc549494
feat: add reference audio comparison metrics to LoRA trainer eval
Ethanfel
2026-04-10 15:04:07 +02:00
-
f745e241c4
chore: sanitize tooltips/comments + add experiment configs
Ethanfel
2026-04-10 13:44:37 +02:00
-
082a2da438
fix: restore dtype after float32 STFT in discriminator spectrogram
Ethanfel
2026-04-10 12:13:55 +02:00
-
c28e090196
fix: cast discriminator inputs to match bfloat16 dtype in BigVGAN FM loss
Ethanfel
2026-04-10 11:36:02 +02:00
-
af6c225f53
feat: add dataset pipeline nodes + latent augmentation for LoRA trainer
Ethanfel
2026-04-10 11:32:00 +02:00
-
30127c13ca
feat: add BigVGAN vocoder sweep scheduler node
Ethanfel
2026-04-10 02:39:56 +02:00
-
4226297735
chore: remove debug VRAM logging
Ethanfel
2026-04-10 01:50:08 +02:00
-
4297715a08
debug: add driver-level VRAM reporting + offload video_enc
Ethanfel
2026-04-10 01:48:04 +02:00
-
9af4bbdd91
fix: force torch.cuda.empty_cache() after pre-generation and CLIP encoding
Ethanfel
2026-04-10 01:42:45 +02:00
-
89d6fccd28
debug: add per-operation VRAM logging in first training step
Ethanfel
2026-04-10 01:35:54 +02:00
-
bd84242fa1
debug: add VRAM logging at offload and training checkpoints
Ethanfel
2026-04-10 01:28:31 +02:00
-
5a2c003fb2
fix: move baseline sample after inference flag stripping
Ethanfel
2026-04-10 01:26:11 +02:00
-
f8d4d77b0d
fix: pre-compute text CLIP embeddings in main thread to avoid inference tensor crash
Ethanfel
2026-04-10 01:19:44 +02:00
-
32e5344ea2
fix: wrap CLIP encoding in inference_mode during pre-generation
Ethanfel
2026-04-10 01:10:58 +02:00
-
10a71b0c4f
fix: offload entire model to CPU in main thread before worker starts
Ethanfel
2026-04-10 00:56:13 +02:00
-
37a27160aa
fix: match mel dtype to vocoder in baseline sample generation
Ethanfel
2026-04-10 00:45:31 +02:00
-
cb9a1eef01
fix: stop loading full feature_utils to GPU before training
Ethanfel
2026-04-10 00:44:38 +02:00
-
d70c611bf7
fix: offload CLIP, synchformer, T5, generator, VAE to CPU before training
Ethanfel
2026-04-10 00:33:07 +02:00
-
4e6cc4d519
feat: cache pre-generated LoRA mels to disk for reuse
Ethanfel
2026-04-10 00:30:20 +02:00
-
0854bd2638
fix: cast discriminators to model dtype to match vocoder output
Ethanfel
2026-04-10 00:25:04 +02:00
-
187b2e3169
fix: cast GAFilter to model dtype after injection
Ethanfel
2026-04-10 00:24:11 +02:00
-
608746ce7b
fix: cast input mel to model dtype before vocoder forward pass
Ethanfel
2026-04-10 00:18:05 +02:00
-
bba5aec7a5
fix: add CFG to LoRA mel pre-generation to match inference conditions
Ethanfel
2026-04-10 00:17:16 +02:00
-
d06936802b
fix: cast mel_converter buffers to float32 to match STFT input dtype
Ethanfel
2026-04-10 00:10:52 +02:00
-
bee518a855
fix: cast all STFT inputs to float32 to prevent cuFFT bfloat16 crash
Ethanfel
2026-04-09 23:53:36 +02:00
-
48b72c0be0
feat: add LoRA mel pre-generation to BigVGAN vocoder trainer
Ethanfel
2026-04-09 23:26:36 +02:00
-
e16480b4c9
feat: add PiSSA/rsLoRA support to scheduler and PiSSA sweep experiment
Ethanfel
2026-04-09 22:07:27 +02:00
-
784fb2753f
feat: PiSSA init, rsLoRA scaling, Spectral Surgery, and training fixes
Ethanfel
2026-04-09 21:54:36 +02:00
-
ecf828b007
fix: move vocoder to correct device after GAFilter injection
Ethanfel
2026-04-09 20:28:55 +02:00
-
793368af18
fix: strip inference flag before unnormalize in LoRA trainer eval
Ethanfel
2026-04-09 20:01:53 +02:00
-
1d1ae61409
fix: move only VAE+vocoder to GPU during eval to prevent device mismatch
Ethanfel
2026-04-09 19:36:02 +02:00
-
8fa2699551
fix: correct DITTO reference latent space mismatch
Ethanfel
2026-04-09 18:57:08 +02:00
-
14fabf01f9
fix: reduce opt_lr step to 0.001 to allow finer lr control in DITTO
Ethanfel
2026-04-09 18:40:21 +02:00
-
445da1e69b
fix: replace std clamp with anchor regularization to prevent OOD noise
Ethanfel
2026-04-09 18:30:05 +02:00
-
fa6c4fa834
fix: clamp x0 std after each optimizer step to prevent OOD noise
Ethanfel
2026-04-09 18:23:39 +02:00
-
286681edff
fix: cast mel to model dtype before VAE encode in DITTO reference loading
Ethanfel
2026-04-09 18:18:41 +02:00
-
056a7b973d
fix: enable VAE encoder in model loader — required for DITTO reference encoding
Ethanfel
2026-04-09 18:15:27 +02:00
-
633fe36fbb
fix: compute DITTO style loss in latent space to eliminate VAE decoder noise
Ethanfel
2026-04-09 18:12:31 +02:00
-
8862089fd0
fix: remove 32-clip cap on DITTO reference loading — use all available clips
Ethanfel
2026-04-09 18:10:10 +02:00
-
608e7df04b
feat: add gram_weight param to DITTO, reduce default style_weight to 0.1
Ethanfel
2026-04-09 18:03:32 +02:00
-
101b1bdb41
fix: _do_optimize returns dict not tuple — prevent double-wrapping AUDIO output
Ethanfel
2026-04-09 17:56:59 +02:00
-
732df151b0
fix: cast ref_mean/ref_gram to model dtype before loss computation
Ethanfel
2026-04-09 17:48:41 +02:00
-
817b75df49
fix: bypass @torch.inference_mode() on decode to preserve gradient chain
Ethanfel
2026-04-09 17:44:35 +02:00
-
1f02d73a3e
fix: remove checkpoint wrapper on decode — direct call preserves grad chain
Ethanfel
2026-04-09 17:40:00 +02:00
-
fb255edaf0
fix: strip inference-mode tensor flags in DITTO before conditions computation
Ethanfel
2026-04-09 17:35:15 +02:00
-
8ccc2438e4
fix: remove FlashSR (audiosr incompatible with Python 3.12), add training loss CSV
Ethanfel
2026-04-09 17:18:34 +02:00
-
8371466e44
fix: guarantee length preservation in _ActivationWithGAFilter
Ethanfel
2026-04-09 16:39:03 +02:00
-
ba0499b77c
fix: FlashSR device handling and remove unused tmp_out
Ethanfel
2026-04-09 16:32:02 +02:00
-
ce62bccc1f
feat: add post-generation audio enhancement nodes
Ethanfel
2026-04-09 16:27:39 +02:00
-
45fced55bc
fix: exclude GAFilter params from L2-SP regularization
Ethanfel
2026-04-09 16:19:52 +02:00
-
db112394e8
feat: add AF-Vocoder GAFilter to BigVGAN trainer and loader
Ethanfel
2026-04-09 16:15:14 +02:00
-
c53ea5517c
feat: add FA-GAN phase-aware STFT loss to BigVGAN trainer
Ethanfel
2026-04-09 16:09:31 +02:00
-
82e449681c
fix: cast mel_converter and wav to float32 before cuFFT in DITTO
Ethanfel
2026-04-09 15:59:55 +02:00
-
15fc5f0793
feat: add SelvaDatasetCompressor node for parallel compression
Ethanfel
2026-04-09 15:36:27 +02:00
-
48493a3f0d
feat: add SelvaDatasetSaver node with NPZ sidecar copy
Ethanfel
2026-04-09 15:27:48 +02:00
-
becb38c27e
fix: use soundfile for WAV/FLAC/OGG to bypass torchcodec/FFmpeg dependency
Ethanfel
2026-04-09 15:16:22 +02:00
-
b9f95cfd7e
fix: detect silent discriminator load failure and fall back explicitly
Ethanfel
2026-04-09 14:39:55 +02:00
-
f50afa9796
fix: guard _estimate_snr against short clips, fix freqs device in _check_hf_shelf
Ethanfel
2026-04-09 14:28:36 +02:00
-
8a85819f97
feat: register audio dataset pipeline nodes in __init__.py
Ethanfel
2026-04-09 14:25:57 +02:00
-
f1c4654bab
feat: add SelvaDatasetItemExtractor node
Ethanfel
2026-04-09 14:24:58 +02:00
-
2d06cb2f52
fix: pass device to hann_window in _check_hf_shelf to avoid GPU mismatch
Ethanfel
2026-04-09 14:22:13 +02:00
-
0731addea9
feat: add SelvaDatasetInspector node (codec artifacts, SNR, clipping)
Ethanfel
2026-04-09 14:20:03 +02:00
-
7eb9bd5745
feat: add SelvaDatasetLUFSNormalizer node (pyloudnorm BS.1770-4)
Ethanfel
2026-04-09 14:17:44 +02:00
-
057bfb813d
feat: add SelvaDatasetResampler node (soxr VHQ)
Ethanfel
2026-04-09 14:13:45 +02:00
-
2c71d4c184
feat: add SelvaDatasetLoader node
Ethanfel
2026-04-09 14:09:43 +02:00
-
d25df10aa5
feat: add audio dataset pipeline skeleton
Ethanfel
2026-04-09 14:05:31 +02:00
-
d70a4d2123
docs: add audio dataset pipeline implementation plan
Ethanfel
2026-04-09 14:02:46 +02:00
-
2b10205657
fix: raise segment_seconds max from 4s to 30s
Ethanfel
2026-04-09 13:49:50 +02:00
-
8166c56552
perf: gradient checkpointing on vocoder forward to reduce activation memory
Ethanfel
2026-04-09 13:45:24 +02:00
-
eece79ccae
fix: correct MRD channel width to 128 and unload models before training
Ethanfel
2026-04-09 13:40:01 +02:00
-
357b875e5e
fix: strip inference tensor flags in DITTO optimizer
Ethanfel
2026-04-09 12:18:20 +02:00
-
211494a91c
fix: DITTO gradient never reached x0, remove unused imports and dead code
Ethanfel
2026-04-09 12:10:02 +02:00
-
1e9551152e
feat: add DITTO optimizer, upgrade BigVGAN trainer, document all nodes
Ethanfel
2026-04-09 12:04:05 +02:00
-
f17f6f0863
feat: save ground truth spectrogram once for direct comparison
Ethanfel
2026-04-09 03:05:47 +02:00
-
304d9d01bf
feat: save mel spectrogram PNG alongside each eval sample
Ethanfel
2026-04-09 03:03:28 +02:00
-
0128a81cc2
fix: use full first clip for eval samples instead of 1s segment
Ethanfel
2026-04-09 03:01:52 +02:00
-
710261f5be
fix: add soundfile fallback for torchaudio.save in sample writing
Ethanfel
2026-04-09 02:58:07 +02:00
-
5df2abd6dd
fix: handle all three inference-tensor sources in vocoder sanitization
Ethanfel
2026-04-09 02:54:41 +02:00
-
b243908873
debug: inspect conv_pre parametrizations and _parameters keys
Ethanfel
2026-04-09 02:46:16 +02:00
-
9df855ee0e
debug: print is_inference() status before failing conv_pre call
Ethanfel
2026-04-09 02:41:51 +02:00
-
78f8aa98ad
fix: clone inference tensors at thread entry to strip the inference flag
Ethanfel
2026-04-09 02:35:48 +02:00
-
e870446b0f
fix: run BigVGAN training in a fresh thread to escape inference_mode
Ethanfel
2026-04-09 02:30:53 +02:00
-
df63b147e9
fix: sanitize all submodule buffers of mel_converter + guarantee target_mel output
Ethanfel
2026-04-09 02:14:12 +02:00
-
51ac099073
fix: sanitize target_flat — clips are inference tensors from outer inference_mode
Ethanfel
2026-04-09 02:09:26 +02:00
-
b7565ec458
fix: sanitize inference tensors in BigVGAN trainer via zeros+copy_ pattern
Ethanfel
2026-04-09 02:05:36 +02:00
-
0fcb6d3106
fix(bigvgan-trainer): replace parameter objects to fully strip inference tensor flag
Ethanfel
2026-04-09 01:58:57 +02:00
-
c86306bde8
fix(bigvgan-trainer): clone vocoder parameters to strip inference tensor flag
Ethanfel
2026-04-09 01:55:16 +02:00
-
f04d59fe63
fix(bigvgan-trainer): clone mel outputs to strip inference tensor flag from buffers
Ethanfel
2026-04-09 01:51:28 +02:00
-
daa36a5f7b
fix(bigvgan-trainer): clone target tensor to exit inference mode before backward
Ethanfel
2026-04-09 01:47:47 +02:00
-
16e20b30ce
fix(bigvgan-trainer): cast audio to model dtype to match bf16 mel_converter buffers
Ethanfel
2026-04-09 01:46:01 +02:00
-
ea7dfed27a
fix(bigvgan-trainer): fallback to soundfile when torchaudio ffmpeg backend fails
Ethanfel
2026-04-09 01:41:59 +02:00
-
81ff0d46c9
fix(bigvgan-trainer): resolve device mismatch in _save_sample after offload
Ethanfel
2026-04-09 01:35:07 +02:00
-
9fdeb65182
feat(bigvgan-trainer): add eval samples at checkpoints and end
Ethanfel
2026-04-09 01:30:34 +02:00
-
790a53e3df
fix(bigvgan): add 44k/BigVGANv2 support to trainer and loader
Ethanfel
2026-04-09 01:28:32 +02:00
-
9c784b4bdb
feat: add BigVGAN vocoder fine-tuner and loader nodes
Ethanfel
2026-04-09 01:26:12 +02:00
-
115a0c3718
feat(steering): conditional-only injection + per-position vectors
Ethanfel
2026-04-09 01:02:51 +02:00
-
95923cdf42
feat: add activation steering pipeline (extractor, loader, sampler injection)
Ethanfel
2026-04-09 00:38:26 +02:00
-
28ee3db337
feat(sampler): add ti_strength blend for TI injection
Ethanfel
2026-04-09 00:07:57 +02:00
-
b89167cfae
fix(ti-trainer): clamp token norm to CLIP manifold to prevent buzz artifacts
Ethanfel
2026-04-08 23:54:23 +02:00