ComfyUI-SelVA

Author	SHA1	Message	Date
Ethanfel	082a2da438	fix: restore dtype after float32 STFT in discriminator spectrogram torch.stft requires float32 input, but the .float() cast was not reversed before the spectrogram hit bfloat16 Conv2d weights. Save the original dtype and cast back after abs(). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-10 12:13:55 +02:00
Ethanfel	c28e090196	fix: cast discriminator inputs to match bfloat16 dtype in BigVGAN FM loss The frozen discriminators are loaded in model dtype (bfloat16) but vocoder waveform outputs are float32, causing a Conv2d dtype mismatch. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-10 11:36:02 +02:00
Ethanfel	af6c225f53	feat: add dataset pipeline nodes + latent augmentation for LoRA trainer New dataset pipeline nodes: - SelvaDatasetSpectralMatcher: batch spectral EQ toward VAE distribution - SelvaDatasetHfSmoother: batch HF attenuation for codec compatibility - SelvaDatasetAugmenter: gain/pitch/time-stretch variants with npz origin tracking Improvements: - Inspector: silence detection (max_silence_fraction param) - Saver: origin_name lookup for augmented clips' npz pairing - LoRA trainer: latent_mixup_alpha + latent_noise_sigma regularization - LoRA trainer: one-time SR mismatch warning in _load_audio Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-10 11:32:00 +02:00
Ethanfel	30127c13ca	feat: add BigVGAN vocoder sweep scheduler node Runs a series of BigVGAN fine-tuning experiments from a JSON sweep file. Audio clips loaded once, vocoder deep-copied per experiment, results collected in experiment_summary.json with comparison loss curves. Resume-aware — skips completed experiments on re-run. Includes overnight sweep config (8 experiments): snake alpha steps, GAFilter ablation, phase loss weight, discriminator FM, all_params. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-10 02:39:56 +02:00
Ethanfel	4226297735	chore: remove debug VRAM logging Training confirmed working — VRAM usage is normal backward-pass activation memory, not a leak. Removed all debug _vram_log and _vram calls. Kept the video_enc offload and torch.cuda.empty_cache fixes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-10 01:50:08 +02:00
Ethanfel	4297715a08	debug: add driver-level VRAM reporting + offload video_enc torch.cuda.memory_allocated only tracks PyTorch allocator. Added torch.cuda.mem_get_info to see actual CUDA driver memory usage. Also offload video_enc (TextSynch) which was missed in the original offload — stays on GPU when strategy != offload_to_cpu. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-10 01:48:04 +02:00
Ethanfel	9af4bbdd91	fix: force torch.cuda.empty_cache() after pre-generation and CLIP encoding PyTorch's caching allocator reserves GPU memory from pre-generation (~90 GiB for generator + tod) and doesn't return it to CUDA/OS. soft_empty_cache may not call torch.cuda.empty_cache(). Force a full cache release after CLIP encoding and after LoRA mel pre-generation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-10 01:42:45 +02:00
Ethanfel	89d6fccd28	debug: add per-operation VRAM logging in first training step Logs VRAM at: after target_mel, after vocoder forward, before loss, after loss computation, and after backward. Only logs for step 0 to avoid spam. Will identify which operation causes the 94 GiB spike. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-10 01:35:54 +02:00
Ethanfel	bd84242fa1	debug: add VRAM logging at offload and training checkpoints Logs torch.cuda.memory_allocated/reserved at each step: before unload, after unload_all_models, after feature_utils.to(cpu), after generator to(cpu), after cache clear, after mel_converter to(device), and before training loop. This will identify what's holding VRAM. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-10 01:28:31 +02:00
Ethanfel	5a2c003fb2	fix: move baseline sample after inference flag stripping _save_sample("baseline") was called before the vocoder's inference tensors were sanitized, causing "Inference tensors do not track version counter". Moved it after the clone/detach loop and vocoder.to(device). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-10 01:26:11 +02:00
Ethanfel	f8d4d77b0d	fix: pre-compute text CLIP embeddings in main thread to avoid inference tensor crash CLIP weights are inference tensors from ComfyUI loading. inference_mode is thread-local, so the worker thread can't use CLIP even with a context manager. Pre-compute all text embeddings in the main thread (where inference_mode IS active), clone+detach to normal tensors, and pass them to the worker via text_clip_cache dict. CLIP no longer needs to be on GPU during pre-generation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-10 01:19:44 +02:00
Ethanfel	32e5344ea2	fix: wrap CLIP encoding in inference_mode during pre-generation CLIP weights are inference tensors from ComfyUI loading. The worker thread runs without inference_mode, so PyTorch rejects inference tensors in multi_head_attention_forward (version counter tracking). Wrap the encode_text_clip call in torch.inference_mode() since text encoding doesn't need gradients. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-10 01:10:58 +02:00
Ethanfel	10a71b0c4f	fix: offload entire model to CPU in main thread before worker starts The previous offload ran inside the worker thread, but by then ComfyUI had already loaded the full model to GPU. Now feature_utils.to('cpu') and generator.to('cpu') run in the main thread right after unload_all_models(), before the worker starts. vocoder.to(device, dtype) is called explicitly after inference flag stripping in _do_train to bring only the vocoder back to GPU. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-10 00:56:13 +02:00
Ethanfel	37a27160aa	fix: match mel dtype to vocoder in baseline sample generation ref_mel is float32 (from mel_converter) but vocoder weights are bfloat16 before inference flag stripping. Cast mel to vocoder's dtype to prevent input/bias type mismatch during baseline sample save. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-10 00:45:31 +02:00
Ethanfel	cb9a1eef01	fix: stop loading full feature_utils to GPU before training feature_utils.to(device) was loading CLIP ViT-H, synchformer, T5, VAE, and vocoder (~90 GiB) to GPU for the entire training run. Now only mel_converter (tiny) is moved to GPU. Pre-generation manages its own device placement: temporarily moves CLIP and tod to GPU, then moves them back when done. This frees ~90 GiB for the backward pass. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-10 00:44:38 +02:00
Ethanfel	d70c611bf7	fix: offload CLIP, synchformer, T5, generator, VAE to CPU before training Only the vocoder and mel_converter are needed during BigVGAN training. The rest of the SelVA pipeline (CLIP ViT-H, synchformer, T5, generator, VAE) was staying on GPU and consuming ~90 GiB, leaving no room for backward pass activations. Now offloaded individually to CPU before the training loop starts. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-10 00:33:07 +02:00
Ethanfel	4e6cc4d519	feat: cache pre-generated LoRA mels to disk for reuse LoRA mel pre-generation runs a full ODE+CFG for every clip, which is slow. Cache results to a .pt file next to the output, keyed by a SHA-256 hash of the LoRA adapter content + generation parameters (seed, steps, CFG, duration, sample rate, npz file list). Automatically reused on subsequent runs when parameters haven't changed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-10 00:30:20 +02:00
Ethanfel	0854bd2638	fix: cast discriminators to model dtype to match vocoder output Discriminators are constructed as float32 but receive bfloat16 tensors from the vocoder. Cast to model dtype on load to prevent conv dtype mismatch in feature matching loss. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-10 00:25:04 +02:00
Ethanfel	187b2e3169	fix: cast GAFilter to model dtype after injection GAFilter conv weights are created as float32 but the rest of the vocoder is bfloat16. vocoder.to(device) missed the dtype cast, causing conv1d dtype mismatch when Snake bfloat16 output flows into GAFilter. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-10 00:24:11 +02:00
Ethanfel	608746ce7b	fix: cast input mel to model dtype before vocoder forward pass mel_converter outputs float32 (cuFFT requirement) but vocoder weights are bfloat16 from model loading. Cast input_mel back to model dtype before feeding the vocoder to avoid conv1d dtype mismatch. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-10 00:18:05 +02:00
Ethanfel	bba5aec7a5	fix: add CFG to LoRA mel pre-generation to match inference conditions Pre-generated mels were using a bare forward pass with no classifier-free guidance, producing mels that don't match what the vocoder sees at inference (where cfg_strength=4.5 is the default). Now uses ode_wrapper with preprocess_conditions/get_empty_conditions, same as the sampler node. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-10 00:17:16 +02:00
Ethanfel	d06936802b	fix: cast mel_converter buffers to float32 to match STFT input dtype mel_basis and hann_window buffers inherit bfloat16 from model loading. Since all mel_converter inputs are cast to float32 for cuFFT, the internal buffers must also be float32 to avoid matmul dtype mismatch. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-10 00:10:52 +02:00
Ethanfel	bee518a855	fix: cast all STFT inputs to float32 to prevent cuFFT bfloat16 crash cuFFT does not support bfloat16 tensors. When the model is loaded in bfloat16, all torch.stft calls (mel_converter, discriminator spectrogram, multi-resolution STFT loss) crash. Add .float() at every STFT boundary. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-09 23:53:36 +02:00
Ethanfel	48b72c0be0	feat: add LoRA mel pre-generation to BigVGAN vocoder trainer When a lora_adapter path is provided, the trainer pre-generates LoRA-distorted mels for each training clip (full ODE generation + VAE decode) and trains the vocoder to produce clean audio from them. This teaches the vocoder to compensate for LoRA latent distribution shift without requiring perfectly aligned training pairs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-09 23:26:36 +02:00
Ethanfel	e16480b4c9	feat: add PiSSA/rsLoRA support to scheduler and PiSSA sweep experiment Thread init_mode and use_rslora through the scheduler's config parsing, experiment record, and _train_inner call. Default alpha changed to 2*rank to match trainer. Add pissa_sweep.json with 7 experiments ablating PiSSA init vs standard, rsLoRA scaling, and learning rate variations at rank 128. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-09 22:07:27 +02:00
Ethanfel	784fb2753f	feat: PiSSA init, rsLoRA scaling, Spectral Surgery, and training fixes LoRA quality improvements addressing intruder dimension problem: 1. PiSSA initialization (arXiv:2404.02948): init A,B from top-r SVD of pretrained weight. Starts on-manifold, eliminates intruder dimensions at init. Base weight stores residual W_res = W - B@A*scale. 2. rsLoRA scaling (arXiv:2312.03732): alpha/sqrt(rank) instead of alpha/rank. Prevents gradient collapse at high ranks (128+). 3. Post-training Spectral Surgery (arXiv:2603.03995): SVD of trained LoRA update, gradient-sensitivity reweighting to suppress remaining intruder dimensions. Runs automatically after training completes. 4. alpha default changed to 2*rank (was 1*rank). Produces fewer intruder dimensions per arXiv:2410.21228. 5. weight_decay reduced from 1e-2 to 0.0 (standard for LoRA, prevents erasing learned style weights). 6. random.choices replaced with random.sample when batch_size <= dataset size (eliminates duplicate samples per batch). PiSSA checkpoints include base weights (residual). Loader/evaluator updated to handle both standard and PiSSA checkpoint formats. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-09 21:54:36 +02:00
Ethanfel	ecf828b007	fix: move vocoder to correct device after GAFilter injection inject_gafilters creates Conv1d modules on CPU. load_state_dict preserves existing param devices but GAFilter params stay on CPU, causing device mismatch during vocode. Save target device before injection, then move entire vocoder after loading. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-09 20:28:55 +02:00
Ethanfel	793368af18	fix: strip inference flag before unnormalize in LoRA trainer eval x1_pred is an inference tensor (computed from inference-mode weights loaded by ComfyUI). generator.unnormalize() uses in-place mul_/add_ which fails on inference tensors. Clone strips the flag. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-09 20:01:53 +02:00
Ethanfel	1d1ae61409	fix: move only VAE+vocoder to GPU during eval to prevent device mismatch The previous check (next(feature_utils_orig.parameters()).device) only inspected the first parameter (from CLIP), missing CPU-stranded vocoder weights when the module was in a mixed-device state. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-09 19:36:02 +02:00
Ethanfel	8fa2699551	fix: correct DITTO reference latent space mismatch References were stored in normalized flow-matching space (net_generator.normalize(z_sample)) but the style loss compares against unnormalize(x) which is in VAE latent space. The optimizer was minimizing L1 between tensors at different scales, pushing the ODE endpoint out of distribution and producing noise. Fix: store reference latents in VAE space (z_sample directly) so both ref_mean/ref_gram and x_un are in the same coordinate system. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-09 18:57:08 +02:00
Ethanfel	14fabf01f9	fix: reduce opt_lr step to 0.001 to allow finer lr control in DITTO Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-09 18:40:21 +02:00
Ethanfel	445da1e69b	fix: replace std clamp with anchor regularization to prevent OOD noise The std clamp was post-hoc and only addressed magnitude, not direction. x0 was drifting to mean=-0.55/std=3.1 (ODE expected mean=0/std=1). Replace with anchor_weight * MSE(x0, x0_init) added directly to the loss. The optimizer now balances style matching against staying near the initial N(0,1) noise — gradient-aware, prevents both magnitude and mean drift. Also logs style/anchor losses and x0_std per step for diagnostics. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-09 18:30:05 +02:00
Ethanfel	fa6c4fa834	fix: clamp x0 std after each optimizer step to prevent OOD noise Optimized x0 was reaching std=2.72 vs expected ~1.0 for flow matching. An out-of-distribution initial condition maps to white noise in the output. After each step, rescale x0 back toward unit std if it exceeds 1.5. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-09 18:23:39 +02:00
Ethanfel	286681edff	fix: cast mel to model dtype before VAE encode in DITTO reference loading mel_converter outputs float32 (cuFFT requirement), but VAE encoder weights are bfloat16. Cast mel to dtype before encode to avoid type mismatch. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-09 18:18:41 +02:00
Ethanfel	056a7b973d	fix: enable VAE encoder in model loader — required for DITTO reference encoding need_vae_encoder=False was deleting the encoder to save a small amount of VRAM. DITTO now needs it to encode reference clips to latent space for style loss. The spectrogram VAE encoder is small enough that the overhead is negligible. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-09 18:15:27 +02:00
Ethanfel	633fe36fbb	fix: compute DITTO style loss in latent space to eliminate VAE decoder noise Root cause of white noise: backpropagating through vae.decode produces unstable gradients — the VAE decoder was designed for inference only. Fix: encode reference clips to VAE latent space once (no grad), compute mean + Gram matrix statistics there, and compute style loss directly on net_generator.unnormalize(x) — a single differentiable linear operation. The gradient path is now: loss → x (unnormalized) → ODE → x0, with no decoder in the backward pass. Also adds VAE encoder availability check (fails cleanly if encoder was deleted to save VRAM). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-09 18:12:31 +02:00
Ethanfel	8862089fd0	fix: remove 32-clip cap on DITTO reference loading — use all available clips Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-09 18:10:10 +02:00
Ethanfel	608e7df04b	feat: add gram_weight param to DITTO, reduce default style_weight to 0.1 White noise on output was caused by the Gram matrix loss pushing the latent into incoherent regions. Now gram_weight defaults to 0 (mean spectrum only) and style_weight defaults to 0.1 instead of 1.0. Users can enable Gram gradually once mean-only optimization converges cleanly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-09 18:03:32 +02:00
Ethanfel	101b1bdb41	fix: _do_optimize returns dict not tuple — prevent double-wrapping AUDIO output optimize() does return (_result[0],) to wrap for ComfyUI. _do_optimize was returning (dict,) instead of dict, causing double-wrapping: ((dict,),). ComfyUI then received a tuple as audio and failed on audio["waveform"]. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-09 17:56:59 +02:00
Ethanfel	732df151b0	fix: cast ref_mean/ref_gram to model dtype before loss computation ref_mean and ref_gram are float32 (mel computed via cuFFT which requires float32). mel_gen is bfloat16. F.l1_loss(bfloat16, float32) promotes to float32, producing a float32 loss. loss.backward() then pushes float32 gradients through bfloat16 ops → 'Found dtype Float but expected BFloat16'. Fix: clone().detach().to(dtype) at the start of _do_optimize. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-09 17:48:41 +02:00
Ethanfel	817b75df49	fix: bypass @torch.inference_mode() on decode to preserve gradient chain feature_utils.decode and autoencoder.decode are both decorated with @torch.inference_mode(), which unconditionally destroys grad_fn on all outputs — making loss.backward() fail with 'does not require grad'. Fix: call feature_utils.tod.vae.decode() directly, which has no decorator and is fully differentiable. Transpose matches the original wrapper signature. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-09 17:44:35 +02:00
Ethanfel	1f02d73a3e	fix: remove checkpoint wrapper on decode — direct call preserves grad chain _unnorm_decode was wrapped in checkpoint(use_reentrant=False) to avoid saving inference-mode weight tensors during backward. Since _strip_inference() now cleans all params/buffers before any forward pass, the checkpoint is no longer needed and was silently breaking the gradient chain from mel_gen back to x0. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-09 17:40:00 +02:00
Ethanfel	fb255edaf0	fix: strip inference-mode tensor flags in DITTO before conditions computation Root cause: net_generator/feature_utils/mel_converter parameters were loaded in ComfyUI's inference_mode; operations on inference tensors propagate the flag, so conditions computed from tainted weights were also tainted. checkpoint() with use_reentrant=False then failed trying to save inference tensors during the backward recompute pass. Fix: _strip_inference() clones all params/buffers of all three models before any forward pass, and _clone_nested() cleans any residual inference flags in the conditions/empty_conditions output tensors. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-09 17:35:15 +02:00
Ethanfel	8ccc2438e4	fix: remove FlashSR (audiosr incompatible with Python 3.12), add training loss CSV - Drop SelvaFlashSR node — audiosr pins numpy<=1.23.5 which cannot build on Python 3.12 (pkgutil.ImpImporter removed); use Saganaki22/ComfyUI-AudioSR instead - BigVGAN trainer now writes <output_stem>_training_log.csv alongside the checkpoint: step, total, fm, mel, stft, phase, l2sp columns, line-buffered so loss can be tailed live during training Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-09 17:18:34 +02:00
Ethanfel	8371466e44	fix: guarantee length preservation in _ActivationWithGAFilter Activation1d's anti-alias Kaiser sinc resampling (asymmetric pad_left / pad_right) can produce ±1-2 sample rounding in edge cases, causing the BigVGAN AMPBlock residual addition (xt + x) to fail with a size mismatch. Trim or pad the output to exactly match the input length so the resblock skip connection always has matching dimensions. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-09 16:39:03 +02:00
Ethanfel	ba0499b77c	fix: FlashSR device handling and remove unused tmp_out Use device="auto" for audiosr.build_model — safer than passing a device string that may not be accepted in all audiosr versions. Remove unused tmp_out temp file that was created but never written to. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-09 16:32:02 +02:00
Ethanfel	ce62bccc1f	feat: add post-generation audio enhancement nodes Three new nodes for post-generation quality improvement: - SelvaHarmonicExciter: multi-band exciter (HPF → tanh saturation → mix) restores harmonic richness lost in BigVGAN HF reconstruction - SelvaFlashSR: audio super-resolution via FlashSR basic model (haoheliu/versatile_audio_super_resolution, requires pip install audiosr) predicts missing HF content above vocoder reconstruction ceiling - SelvaOutputNormalizer: BS.1770-4 LUFS normalization + true peak limiting for consistent loudness on generated outputs (pyloudnorm) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-09 16:27:39 +02:00
Ethanfel	45fced55bc	fix: exclude GAFilter params from L2-SP regularization L2-SP anchors trainable params to their pretrained values. GAFilter is a newly initialized module (identity FIR filter) with no pretrained values — anchoring it to identity initialization would resist learning. Exclude gafilter params from the L2-SP loss so they train freely. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-09 16:19:52 +02:00
Ethanfel	db112394e8	feat: add AF-Vocoder GAFilter to BigVGAN trainer and loader Implements AF-Vocoder GAFilter (Interspeech 2025): learnable per-channel depthwise FIR filter inserted after each Snake/Activation1d in BigVGAN residual blocks. Initialized as identity so training starts from pretrained behaviour. - inject_gafilters() walks resblocks.*.activations and wraps each Activation1d with _ActivationWithGAFilter — weights appear in vocoder.state_dict() automatically - Trained alongside Snake alphas in snake_alpha_only mode - Checkpoint saves has_gafilter + gafilter_kernel_size metadata - Loader detects metadata and injects before load_state_dict so weights populate correctly - Controlled by use_gafilter (default True) and gafilter_kernel_size (default 9) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-09 16:15:14 +02:00
Ethanfel	c53ea5517c	feat: add FA-GAN phase-aware STFT loss to BigVGAN trainer Adds L1 loss on real, imaginary, and magnitude STFT components across three resolutions (FA-GAN, arXiv:2407.04575). Penalizes phase smearing directly — magnitude-only losses cannot distinguish correct spectrum with wrong phase from a smeared spectrum. Controlled by lambda_phase (default 1.0, 0 = disabled). Applied on top of both the discriminator FM path and the fallback mel+STFT path. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-09 16:09:31 +02:00
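
Note on commit 784fb2753f: the scaling change it describes (alpha/sqrt(rank) instead of alpha/rank, with alpha defaulting to 2*rank) can be illustrated with a small sketch. The function name `lora_scale` and its signature are hypothetical and are not taken from the repository; only the two formulas come from the commit message.

```python
import math


def lora_scale(alpha: float, rank: int, use_rslora: bool = False) -> float:
    """Multiplier applied to the low-rank update B @ A.

    Hypothetical helper: standard LoRA uses alpha / rank,
    rsLoRA uses alpha / sqrt(rank).
    """
    if rank <= 0:
        raise ValueError("rank must be positive")
    if use_rslora:
        return alpha / math.sqrt(rank)
    return alpha / rank


# With alpha = 2 * rank (the default named in the commit message),
# the standard scale is the same at every rank, while the rsLoRA
# scale grows with sqrt(rank).
for rank in (4, 16, 128):
    alpha = 2 * rank
    print(rank, lora_scale(alpha, rank), lora_scale(alpha, rank, use_rslora=True))
```

Under these assumptions the standard multiplier is 2 at every rank, whereas the rsLoRA multiplier equals 2*sqrt(rank), so the update is not shrunk as the rank increases.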


279 Commits
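
Note on commit 4e6cc4d519: the message describes a disk cache keyed by a SHA-256 hash of the LoRA adapter content plus the generation parameters (seed, steps, CFG, duration, sample rate, npz file list). A minimal sketch of such a key follows. The function name `mel_cache_key`, the JSON serialization, and the sorting of the file list are all assumptions made for illustration; the repository's actual hashing code may differ.

```python
import hashlib
import json


def mel_cache_key(adapter_bytes: bytes, params: dict, npz_files: list) -> str:
    """Return a hex digest that changes whenever the adapter or any parameter changes.

    Hypothetical helper, not the repository's implementation.
    """
    h = hashlib.sha256()
    # Hash the raw adapter content, so a retrained adapter invalidates the cache.
    h.update(adapter_bytes)
    # Serialize parameters deterministically; sorting the file list is an
    # assumption that makes the key independent of directory listing order.
    payload = json.dumps(
        {"params": params, "npz_files": sorted(npz_files)},
        sort_keys=True,
    )
    h.update(payload.encode("utf-8"))
    return h.hexdigest()


params = {"seed": 0, "steps": 25, "cfg": 4.5, "duration": 8.0, "sample_rate": 44100}
key = mel_cache_key(b"adapter-weights", params, ["b.npz", "a.npz"])
print(key)
```

A cache file would be reused only when a stored key matches the freshly computed one. The parameter values above are placeholders, apart from cfg 4.5, which the log names as the default cfg_strength.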