- Load fresh FeaturesUtils only for encoding; use model["feature_utils"] for
decode+vocode to mirror the exact path the sampler takes
- Apply generator.normalize() → unnormalize() around the encoded latent so the
decoder receives latents in the same space it expects from inference
- Log both encoded and norm→unnorm latent stats to diagnose round-trip fidelity
- Normalize output to -27 dBFS RMS (matching the training clips) and clamp to
[-1, 1] so no sample exceeds full scale in the output waveform
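
A minimal sketch of the level-matching step from the last bullet, assuming mono float audio with full scale at 1.0. The function name `normalize_to_dbfs` and the `eps` guard are illustrative and not taken from this repository:

```python
import numpy as np


def normalize_to_dbfs(audio: np.ndarray, target_dbfs: float = -27.0,
                      eps: float = 1e-8) -> np.ndarray:
    """Scale audio so its RMS sits at target_dbfs, then clamp to [-1, 1].

    Hypothetical helper: full scale is assumed to be 1.0, so
    0 dBFS corresponds to an RMS of 1.0.
    """
    # Compute RMS in float64 to avoid precision loss on long clips.
    rms = np.sqrt(np.mean(audio.astype(np.float64) ** 2))
    # Convert the dBFS target to a linear RMS value.
    target_rms = 10.0 ** (target_dbfs / 20.0)
    # eps keeps the gain finite for silent input (which stays silent).
    gain = target_rms / max(rms, eps)
    # Clamp after applying gain so peaky material cannot exceed full scale.
    return np.clip(audio * gain, -1.0, 1.0).astype(np.float32)
```

Note that the clamp is a hard limit: if a clip has a crest factor high enough that peaks exceed 1.0 after gain, those peaks are flattened, and the measured RMS will land slightly below the target.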
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Without this, the decoder produced 7 s of audio instead of 8 s due to STFT rounding.
This is the same fix that _prepare_dataset applies to the training data.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Encodes audio through the VAE then decodes straight back, bypassing the
diffusion model entirely. Use this to isolate whether saturation artifacts
are introduced by the codec reconstruction (VAE/DAC) or by the LoRA.
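
The round-trip check described above can be sketched roughly as follows. The `encode` and `decode` arguments are hypothetical stand-ins for the project's VAE encode and decode+vocode calls, whose real signatures are not shown here; the `ceiling` threshold for counting a sample as saturated is likewise an assumption:

```python
from typing import Callable, Dict

import numpy as np


def _stats(x: np.ndarray, ceiling: float) -> Dict[str, float]:
    """Peak, RMS, and fraction of samples at or above the ceiling."""
    x = np.asarray(x, dtype=np.float64)
    return {
        "peak": float(np.max(np.abs(x))),
        "rms": float(np.sqrt(np.mean(x ** 2))),
        "saturated_frac": float(np.mean(np.abs(x) >= ceiling)),
    }


def roundtrip_report(audio: np.ndarray,
                     encode: Callable[[np.ndarray], np.ndarray],
                     decode: Callable[[np.ndarray], np.ndarray],
                     ceiling: float = 0.999) -> Dict[str, Dict[str, float]]:
    """Encode then decode with no diffusion step, and compare level stats."""
    latent = np.asarray(encode(audio), dtype=np.float64)
    recon = np.asarray(decode(latent))
    # The codec may return a slightly different length; compare the overlap.
    n = min(len(audio), len(recon))
    return {
        "input": _stats(audio[:n], ceiling),
        "recon": _stats(recon[:n], ceiling),
        "latent": {"mean": float(latent.mean()), "std": float(latent.std())},
    }
```

If `recon` shows a much higher `saturated_frac` than `input`, the codec alone is introducing saturation; if the two match, the artifacts more likely come from the LoRA-adapted diffusion path.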
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>