fix: replace peak normalization with RMS normalization at -20 dBFS
Peak normalization was slamming output to full scale regardless of content level, making generated audio several times louder than the training clips. RMS normalization to -20 dBFS matches the typical level of processed audio.

The sampler exposes target_lufs (-40 to -6, default -20) for user control.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -134,8 +134,9 @@ def _eval_sample(generator, feature_utils_orig, dataset, seq_cfg, device, dtype,
         elif audio.dim() == 3 and audio.shape[1] != 1:
             audio = audio.mean(dim=1, keepdim=True)
 
-        peak = audio.abs().max().clamp(min=1e-8)
-        audio = (audio / peak).clamp(-1, 1)
+        target_rms = 10 ** (-20.0 / 20.0)  # -20 dBFS
+        rms = audio.pow(2).mean().sqrt().clamp(min=1e-8)
+        audio = (audio * (target_rms / rms)).clamp(-1, 1)
         return audio.squeeze(0), seq_cfg.sampling_rate  # [1, L]
 
     except Exception as e:
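To make the level difference described in the commit message concrete, here is a minimal plain-Python sketch comparing the two normalization strategies on a quiet sine wave. It is an illustration only, not the project's code: the helper names (`db_to_linear`, `rms`, `rms_normalize`, `peak_normalize`) and the test signal are hypothetical, and it uses lists rather than the tensors used in `_eval_sample`.

```python
import math


def db_to_linear(db: float) -> float:
    """Convert a level in dB relative to full scale to a linear amplitude."""
    return 10.0 ** (db / 20.0)


def rms(samples: list[float]) -> float:
    """Root-mean-square of the samples, floored to avoid division by zero."""
    return max(math.sqrt(sum(s * s for s in samples) / len(samples)), 1e-8)


def rms_normalize(samples: list[float], target_db: float = -20.0) -> list[float]:
    """Scale samples so their RMS reaches target_db dBFS, then hard-clip to [-1, 1]."""
    gain = db_to_linear(target_db) / rms(samples)
    return [max(-1.0, min(1.0, s * gain)) for s in samples]


def peak_normalize(samples: list[float]) -> list[float]:
    """Scale samples so the largest absolute value is 1.0 (the behaviour being replaced)."""
    peak = max(max(abs(s) for s in samples), 1e-8)
    return [max(-1.0, min(1.0, s / peak)) for s in samples]


# Hypothetical test signal: one second of a quiet 440 Hz sine at 16 kHz.
quiet = [0.05 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]

peak_out = peak_normalize(quiet)
rms_out = rms_normalize(quiet)

print(f"peak-normalized RMS: {rms(peak_out):.3f}")
print(f"RMS-normalized RMS:  {rms(rms_out):.3f}")
```

For this signal the peak-normalized output has an RMS of about 0.707 and the RMS-normalized output about 0.100, a gap of roughly 17 dB. As in the diff, the final clamp hard-clips any sample that still exceeds full scale after scaling, so material with a very high crest factor can end up slightly below the target level.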