8166c56552
BigVGAN's 512x upsampling stack stores huge intermediate activations for backward, even in snake_alpha_only mode: only 5K params are trainable, but the autograd graph still runs through the rest of the network after each snake op.

Wrapping vocoder() in checkpoint(use_reentrant=False) recomputes those activations during backward instead of storing them, trading ~2x compute cost for a large reduction in peak VRAM. This should allow batch_size > 1 on 96 GB without OOM.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
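
Illustrative sketch of the checkpoint wrapping described in this message, not the code from this commit. TinyVocoder and vocode() are hypothetical stand-ins (the real BigVGAN module and call site are not shown here); only torch.utils.checkpoint.checkpoint and its use_reentrant flag are the actual PyTorch API. The stand-in freezes everything except one scalar to mimic snake_alpha_only, then checks that the checkpointed and plain paths give the same gradient.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class TinyVocoder(nn.Module):
    """Hypothetical stand-in for the vocoder: a small 4x upsampling stack."""

    def __init__(self, n_mels: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose1d(n_mels, 16, kernel_size=4, stride=2, padding=1),
            nn.Tanh(),
            nn.ConvTranspose1d(16, 1, kernel_size=4, stride=2, padding=1),
        )
        # Freeze the conv stack; only `alpha` trains (mimics snake_alpha_only).
        for p in self.net.parameters():
            p.requires_grad_(False)
        self.alpha = nn.Parameter(torch.ones(1))

    def forward(self, mel: torch.Tensor) -> torch.Tensor:
        return self.net(mel * self.alpha)


def vocode(vocoder: nn.Module, mel: torch.Tensor, use_checkpoint: bool = True) -> torch.Tensor:
    """Run the vocoder, optionally recomputing activations during backward."""
    if use_checkpoint and torch.is_grad_enabled():
        # Non-reentrant checkpointing: activations inside `vocoder` are not
        # kept after forward; they are recomputed when backward needs them.
        return checkpoint(vocoder, mel, use_reentrant=False)
    return vocoder(mel)


torch.manual_seed(0)
vocoder = TinyVocoder()
mel = torch.randn(2, 8, 10)  # (batch, n_mels, frames)

# Checkpointed path.
wav = vocode(vocoder, mel)
wav.pow(2).mean().backward()
grad_ckpt = vocoder.alpha.grad.clone()

# Plain path, for comparison.
vocoder.alpha.grad = None
vocode(vocoder, mel, use_checkpoint=False).pow(2).mean().backward()
grad_plain = vocoder.alpha.grad.clone()
```

The two gradients should match, since checkpointing changes when activations are computed, not what is computed. The memory saving itself is not visible on a model this small.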