d70c611bf7
Only the vocoder and mel_converter are needed during BigVGAN training. The rest of the SelVA pipeline (CLIP ViT-H, synchformer, T5, generator, VAE) was staying on GPU and consuming ~90 GiB, leaving no room for backward-pass activations. These components are now offloaded individually to CPU before the training loop starts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
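
For illustration only, a minimal sketch of the per-component offload described in this message. The attribute names (`clip_model`, `synchformer`, `t5`, `generator`, `vae`) and the helper `offload_to_cpu` are hypothetical and not taken from the SelVA code; the sketch only assumes each component exposes a `.to(device)` method, as `torch.nn.Module` does.

```python
from typing import Any, Iterable, List

# Hypothetical attribute names; the real pipeline may name these differently.
UNUSED_DURING_VOCODER_TRAINING = (
    "clip_model", "synchformer", "t5", "generator", "vae",
)


def offload_to_cpu(
    pipeline: Any,
    names: Iterable[str] = UNUSED_DURING_VOCODER_TRAINING,
) -> List[str]:
    """Move each named component to CPU one at a time; return the names moved."""
    moved = []
    for name in names:
        module = getattr(pipeline, name, None)
        if module is None:
            continue  # component not present on this pipeline, so skip it
        # For a torch.nn.Module, .to("cpu") moves parameters and buffers in place.
        module.to("cpu")
        moved.append(name)
    return moved
```

Components not listed (here, the vocoder and mel_converter) are left where they are. With PyTorch, calling `torch.cuda.empty_cache()` after the offload returns the freed cached blocks to the driver, which makes the reclaimed memory visible to tools such as `nvidia-smi`.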