9af4bbdd91
PyTorch's caching allocator keeps the GPU memory it reserved during pre-generation (~90 GiB for generator + tod) and does not return it to CUDA/OS. soft_empty_cache may not call torch.cuda.empty_cache(), so force a full cache release after CLIP encoding and after LoRA mel pre-generation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
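
A minimal sketch of what a forced full cache release could look like, not the code from this commit. The helper name force_release_cuda_cache is hypothetical; gc.collect(), torch.cuda.empty_cache() and torch.cuda.memory_reserved() are the only library calls it relies on. It assumes the tensors from the previous stage are no longer referenced, since empty_cache() can only hand back blocks that are cached but unused.

```python
import gc

try:
    import torch
except ImportError:  # allows the sketch to be imported without PyTorch installed
    torch = None


def force_release_cuda_cache():
    """Hypothetical helper: free unreachable tensors, then release cached blocks.

    Returns (reserved_before, reserved_after) in bytes for the current device,
    or None when PyTorch or CUDA is unavailable.
    """
    # Collect reference cycles first so their tensors go back to the allocator.
    gc.collect()
    if torch is None or not torch.cuda.is_available():
        return None
    before = torch.cuda.memory_reserved()
    # Hand unused cached blocks back to the CUDA driver.
    torch.cuda.empty_cache()
    after = torch.cuda.memory_reserved()
    return before, after


if __name__ == "__main__":
    # Assumed call sites: after CLIP encoding and after LoRA mel pre-generation.
    print(force_release_cuda_cache())
```

Calling the helper unconditionally at those two points avoids depending on whether soft_empty_cache decides to release anything.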