211494a91c
DITTO critical bug: x was reassigned on every ODE step, so by the time loss.backward() ran, x pointed to the final output tensor (grad_fn, not a leaf) and x.grad was always None. The manual gradient transfer never fired — x0 was never updated. The optimization was a no-op. Fix: use a straight-through estimator after the no-grad prefix: x = x + (x0 - x0.detach()) This adds zero value but creates a grad_fn back to x0, so backward() propagates ∂loss/∂x (at the Phase-1/2 boundary) directly to x0.grad. Equivalent to truncated BPTT with ∂x_prefix/∂x0 ≈ I. Also remove unused imports (SelvaSampler, _inject_tokens, random) that caused cascade ImportError risk, and remove dead trainable_count variable in BigVGAN trainer. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>