fix: DITTO gradient never reached x0, remove unused imports and dead code

DITTO critical bug: x was reassigned on every ODE step, so by the time loss.backward() ran, x pointed to the final output tensor (a non-leaf tensor with a grad_fn) and x.grad was always None. The manual gradient transfer never fired, so x0 was never updated and the optimization was a no-op.

Fix: use a straight-through estimator after the no-grad prefix:

    x = x + (x0 - x0.detach())

This adds zero to the value of x but creates a grad_fn path back to x0, so backward() propagates ∂loss/∂x (at the Phase-1/2 boundary) directly into x0.grad. This is equivalent to truncated BPTT with ∂x_prefix/∂x0 ≈ I.

Also remove unused imports (SelvaSampler, _inject_tokens, random) that risked a cascading ImportError, and remove the dead trainable_count variable in the BigVGAN trainer.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
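
A minimal sketch of the straight-through reconnect described above, on a toy problem. The step() function, the step counts, and the quadratic loss are hypothetical stand-ins and are not taken from the DITTO code; only the line x = x + (x0 - x0.detach()) mirrors the fix.

```python
import torch


def step(x):
    # Hypothetical toy ODE step (explicit Euler on dx/dt = -x with dt = 0.1).
    return x + 0.1 * (-x)


def integrate(x0, n_prefix=3, n_tail=2, reconnect=True):
    x = x0
    # No-grad prefix: these steps are not recorded by autograd.
    with torch.no_grad():
        for _ in range(n_prefix):
            x = step(x)
    if reconnect:
        # Adds zero in value, but gives x a grad_fn that leads back to x0.
        x = x + (x0 - x0.detach())
    # Differentiable tail: these steps are recorded by autograd.
    for _ in range(n_tail):
        x = step(x)
    return x


x0 = torch.randn(4, requires_grad=True)

# Without the reconnect, the output is cut off from x0 entirely.
broken = integrate(x0, reconnect=False)
print(broken.requires_grad)

# With the reconnect, backward() fills x0.grad.
out = integrate(x0, reconnect=True)
loss = out.pow(2).sum()
loss.backward()
print(x0.grad is not None)
```

In this sketch the gradient that lands in x0.grad is the gradient at the prefix/tail boundary, passed through unchanged; the Jacobian of the no-grad prefix is treated as the identity, which is the approximation the commit message names.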
2026-04-09 12:10:02 +02:00
parent 1e9551152e
commit 211494a91c
2 changed files with 10 additions and 35 deletions
@@ -404,11 +404,6 @@ class SelvaBigvganTrainer:
                f"[BigVGAN] No usable clips found (need audio >= {segment_seconds}s)"
            )

-        trainable_count = sum(
-            1 for n, _ in vocoder.named_parameters() if "alpha" in n
-        ) if train_mode == "snake_alpha_only" else sum(
-            1 for _ in vocoder.parameters()
-        )
        print(f"[BigVGAN] {len(clips)} clips ready  mode={train_mode}  "
              f"segment={segment_seconds}s  steps={steps}  lr={lr}  "
              f"batch={batch_size}  lambda_l2sp={lambda_l2sp}\n", flush=True)