fix: DITTO gradient never reached x0, remove unused imports and dead code
DITTO critical bug: x was reassigned on every ODE step, so by the time
loss.backward() ran, x pointed to the final output tensor (it had a grad_fn
and was not a leaf), and x.grad was always None. The manual gradient transfer
never fired, x0 was never updated, and the optimization was a no-op.

Fix: use a straight-through estimator after the no-grad prefix:

    x = x + (x0 - x0.detach())

This adds zero to the value but creates a grad_fn back to x0, so backward()
propagates ∂loss/∂x (at the Phase-1/2 boundary) directly to x0.grad. This is
equivalent to truncated BPTT with ∂x_prefix/∂x0 ≈ I.

Also remove unused imports (SelvaSampler, _inject_tokens, random) that risked
a cascading ImportError, and remove the dead trainable_count variable in the
BigVGAN trainer.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
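Below is a minimal PyTorch sketch of the two-phase scheme and the straight-through fix described in the message. It assumes a toy Euler integrator. The names `euler_step`, `suffix_loss` and `ditto_step`, along with the linear velocity field, the step counts and the quadratic loss, are hypothetical stand-ins and not the actual DITTO/SelVA code. It shows that the rebound name `x` ends up as a non-leaf, so reading `x.grad` there yields None (PyTorch also warns). Meanwhile `x0.grad` is populated through the straight-through term.

```python
import torch


def euler_step(x: torch.Tensor, t: float, dt: float) -> torch.Tensor:
    # Hypothetical linear velocity field standing in for the real flow model.
    return x + dt * (-(1.0 + t) * x)


def suffix_loss(x: torch.Tensor, start: int, n_steps: int, dt: float):
    # Phase 2: differentiable Euler steps, then a toy quadratic loss.
    for i in range(start, n_steps):
        x = euler_step(x, i * dt, dt)  # rebinding x: it ends up a non-leaf
    return x, (x ** 2).sum()


def ditto_step(x0: torch.Tensor, n_steps: int = 8, n_grad: int = 3):
    dt = 1.0 / n_steps
    split = n_steps - n_grad
    # Phase 1: no-grad prefix; the result carries no graph back to x0.
    with torch.no_grad():
        x = x0.detach()
        for i in range(split):
            x = euler_step(x, i * dt, dt)
    boundary_value = x.clone()
    # Straight-through estimator: adds exactly zero to the value, but gives x
    # a grad_fn back to x0 whose Jacobian is the identity (prefix ignored).
    x = x + (x0 - x0.detach())
    x, loss = suffix_loss(x, split, n_steps, dt)
    loss.backward()
    return boundary_value, x, loss


if __name__ == "__main__":
    torch.manual_seed(0)
    x0 = torch.randn(4, requires_grad=True)
    boundary, x_final, loss = ditto_step(x0)
    print(x_final.is_leaf)      # False: reading x_final.grad would give None
    print(x0.grad is not None)  # True: the gradient reached x0
```

Because `x0 - x0.detach()` is exactly zero, forward values are unchanged and only the backward graph differs. The resulting `x0.grad` equals ∂loss/∂x at the boundary, which is the truncated-BPTT approximation the message mentions.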
@@ -404,11 +404,6 @@ class SelvaBigvganTrainer:
     f"[BigVGAN] No usable clips found (need audio >= {segment_seconds}s)"
 )
 
-trainable_count = sum(
-    1 for n, _ in vocoder.named_parameters() if "alpha" in n
-) if train_mode == "snake_alpha_only" else sum(
-    1 for _ in vocoder.parameters()
-)
 print(f"[BigVGAN] {len(clips)} clips ready mode={train_mode} "
       f"segment={segment_seconds}s steps={steps} lr={lr} "
       f"batch={batch_size} lambda_l2sp={lambda_l2sp}\n", flush=True)