debug: log conditioner output stats and T2A text feature stats

Log per-key conditioning output stats (after Cond_MLP/Sync_MLP and after
_substitute_empty_features) in both the sampler and text_only nodes. Also
log raw T5 text feature stats in T2A before conditioning.

This lets us directly compare:
- T2A vs V2A conditioning outputs to find which path differs
- T2A vs npz text feature ranges

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 22:39:44 +01:00
parent 477fe0f08f
commit 8bf4a0c3fc
2 changed files with 14 additions and 0 deletions
+6
@@ -88,6 +88,12 @@ class PrismAudioSampler:
        if not has_video:
            _substitute_empty_features(diffusion, conditioning, device, dtype)

        # Log conditioner output stats for each key
        for ck, cv in conditioning.items():
            if isinstance(cv, (list, tuple)) and len(cv) >= 1 and isinstance(cv[0], torch.Tensor):
                t = cv[0].float()
                print(f"[PrismAudio] cond[{ck}]: shape={tuple(t.shape)} mean={t.mean():.3f} std={t.std():.3f} min={t.min():.3f} max={t.max():.3f}", flush=True)

        # Assemble conditioning inputs for the DiT
        cond_inputs = diffusion.get_conditioning_inputs(conditioning)
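
The commit message describes comparing the logged stats across the T2A and V2A paths by eye. The following is a minimal sketch of how that comparison could be automated. It is not part of this commit: `summarize`, `diff_paths`, the `tol` threshold, and the `text_features` key are all hypothetical names chosen for illustration. It uses only the standard library and works on flat lists of floats, so in the node one would presumably pass something like `t.flatten().tolist()`.

```python
import statistics


def summarize(values):
    """Return mean/std/min/max for a flat sequence of floats (hypothetical helper)."""
    values = list(values)
    return {
        "mean": statistics.fmean(values),
        # Sample standard deviation (n - 1), the same convention as the
        # default of torch.Tensor.std(). The 0.0 fallback for a single
        # value is a choice made for this sketch only.
        "std": statistics.stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }


def diff_paths(stats_a, stats_b, tol=0.05):
    """List (key, field, a, b) where two runs differ by more than tol."""
    mismatches = []
    # Only keys present in both runs can be compared.
    for key in sorted(set(stats_a) & set(stats_b)):
        for field in ("mean", "std", "min", "max"):
            a, b = stats_a[key][field], stats_b[key][field]
            if abs(a - b) > tol:
                mismatches.append((key, field, a, b))
    return mismatches


# Made-up numbers for illustration only, not real PrismAudio output.
t2a = {"text_features": summarize([0.0, 0.1, -0.1, 0.2])}
v2a = {"text_features": summarize([0.0, 1.0, -1.0, 2.0])}
for key, field, a, b in diff_paths(t2a, v2a):
    print(f"{key}.{field}: T2A={a:.3f} V2A={b:.3f}")
```

An absolute tolerance is the simplest choice here; a relative tolerance may suit keys whose values span very different ranges.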