ComfyUI-SelVA

Author	SHA1	Message	Date
Ethanfel	a315093743	feat: sync_strength control and temporal coverage diagnostic in sampler. Adds sync_strength (0.0–3.0, default 1.0) to PrismAudioSampler. The scale is applied post-conditioner (after Sync_MLP) to the conditioning tensor before it enters the DiT. Because CFG always uses zeros as the null sync embedding, this cleanly scales the sync guidance signal: effective_sync_guidance = cfg_scale * (sync_strength * cond - 0). Higher values tighten temporal audio-video alignment; 0.0 disables sync guidance entirely (audio is then conditioned only by video and text features). Not applied in T2A mode, where sync is replaced by the learned empty_sync_feat. Also logs sync temporal coverage against the audio target duration, with a warning when they differ by more than 0.5 s (a sign of stale or mismatched features). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-28 16:23:41 +01:00
Ethanfel	5b62be0447	chore: update defaults to steps=100 and cfg_scale=7.0. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-28 11:03:48 +01:00
Ethanfel	abd315092b	feat: auto-use video duration from features when duration=0. Setting duration to 0 in PrismAudioSampler now reads the duration stored in the PRISMAUDIO_FEATURES dict (set by the feature extractor). The default changed from 10.0 to 0.0, so V2A workflows pick up the video length automatically. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-28 11:00:47 +01:00
Ethanfel	c38df8c6fa	chore: remove debug options and diagnostic logging. Remove the debug_zero_video/debug_zero_sync inputs from PrismAudioSampler, along with the DiT velocity diagnostics, conditioner stats logging, and feature stats prints from both sampler.py and text_only.py. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-28 10:47:00 +01:00
Ethanfel	1d8b9b59e0	debug: add DiT velocity diagnostic at t=1 to isolate whether the quality issue comes from the DiT or the VAE. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 23:57:03 +01:00
Ethanfel	8bf4a0c3fc	debug: log conditioner output stats and T2A text feature stats. Add per-key conditioning output stats (after Cond_MLP/Sync_MLP and after _substitute_empty_features) to both the sampler and text_only nodes. Also add raw T5 text feature stats in T2A before conditioning. This allows a direct comparison of (1) T2A vs V2A conditioning outputs, to find which path differs, and (2) T2A vs npz text feature ranges. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 22:39:44 +01:00
Ethanfel	477fe0f08f	debug: add latent and audio stats logging to V2A sampler. Match the diagnostic output already in text_only.py to compare V2A vs T2A latent distributions and diagnose conditioning issues. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 22:28:08 +01:00
Ethanfel	c0b7ccbcee	fix: substitute empty_clip_feat for video features when no video is present. Zero features passed through the bias-free Cond_MLP produce near-zero activations, not the learned null signal the model was trained with. Use empty_clip_feat (the learned null video embedding) for video, just as empty_sync_feat is used for sync. Also improve the text_prompt tooltip to encourage detailed CoT descriptions. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 22:13:22 +01:00
Ethanfel	83a7f2787b	feat: add debug_zero_video/sync toggles and feature stats logging to sampler. Allows isolating which feature set causes quality issues: debug_zero_video zeroes video_features (text + sync only); debug_zero_sync zeroes sync_features (text + video only). Also logs mean/std/shape for all three feature tensors on every run. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 21:40:34 +01:00
Ethanfel	3d62688e8c	feat: add PrismAudioSampler node with correct metadata format and peak normalization. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 18:07:33 +01:00

10 Commits
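Commit a315093743 says sync_strength scales the post-Sync_MLP conditioning while the CFG null branch keeps zeros for sync. Under that setup, the guidance term on the sync input is cfg_scale * (sync_strength * cond - 0). The pure-Python sketch below reproduces only that input-side arithmetic, using plain lists in place of tensors. It does not model how the DiT responds to the scaled input. The helper names `scale_sync` and `sync_guidance_term` are hypothetical and are not functions from the repository.

```python
from typing import List

Vector = List[float]


def scale_sync(sync_cond: Vector, sync_strength: float, t2a_mode: bool = False) -> Vector:
    """Scale post-Sync_MLP sync conditioning (hypothetical helper).

    In T2A mode the sync slot holds the learned empty_sync_feat, so it is
    returned unchanged, as the commit message describes.
    """
    if not 0.0 <= sync_strength <= 3.0:
        raise ValueError("sync_strength must be in [0.0, 3.0]")
    if t2a_mode:
        return list(sync_cond)
    return [sync_strength * x for x in sync_cond]


def sync_guidance_term(sync_cond: Vector, sync_strength: float, cfg_scale: float) -> Vector:
    """cfg_scale * (sync_strength * cond - null), where the null sync embedding is zeros."""
    scaled = scale_sync(sync_cond, sync_strength)
    null_sync = [0.0] * len(sync_cond)  # CFG's unconditional branch uses zeros for sync
    return [cfg_scale * (c - n) for c, n in zip(scaled, null_sync)]


# With the default cfg_scale=7.0, halving sync_strength halves the guidance term.
print(sync_guidance_term([1.0, -2.0], sync_strength=0.5, cfg_scale=7.0))  # [3.5, -7.0]
```

Because the null branch is fixed at zeros, the term is linear in sync_strength. Setting it to 0.0 removes the sync contribution entirely, which matches the commit's description.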
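The same commit adds a coverage diagnostic that warns when the sync features and the audio target duration differ by more than 0.5 s. The commit does not say how coverage is computed. The sketch below assumes it is the sync frame count multiplied by seconds per frame. The function name, logger name and parameters are all hypothetical; only the 0.5 s threshold comes from the commit.

```python
import logging

logger = logging.getLogger("prismaudio_sketch")  # hypothetical logger name

COVERAGE_TOLERANCE_S = 0.5  # mismatch threshold stated in the commit message


def check_sync_coverage(num_sync_frames: int, seconds_per_frame: float,
                        audio_duration_s: float) -> bool:
    """Return True if sync features cover the audio target within tolerance.

    Assumes coverage = frame count * seconds per frame; the commit does not
    specify how coverage is derived.
    """
    coverage_s = num_sync_frames * seconds_per_frame
    logger.info("sync coverage %.2fs vs audio target %.2fs", coverage_s, audio_duration_s)
    if abs(coverage_s - audio_duration_s) > COVERAGE_TOLERANCE_S:
        logger.warning(
            "sync features cover %.2fs but audio target is %.2fs; "
            "features may be stale or mismatched",
            coverage_s, audio_duration_s,
        )
        return False
    return True


logging.basicConfig(level=logging.INFO)
check_sync_coverage(240, 1 / 24, 10.0)  # within tolerance
check_sync_coverage(192, 1 / 24, 10.0)  # 8.0s vs 10.0s, logs a warning
```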
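Commit abd315092b makes duration=0 fall back to the duration stored in the PRISMAUDIO_FEATURES dict. The sketch below shows that fallback. It assumes the dict stores the value under a `"duration"` key, which the commit does not confirm. `resolve_duration` is a hypothetical name.

```python
from typing import Any, Mapping


def resolve_duration(duration: float, features: Mapping[str, Any]) -> float:
    """Return the requested duration, or the stored one when duration is 0 (or not positive).

    The "duration" key is an assumption about how PRISMAUDIO_FEATURES stores it.
    """
    if duration > 0:
        return float(duration)
    stored = features.get("duration")
    if stored is None:
        raise ValueError("duration is 0 but the features dict has no stored duration")
    return float(stored)


features = {"duration": 8.4}  # stand-in for a PRISMAUDIO_FEATURES dict
print(resolve_duration(0.0, features))  # 8.4
print(resolve_duration(5.0, features))  # 5.0
```

Raising an error when no duration is stored avoids silently generating audio of an arbitrary length. This is a design choice of the sketch, not necessarily of the node.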
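The fix in c0b7ccbcee rests on a simple fact: a linear layer without a bias maps an all-zero input to exactly zero. Zeros therefore carry none of the learned "no video" signal that a trained null embedding such as empty_clip_feat does. The sketch below illustrates only that core reason. The real Cond_MLP is not reproduced, plain lists stand in for tensors, and `linear_no_bias`, `choose_video_features` and the embedding values are hypothetical.

```python
from typing import List, Optional

Vector = List[float]
Matrix = List[List[float]]


def linear_no_bias(weight: Matrix, x: Vector) -> Vector:
    """y = W x with no bias term, i.e. a bias-free linear layer."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def choose_video_features(video_feat: Optional[Vector], empty_clip_feat: Vector) -> Vector:
    """Use the learned null embedding instead of zeros when no video is present."""
    return list(empty_clip_feat) if video_feat is None else list(video_feat)


W = [[1.0, 2.0], [3.0, 4.0]]
print(linear_no_bias(W, [0.0, 0.0]))  # [0.0, 0.0]: zeros carry no learned signal
empty_clip_feat = [1.0, 0.5]  # hypothetical learned null embedding
print(linear_no_bias(W, choose_video_features(None, empty_clip_feat)))  # [2.0, 5.0]
```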
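Commit 3d62688e8c mentions peak normalization of the generated audio. The standard technique scales the waveform so its largest absolute sample hits a target level. In the sketch below, the target of 1.0 and the silence threshold `eps` are assumptions, since the commit does not state which level the node uses. `peak_normalize` is a hypothetical name.

```python
from typing import List, Sequence


def peak_normalize(samples: Sequence[float], target_peak: float = 1.0,
                   eps: float = 1e-8) -> List[float]:
    """Scale samples so the largest absolute value equals target_peak.

    Near-silent input is returned unchanged to avoid dividing by ~zero.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak < eps:
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]


print(peak_normalize([0.25, -0.5]))  # [0.5, -1.0]
```

A single gain is applied to every sample, so relative dynamics within the clip are preserved. Only the overall level changes.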