Commit Graph

26 Commits

Author SHA1 Message Date
Ethanfel 477fe0f08f debug: add latent and audio stats logging to V2A sampler
Match the diagnostic output already in text_only.py to compare
V2A vs T2A latent distributions and diagnose conditioning issues.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 22:28:08 +01:00
Ethanfel c0b7ccbcee fix: substitute empty_clip_feat for video features when no video present
Zero features through bias-free Cond_MLP produce near-zero activations,
not the learned null signal the model was trained with. Use empty_clip_feat
(the learned null video embedding) just like empty_sync_feat for sync.
Also improve text_prompt tooltip to encourage detailed CoT descriptions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 22:13:22 +01:00
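
Note on c0b7ccbcee: the reasoning in this commit can be illustrated with a small pure-Python sketch. It is not the project's code. `linear_no_bias`, `pick_video_features`, the weights, and the null embedding values are all hypothetical stand-ins, and a plain matrix product stands in for the first layer of Cond_MLP. It only shows that a bias-free layer maps an all-zero input to zero, whereas a non-zero learned null embedding produces a real activation.

```python
def linear_no_bias(weights, x):
    """Compute y = W @ x with no bias term (stand-in for a bias-free layer)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def pick_video_features(video_features, empty_clip_feat):
    """Fall back to the learned null embedding when no video is present."""
    if video_features is None:
        return list(empty_clip_feat)
    return video_features


# Hypothetical values, chosen only for illustration.
W = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]
zeros = [0.0, 0.0, 0.0]
null_embedding = [0.1, -0.2, 0.3]

# Zeros in, zeros out: the layer has no bias to shift the result.
zero_out = linear_no_bias(W, zeros)

# The null embedding yields a non-zero conditioning signal instead.
null_out = linear_no_bias(W, pick_video_features(None, null_embedding))
```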
Ethanfel 45633788a4 debug: add latent and audio stats logging to T2A node
Print 'fakes' latent stats (mean/std/min/max) and audio pre-norm stats
to diagnose whether diffusion output is numerically reasonable.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 22:06:39 +01:00
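
Note on 45633788a4: a minimal sketch of the kind of stats line described here, using only the standard library. `tensor_stats` and `format_stats` are hypothetical names, the input is a flat list of floats rather than a tensor, and population standard deviation is assumed (the actual node may use a different estimator).

```python
import statistics


def tensor_stats(values):
    """Return mean/std/min/max for a flat iterable of floats."""
    flat = list(values)
    return {
        "mean": statistics.fmean(flat),
        "std": statistics.pstdev(flat),  # population std; an assumption
        "min": min(flat),
        "max": max(flat),
    }


def format_stats(name, values):
    """Build a one-line log message for the given values."""
    s = tensor_stats(values)
    return (
        f"[{name}] mean={s['mean']:.4f} std={s['std']:.4f} "
        f"min={s['min']:.4f} max={s['max']:.4f}"
    )


line = format_stats("latent", [1.0, 2.0, 3.0, 4.0])
```

A std near zero or a min/max far outside the expected range is the kind of signal such a log line is meant to surface.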
Ethanfel 11457fc27a debug: fix VAE load_state_dict diagnostic — load into .model directly
AutoencoderPretransform.load_state_dict() doesn't return IncompatibleKeys.
Load into pretransform.model (AudioAutoencoder) to get the return value
and see actual missing/unexpected key counts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 21:56:06 +01:00
Ethanfel f2705b3063 debug: log weight load stats for diffusion and VAE checkpoints
Print key counts, missing/unexpected keys, and sample key names to
diagnose whether weights are actually loading correctly (strict=False
silently hides mismatches that would cause garbage audio output).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 21:53:25 +01:00
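
Note on f2705b3063: the missing/unexpected key report can be approximated with plain set arithmetic. The sketch below is an assumption-laden illustration, not the project's loader: `diff_state_dict_keys` and the key names are hypothetical, and it only compares key names, not tensor shapes.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, each sorted.

    missing:    keys the model expects but the checkpoint lacks
    unexpected: keys in the checkpoint the model does not define
    """
    model_keys = set(model_keys)
    checkpoint_keys = set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    return missing, unexpected


# Hypothetical key names for illustration only.
model = ["encoder.weight", "decoder.weight", "bottleneck.weight"]
ckpt = ["encoder.weight", "decoder.weight", "discriminator.weight"]

missing, unexpected = diff_state_dict_keys(model, ckpt)
print(f"missing={len(missing)} unexpected={len(unexpected)}")
```

With a non-strict load, a non-empty `missing` list is the worrying case: those parameters keep their random initialization.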
Ethanfel 83a7f2787b feat: add debug_zero_video/sync toggles and feature stats logging to sampler
Allows isolating which feature set causes quality issues:
- debug_zero_video: zero video_features → text+sync only
- debug_zero_sync: zero sync_features → text+video only
Also logs mean/std/shape for all three feature tensors on every run.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 21:40:34 +01:00
Ethanfel 934a401633 perf: replace PIL+PNG frame files with direct ffmpeg stdin pipe
Stream raw RGB bytes from tensor directly to ffmpeg stdin.
Eliminates all intermediate PNG file I/O — much faster for large frame counts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 21:20:00 +01:00
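
Note on 934a401633: a rough sketch of piping raw RGB frames to ffmpeg over stdin. `build_ffmpeg_cmd` and `write_frames` are hypothetical names, and the libx264/yuv420p output settings are assumptions (yuv420p with libx264 needs even width and height). Frames are assumed to be `bytes` objects of exactly width * height * 3 bytes each.

```python
import subprocess


def build_ffmpeg_cmd(width, height, fps, out_path):
    """Build an ffmpeg command that reads raw RGB24 frames from stdin."""
    return [
        "ffmpeg", "-y",
        "-f", "rawvideo",          # input is headerless raw video
        "-pix_fmt", "rgb24",       # 3 bytes per pixel, R then G then B
        "-s", f"{width}x{height}",
        "-r", str(fps),
        "-i", "-",                 # read input from stdin
        "-c:v", "libx264",         # assumed output codec
        "-pix_fmt", "yuv420p",     # assumed output pixel format
        out_path,
    ]


def write_frames(frames, width, height, fps, out_path):
    """Stream frames to ffmpeg; raise if a frame has the wrong size."""
    expected = width * height * 3
    cmd = build_ffmpeg_cmd(width, height, fps, out_path)
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE)
    try:
        for frame in frames:
            if len(frame) != expected:
                raise ValueError(f"frame is {len(frame)} bytes, expected {expected}")
            proc.stdin.write(frame)
    finally:
        # Closing stdin signals end of input so ffmpeg can finalize the file.
        proc.stdin.close()
        returncode = proc.wait()
    if returncode != 0:
        raise RuntimeError(f"ffmpeg exited with code {returncode}")
```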
Ethanfel b3ac9ab22f feat: log MP4 conversion time before subprocess spawn
Shows how long PIL+ffmpeg video export takes so we can see
if that's contributing to the gap before [extract] output appears.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 21:19:26 +01:00
Ethanfel 93120eb6b9 feat: auto-resolve synchformer checkpoint from prismaudio models dir
When synchformer_ckpt input is empty, look for synchformer_state_dict.pth
in the ComfyUI prismaudio models directory automatically.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 20:49:56 +01:00
Ethanfel b1a2ee594e fix: correct VideoPrism import (videoprism.models, not videoprism); add flax dep
videoprism/__init__.py is empty — API lives in videoprism.models.
Fix: from videoprism import models as vp (not import videoprism as vp).
Also add flax to managed venv packages (required by videoprism Flax model).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 20:38:00 +01:00
Ethanfel 0f46e8359d feat: switch managed venv to jax[cuda13] for GPU feature extraction
RTX PRO 6000 (Blackwell, SM 12.0) fully supports CUDA 13. Switch from
jax[cpu]+jaxlib to jax[cuda13] which bundles jaxlib and uses
pip-managed CUDA libraries. Delete _extract_env to force a rebuild.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 20:33:45 +01:00
Ethanfel 06f8dbbab4 feat: add hf_token input and HF_TOKEN env forwarding to feature extractor
google/t5gemma-l-l-ul2-it is a gated HuggingFace model requiring auth.
Add optional hf_token input on the node; forward it (plus the legacy
HUGGING_FACE_HUB_TOKEN alias) to the subprocess env. Falls back to
HF_TOKEN from the host environment. Warn clearly when neither is set.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 20:27:33 +01:00
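
Note on 06f8dbbab4: the token precedence described here (node input first, then HF_TOKEN from the host, with a warning when neither is set) might look roughly like the sketch below. `build_subprocess_env` is a hypothetical helper and the warning text is invented for illustration.

```python
import os


def build_subprocess_env(hf_token="", base_env=None):
    """Return a copy of the environment with HF auth variables set.

    Precedence: explicit hf_token, then HF_TOKEN already in the environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    token = (hf_token or "").strip() or env.get("HF_TOKEN", "")
    if token:
        env["HF_TOKEN"] = token
        env["HUGGING_FACE_HUB_TOKEN"] = token  # legacy alias
    else:
        print("warning: no Hugging Face token set; gated models will fail to download")
    return env


# Explicit token overrides whatever the host environment provides.
env = build_subprocess_env("node-token", base_env={"HF_TOKEN": "host-token"})
```

The returned dict would then be passed as `env=` to the subprocess call, leaving the parent process environment untouched.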
Ethanfel a6d584bd34 fix: treat empty python_env as auto-managed venv trigger
Empty string from clearing the node field caused subprocess to execute ''
which raises PermissionError. Now any blank or 'python' value uses the
auto-installed venv.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 20:21:16 +01:00
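
Note on a6d584bd34: the rule "blank or 'python' means use the managed venv" can be sketched as a small resolver. `resolve_python` and the example paths are hypothetical; the real node may normalize the value differently.

```python
def resolve_python(python_env, managed_python):
    """Pick the interpreter path for the feature-extraction subprocess.

    Blank values and the default 'python' both select the managed venv,
    so an empty string is never passed to subprocess as the executable.
    """
    value = (python_env or "").strip()
    if value in ("", "python"):
        return managed_python
    return value


# Hypothetical path to the auto-installed venv interpreter.
managed = "_extract_env/bin/python"
chosen = resolve_python("", managed)
```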
Ethanfel 829f398ed0 feat: verbose step-by-step logging in feature extraction
- extract_features.py: 6 numbered steps with shapes, fps, frame counts
- feature_extractor.py: stream subprocess output live (capture_output=False)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 20:19:38 +01:00
Ethanfel f32456a142 feat: add fps input to PrismAudioFeatureExtractor
Exposes the video frame rate as an optional input (default 30).
Correct FPS ensures accurate temporal frame sampling in VideoPrism
and Synchformer feature extraction.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 20:08:10 +01:00
Ethanfel c416045ace fix: replace torchvision.io.write_video with PIL+ffmpeg
write_video requires the optional 'av' (PyAV) package. Use PIL to save
frames as PNGs then combine with ffmpeg, which is always present in
ComfyUI Docker images.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 20:03:39 +01:00
Ethanfel 824550bed3 feat: verbose per-package progress during venv auto-install
Installs each package individually with [n/total] counters and
pip progress bars, so failures pinpoint the exact failing package.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 20:00:04 +01:00
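
Note on 824550bed3: installing one package per pip invocation, with an [n/total] counter, could be sketched as follows. `install_packages` is a hypothetical helper. The `run` parameter exists only so the loop can be exercised without invoking pip, and output is not captured, so pip's own messages would reach the console.

```python
import subprocess


def install_packages(python_exe, packages, run=subprocess.run):
    """Install packages one at a time; return the first failing name or None."""
    total = len(packages)
    for index, package in enumerate(packages, start=1):
        print(f"[{index}/{total}] installing {package}")
        result = run([python_exe, "-m", "pip", "install", package])
        if result.returncode != 0:
            # Stop at the first failure so the log ends on the culprit.
            print(f"[{index}/{total}] FAILED: {package}")
            return package
    return None
```

Stopping at the first failure trades total install time for a log that ends on the package that broke.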
Ethanfel 8f2e204146 fix: show pip output, handle incomplete venv, fix TF version for Python 3.12
- tensorflow-cpu==2.15.0 only supports Python <=3.11; relax to >=2.16.0
- capture_output=False so pip errors are visible in ComfyUI logs
- clean up incomplete venv dir before retrying install

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 19:55:55 +01:00
Ethanfel 8e3ab999f0 fix: load VAE state dict with strict=False
vae.ckpt is a full training checkpoint containing discriminator, STFT
loss modules, and EMA wrappers that are absent from the inference
AudioAutoencoder. strict=False ignores these training-only keys while
still loading all encoder/decoder/bottleneck weights correctly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 19:51:51 +01:00
Ethanfel 35d0615253 feat: auto-install pip venv for feature extraction on first use
PrismAudioFeatureExtractor now creates and populates a managed venv
(_extract_env/) automatically when python_env is left as the default
'python'. Also adds scripts/install_extract_env.sh for manual/Docker
setup without conda.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 19:27:27 +01:00
Ethanfel 618e7de64b feat: PrismAudioTextOnly node with correct T5-Gemma encoding
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 18:09:11 +01:00
Ethanfel 3d62688e8c feat: PrismAudioSampler node with correct metadata format and peak normalization
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 18:07:33 +01:00
Ethanfel 7c54ee8482 feat: PrismAudioFeatureExtractor node with subprocess bridge and conda env
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 18:06:10 +01:00
Ethanfel 3f35aa39f2 feat: PrismAudioFeatureLoader node for pre-computed .npz files
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 18:04:32 +01:00
Ethanfel 1043f4bacb feat: PrismAudioModelLoader node with auto-download and adaptive VRAM
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 18:02:47 +01:00
Ethanfel baa80de194 feat: project scaffolding with shared utils and node registration
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 16:59:21 +01:00