Commit Graph

64 Commits

Author SHA1 Message Date
Ethanfel a6d584bd34 fix: treat empty python_env as auto-managed venv trigger
Empty string from clearing the node field caused subprocess to execute ''
which raises PermissionError. Now any blank or 'python' value uses the
auto-installed venv.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 20:21:16 +01:00
Ethanfel 829f398ed0 feat: verbose step-by-step logging in feature extraction
- extract_features.py: 6 numbered steps with shapes, fps, frame counts
- feature_extractor.py: stream subprocess output live (capture_output=False)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 20:19:38 +01:00
Ethanfel f32456a142 feat: add fps input to PrismAudioFeatureExtractor
Exposes the video frame rate as an optional input (default 30).
Correct FPS ensures accurate temporal frame sampling in VideoPrism
and Synchformer feature extraction.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 20:08:10 +01:00
Ethanfel c416045ace fix: replace torchvision.io.write_video with PIL+ffmpeg
write_video requires the optional 'av' (PyAV) package. Use PIL to save
frames as PNGs then combine with ffmpeg, which is always present in
ComfyUI Docker images.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 20:03:39 +01:00
Ethanfel 824550bed3 feat: verbose per-package progress during venv auto-install
Installs each package individually with [n/total] counters and
pip progress bars, so failures pinpoint the exact failing package.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 20:00:04 +01:00
Ethanfel 8f2e204146 fix: show pip output, handle incomplete venv, fix TF version for Python 3.12
- tensorflow-cpu==2.15.0 only supports Python <=3.11; relax to >=2.16.0
- capture_output=False so pip errors are visible in ComfyUI logs
- clean up incomplete venv dir before retrying install

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 19:55:55 +01:00
Ethanfel 8e3ab999f0 fix: load VAE state dict with strict=False
vae.ckpt is a full training checkpoint containing discriminator, STFT
loss modules, and EMA wrappers that are absent from the inference
AudioAutoencoder. strict=False ignores these training-only keys while
still loading all encoder/decoder/bottleneck weights correctly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 19:51:51 +01:00
Ethanfel 35d0615253 feat: auto-install pip venv for feature extraction on first use
PrismAudioFeatureExtractor now creates and populates a managed venv
(_extract_env/) automatically when python_env is left as the default
'python'. Also adds scripts/install_extract_env.sh for manual/Docker
setup without conda.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 19:27:27 +01:00
Ethanfel 618e7de64b feat: PrismAudioTextOnly node with correct T5-Gemma encoding
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 18:09:11 +01:00
Ethanfel 3d62688e8c feat: PrismAudioSampler node with correct metadata format and peak normalization
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 18:07:33 +01:00
Ethanfel 7c54ee8482 feat: PrismAudioFeatureExtractor node with subprocess bridge and conda env
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 18:06:10 +01:00
Ethanfel 3f35aa39f2 feat: PrismAudioFeatureLoader node for pre-computed .npz files
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 18:04:32 +01:00
Ethanfel 1043f4bacb feat: PrismAudioModelLoader node with auto-download and adaptive VRAM
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 18:02:47 +01:00
Ethanfel baa80de194 feat: project scaffolding with shared utils and node registration
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 16:59:21 +01:00