ComfyUI-SelVA

Author	SHA1	Message	Date
Ethanfel	b1a2ee594e	fix: correct VideoPrism import (videoprism.models, not videoprism); add flax dep. videoprism/__init__.py is empty; the API lives in videoprism.models. Fix: from videoprism import models as vp (not import videoprism as vp). Also add flax to managed venv packages (required by the videoprism Flax model). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 20:38:00 +01:00
Ethanfel	0f46e8359d	feat: switch managed venv to jax[cuda13] for GPU feature extraction. RTX 6000 Pro (Blackwell SM 10.0) fully supports CUDA 13. Switch from jax[cpu]+jaxlib to jax[cuda13], which bundles jaxlib and uses pip-managed CUDA libraries. Delete _extract_env to force a rebuild. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 20:33:45 +01:00
Ethanfel	06f8dbbab4	feat: add hf_token input and HF_TOKEN env forwarding to feature extractor. google/t5gemma-l-l-ul2-it is a gated HuggingFace model requiring auth. Add optional hf_token input on the node; forward it (plus the legacy HUGGING_FACE_HUB_TOKEN alias) to the subprocess env. Falls back to HF_TOKEN from the host environment. Warn clearly when neither is set. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 20:27:33 +01:00
Ethanfel	a6d584bd34	fix: treat empty python_env as auto-managed venv trigger. Clearing the node field left an empty string, so subprocess tried to execute '', which raises PermissionError. Now any blank or 'python' value uses the auto-installed venv. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 20:21:16 +01:00
Ethanfel	829f398ed0	feat: verbose step-by-step logging in feature extraction. extract_features.py: 6 numbered steps with shapes, fps, frame counts. feature_extractor.py: stream subprocess output live (capture_output=False). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 20:19:38 +01:00
Ethanfel	878025450a	feat: add data_utils package with FeaturesUtils implementation. Creates data_utils/v2a_utils/feature_utils_288.py with FeaturesUtils: T5-Gemma text encoding via transformers; VideoPrism video encoding via the JAX videoprism package; Synchformer visual encoder loading from checkpoint. Also fixes extract_features.py to add the plugin root to sys.path so data_utils is importable in the subprocess venv. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 20:14:34 +01:00
Ethanfel	f32456a142	feat: add fps input to PrismAudioFeatureExtractor. Exposes the video frame rate as an optional input (default 30). Correct FPS ensures accurate temporal frame sampling in VideoPrism and Synchformer feature extraction. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 20:08:10 +01:00
Ethanfel	c416045ace	fix: replace torchvision.io.write_video with PIL+ffmpeg. write_video requires the optional 'av' (PyAV) package. Use PIL to save frames as PNGs, then combine them with ffmpeg, which is always present in ComfyUI Docker images. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 20:03:39 +01:00
Ethanfel	824550bed3	feat: verbose per-package progress during venv auto-install. Installs each package individually with [n/total] counters and pip progress bars, so failures pinpoint the exact failing package. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 20:00:04 +01:00
Ethanfel	8f2e204146	fix: show pip output, handle incomplete venv, fix TF version for Python 3.12. tensorflow-cpu==2.15.0 only supports Python <=3.11; relax to >=2.16.0. Set capture_output=False so pip errors are visible in ComfyUI logs. Clean up incomplete venv dir before retrying install. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 19:55:55 +01:00
Ethanfel	8e3ab999f0	fix: load VAE state dict with strict=False. vae.ckpt is a full training checkpoint containing discriminator, STFT loss modules, and EMA wrappers that are absent from the inference AudioAutoencoder. strict=False ignores these training-only keys while still loading all encoder/decoder/bottleneck weights correctly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 19:51:51 +01:00
Ethanfel	afc7d5b657	fix: add missing runtime dependencies to requirements.txt. einops-exts, vector-quantize-pytorch, and scipy were imported by prismaudio_core but not listed in requirements.txt. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 19:48:33 +01:00
Ethanfel	e372cdc488	fix: add plugin root to sys.path so prismaudio_core is importable. ComfyUI does not add the custom node directory to sys.path automatically, so prismaudio_core (a package inside the plugin dir) was not found at runtime. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 19:41:11 +01:00
Ethanfel	7671d296fa	fix: remove spurious caption_cot input entry from video_to_audio workflow Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 19:39:05 +01:00
Ethanfel	3894fcc9b4	feat: add demo workflows for text-to-audio and video-to-audio Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 19:32:24 +01:00
Ethanfel	35d0615253	feat: auto-install pip venv for feature extraction on first use. PrismAudioFeatureExtractor now creates and populates a managed venv (_extract_env/) automatically when python_env is left as the default 'python'. Also adds scripts/install_extract_env.sh for manual/Docker setup without conda. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 19:27:27 +01:00
Ethanfel	9b1cb71b2a	fix: remove MMDiTWrapper import and dead code paths from factory.py. MMDiTWrapper was removed from diffusion.py during cleanup but the import in factory.py was missed, causing ImportError on every model load. Also stub wavelet and diffusion_prior paths that reference deleted modules. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 19:12:40 +01:00
Ethanfel	807f00417f	docs: README with installation and usage instructions Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 18:15:17 +01:00
Ethanfel	618e7de64b	feat: PrismAudioTextOnly node with correct T5-Gemma encoding Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 18:09:11 +01:00
Ethanfel	3d62688e8c	feat: PrismAudioSampler node with correct metadata format and peak normalization Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 18:07:33 +01:00
Ethanfel	7c54ee8482	feat: PrismAudioFeatureExtractor node with subprocess bridge and conda env Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 18:06:10 +01:00
Ethanfel	3f35aa39f2	feat: PrismAudioFeatureLoader node for pre-computed .npz files Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 18:04:32 +01:00
Ethanfel	1043f4bacb	feat: PrismAudioModelLoader node with auto-download and adaptive VRAM Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 18:02:47 +01:00
Ethanfel	8b634923dd	fix: remove unused tqdm import from sampling.py Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 18:01:29 +01:00
Ethanfel	87bea21d49	feat: extract prismaudio_core inference with callback-enabled sampling. Add inference subpackage with: sampling.py (sample_discrete_euler modified from upstream to add a callback parameter for ComfyUI progress bars; uses enumerate for the step index); utils.py (set_audio_channels and prepare_audio for audio preprocessing). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 17:59:37 +01:00
Ethanfel	30e85f0f99	fix: resolve critical bugs and quality issues in prismaudio_core/models Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-27 17:56:02 +01:00
Ethanfel	6e1186d5bd	fix: clean up dead code paths and debug artifacts in prismaudio_core/models Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-27 17:49:57 +01:00
Ethanfel	84c81e0e55	feat: extract prismaudio_core model modules (DiT, conditioners, VAE, diffusion). Fetch and adapt inference-critical model modules from upstream PrismAudio repo: dit.py (DiffusionTransformer with debug prints removed); diffusion.py (ConditionedDiffusionModelWrapper, DiTWrapper, MMDiTWrapper); conditioners.py (Cond_MLP, Sync_MLP, MultiConditioner with stubbed training imports); autoencoders.py (AudioAutoencoder, OobleckEncoder/Decoder); transformer.py (ContinuousTransformer, Attention with flash_attn fallback to SDPA); blocks.py, utils.py, bottleneck.py, pretransforms.py, local_attention.py, pqmf.py; adp.py (UNetCFG1d, UNet1d, NumberEmbedder); mmmodules/model/low_level.py (MLP, ChannelLastConv1d, ConvMLP). All internal imports rewritten from PrismAudio.* to prismaudio_core.*, training-only imports stubbed, flash_attn made optional with HAS_FLASH_ATTN flag. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-27 17:31:22 +01:00
Ethanfel	b60ff4111b	feat: extract prismaudio_core config and model factory Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-27 17:05:57 +01:00
Ethanfel	baa80de194	feat: project scaffolding with shared utils and node registration Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-27 16:59:21 +01:00
Ethanfel	c9364c4ec2	docs: initial design and implementation plan	2026-03-27 16:57:15 +01:00
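
Note on a6d584bd34 and 06f8dbbab4: the interpreter fallback and token forwarding described in those messages could look roughly like the sketch below. This is not the repository's code. The function names, the `bin/python` venv layout (POSIX), and the warning text are assumptions made for illustration; only `_extract_env`, `python_env`, `hf_token`, `HF_TOKEN`, and `HUGGING_FACE_HUB_TOKEN` come from the log.

```python
import sys
from pathlib import Path
from typing import Dict, Mapping, Optional

# "_extract_env" is named in the log; the bin/python layout is an assumption
# (a POSIX venv; Windows venvs use Scripts/python.exe instead).
MANAGED_VENV_PYTHON = Path("_extract_env") / "bin" / "python"


def resolve_python(python_env: Optional[str]) -> str:
    """Pick the interpreter: blank or 'python' selects the managed venv."""
    value = (python_env or "").strip()
    if value in ("", "python"):
        return str(MANAGED_VENV_PYTHON)
    return value


def build_subprocess_env(hf_token: Optional[str],
                         host_env: Mapping[str, str]) -> Dict[str, str]:
    """Copy the host env and forward the Hugging Face token under both names."""
    env = dict(host_env)
    # Node input wins; otherwise fall back to HF_TOKEN from the host.
    token = (hf_token or "").strip() or host_env.get("HF_TOKEN", "")
    if token:
        env["HF_TOKEN"] = token
        env["HUGGING_FACE_HUB_TOKEN"] = token  # legacy alias
    else:
        print("warning: no HF token set; gated models will fail to download",
              file=sys.stderr)
    return env
```

The result would then be passed along the lines of `subprocess.run([resolve_python(python_env), script], env=build_subprocess_env(hf_token, os.environ))`, so that an empty field can never reach `subprocess` as an executable name.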


131 Commits