ComfyUI-SelVA

Ethanfel/ComfyUI-SelVA

Commit Graph

Branches:

deprecated/lora-trainer

deprecated/prismaudio

experiment/crop-to-mask

feature/lora-timestep-sampling

feature/lora-training

main

f4a7292cde feat: add optional MASK input to SelVA Feature Extractor Ethanfel 2026-04-05 08:34:13 +02:00
bd53744e2d feat: comprehensive node improvements Ethanfel 2026-04-04 18:16:03 +02:00
429810db5b docs: improve tooltips on all three SelVA nodes Ethanfel 2026-04-04 18:10:05 +02:00
57f56c04e2 feat: update demo workflow with VHS_VideoCombine output Ethanfel 2026-04-04 18:07:56 +02:00
ff26d0b87d fix: bug sweep and improvements Ethanfel 2026-04-04 18:04:35 +02:00
83b1da9520 chore: remove all PrismAudio code from main branch Ethanfel 2026-04-04 17:58:31 +02:00
679a607a85 feat: wire prompt output from feature extractor to sampler in demo workflow Ethanfel 2026-04-04 17:13:23 +02:00
d495939367 docs: rewrite README for SelVA Ethanfel 2026-04-04 17:12:28 +02:00
982d66e078 chore: remove PrismAudio nodes from selva-integration branch Ethanfel 2026-04-04 17:01:21 +02:00
b4124f58b3 fix: BigVGANv2._from_pretrained() compat with newer huggingface_hub Ethanfel 2026-04-04 16:51:48 +02:00
2c9d521565 fix: 44k generator HF paths use 44khz suffix (not 44k) Ethanfel 2026-04-04 16:46:20 +02:00
28229d62ce fix: MD5 validation on existing files — re-download if corrupt Ethanfel 2026-04-04 16:42:38 +02:00
92593189f0 fix: use huggingface_hub for downloads instead of raw requests Ethanfel 2026-04-04 16:41:29 +02:00
614a2e02aa fix: weights_only=False for SelVA checkpoints (PyTorch 2.6 compat) Ethanfel 2026-04-04 16:38:31 +02:00
40388ba6de fix: negative_prompt inline (multiline:false) + VAE filename v1-44.pth not v1-44k.pth Ethanfel 2026-04-04 16:35:17 +02:00
789e09535d fix: SelvaSampler — negative_prompt above settings Ethanfel 2026-04-04 16:31:53 +02:00
4da4858e4a fix: inline prune helpers when removed from both transformers locations Ethanfel 2026-04-04 16:30:58 +02:00
ab8e1e5b7b feat: SelvaFeatureExtractor outputs prompt as STRING Ethanfel 2026-04-04 16:27:49 +02:00
e3a3384727 fix: SelvaSampler input order — prompt required, negative_prompt optional Ethanfel 2026-04-04 16:27:07 +02:00
9a985499e7 feat: auto-download SelVA weights on first use Ethanfel 2026-04-04 16:25:36 +02:00
27b4424e1a feat: prompt entered once in SelvaFeatureExtractor, reused by SelvaSampler Ethanfel 2026-04-04 16:22:59 +02:00
0e417f4078 fix: transformers compat — find_pruneable_heads_and_indices import Ethanfel 2026-04-04 16:21:26 +02:00
6474e2816c fix: two bugs in SelVA nodes Ethanfel 2026-04-04 15:39:57 +02:00
c23d210ab2 feat: SelVA video-to-audio example workflow Ethanfel 2026-04-04 15:31:53 +02:00
b59b657b6f feat: SelvaSampler — flow matching ODE with CFG and negative prompts Ethanfel 2026-04-04 15:31:18 +02:00
578b501d38 feat: SelvaFeatureExtractor — inline CLIP + TextSynchformer feature extraction Ethanfel 2026-04-04 15:23:40 +02:00
fe94438356 feat: SelvaModelLoader node — loads TextSynch + MMAudio + FeaturesUtils Ethanfel 2026-04-04 15:21:03 +02:00
6bc3fd6443 chore: vendor selva_core from jnwnlee/selva@d7d40a9 Ethanfel 2026-04-04 15:18:09 +02:00
0f60a9b2bf docs: add SelVA integration implementation plan [branch: deprecated/lora-trainer] Ethanfel 2026-04-04 15:11:26 +02:00
51f93f9688 docs: SelVA integration design doc Ethanfel 2026-04-04 15:00:40 +02:00
a315093743 feat: sync_strength control and temporal coverage diagnostic in sampler Ethanfel 2026-03-28 16:23:41 +01:00
e49f760b77 fix: feature extractor CUDA detection, cache correctness, and short-video crash Ethanfel 2026-03-28 16:00:05 +01:00
4f40e15db3 fix: guard model cleanup in try/finally and fix DiTWrapper comments Ethanfel 2026-03-28 15:49:04 +01:00
08d73773c5 feat: LoRA trainer and loader nodes for PrismAudio DiT fine-tuning Ethanfel 2026-03-28 12:18:50 +01:00
762b19fd3a fix: return fps from non-cache extraction path [branch: deprecated/prismaudio] Ethanfel 2026-03-28 11:26:15 +01:00
807a2e51fb docs: fix README references — PrismAudio not ThinkSound Ethanfel 2026-03-28 11:16:31 +01:00
67be94c45c chore: add updated V2A example workflow Ethanfel 2026-03-28 11:13:06 +01:00
681d230b0c chore: update T2A workflow to match V2A style and current defaults Ethanfel 2026-03-28 11:11:20 +01:00
62a3c5d0dc docs: rewrite README to reflect current node design Ethanfel 2026-03-28 11:10:07 +01:00
30631c0cb4 fix: change fps output type from INT to FLOAT Ethanfel 2026-03-28 11:05:35 +01:00
d0c9a72782 feat: add fps INT output to PrismAudioFeatureExtractor Ethanfel 2026-03-28 11:05:03 +01:00
5b62be0447 chore: update default steps=100 and cfg_scale=7.0 Ethanfel 2026-03-28 11:03:48 +01:00
abd315092b feat: auto-use video duration from features when duration=0 Ethanfel 2026-03-28 11:00:47 +01:00
972d379369 refactor: simplify feature extractor inputs Ethanfel 2026-03-28 10:55:08 +01:00
8969d407f6 feat: accept VHS_VIDEOINFO to auto-set fps in feature extractor Ethanfel 2026-03-28 10:52:51 +01:00
707ccb463e perf: replace MP4 encode/decode with lossless .npy frame transfer Ethanfel 2026-03-28 10:50:35 +01:00
c38df8c6fa chore: remove debug options and diagnostic logging Ethanfel 2026-03-28 10:47:00 +01:00
2f626d8a96 fix: use videoprism_lvt_public_v1_large with joint video-text forward Ethanfel 2026-03-28 10:37:02 +01:00
1d8b9b59e0 debug: add DIT velocity diagnostic at t=1 to isolate DIT vs VAE quality issue Ethanfel 2026-03-27 23:57:03 +01:00
8bf4a0c3fc debug: log conditioner output stats and T2A text feature stats Ethanfel 2026-03-27 22:39:44 +01:00
477fe0f08f debug: add latent and audio stats logging to V2A sampler Ethanfel 2026-03-27 22:28:08 +01:00
c0b7ccbcee fix: substitute empty_clip_feat for video features when no video present Ethanfel 2026-03-27 22:13:22 +01:00
45633788a4 debug: add latent and audio stats logging to T2A node Ethanfel 2026-03-27 22:06:39 +01:00
11457fc27a debug: fix VAE load_state_dict diagnostic — load into .model directly Ethanfel 2026-03-27 21:56:06 +01:00
f2705b3063 debug: log weight load stats for diffusion and VAE checkpoints Ethanfel 2026-03-27 21:53:25 +01:00
83a7f2787b feat: add debug_zero_video/sync toggles and feature stats logging to sampler Ethanfel 2026-03-27 21:40:34 +01:00
140cc5ee9a feat: implement real Synchformer visual encoder (TimeSformer ViT-B/16) Ethanfel 2026-03-27 21:28:20 +01:00
f99d2666e8 fix: interpolate sync_cond to match audio sequence length in transformer Ethanfel 2026-03-27 21:21:39 +01:00
934a401633 perf: replace PIL+PNG frame files with direct ffmpeg stdin pipe Ethanfel 2026-03-27 21:20:00 +01:00
b3ac9ab22f feat: log MP4 conversion time before subprocess spawn Ethanfel 2026-03-27 21:19:26 +01:00
ca87c41a2e feat: add per-step timing to feature extraction logs Ethanfel 2026-03-27 21:13:42 +01:00
63bd999dfa fix: switch to VideoPrism large (1024-dim) and fix Synchformer output shape Ethanfel 2026-03-27 21:07:17 +01:00
20fb766ad2 fix: cast tensors to float32 before numpy() in feature save Ethanfel 2026-03-27 20:56:52 +01:00
93120eb6b9 feat: auto-resolve synchformer checkpoint from prismaudio models dir Ethanfel 2026-03-27 20:49:56 +01:00
b1a2ee594e fix: correct VideoPrism import (videoprism.models, not videoprism); add flax dep Ethanfel 2026-03-27 20:38:00 +01:00
0f46e8359d feat: switch managed venv to jax[cuda13] for GPU feature extraction Ethanfel 2026-03-27 20:33:45 +01:00
06f8dbbab4 feat: add hf_token input and HF_TOKEN env forwarding to feature extractor Ethanfel 2026-03-27 20:27:33 +01:00
a6d584bd34 fix: treat empty python_env as auto-managed venv trigger Ethanfel 2026-03-27 20:21:16 +01:00
829f398ed0 feat: verbose step-by-step logging in feature extraction Ethanfel 2026-03-27 20:19:38 +01:00
878025450a feat: add data_utils package with FeaturesUtils implementation Ethanfel 2026-03-27 20:14:34 +01:00
f32456a142 feat: add fps input to PrismAudioFeatureExtractor Ethanfel 2026-03-27 20:08:10 +01:00
c416045ace fix: replace torchvision.io.write_video with PIL+ffmpeg Ethanfel 2026-03-27 20:03:39 +01:00
824550bed3 feat: verbose per-package progress during venv auto-install Ethanfel 2026-03-27 20:00:04 +01:00
8f2e204146 fix: show pip output, handle incomplete venv, fix TF version for Python 3.12 Ethanfel 2026-03-27 19:55:55 +01:00
8e3ab999f0 fix: load VAE state dict with strict=False Ethanfel 2026-03-27 19:51:51 +01:00
afc7d5b657 fix: add missing runtime dependencies to requirements.txt Ethanfel 2026-03-27 19:48:33 +01:00
e372cdc488 fix: add plugin root to sys.path so prismaudio_core is importable Ethanfel 2026-03-27 19:41:11 +01:00
7671d296fa fix: remove spurious caption_cot input entry from video_to_audio workflow Ethanfel 2026-03-27 19:39:05 +01:00
3894fcc9b4 feat: add demo workflows for text-to-audio and video-to-audio Ethanfel 2026-03-27 19:32:24 +01:00
35d0615253 feat: auto-install pip venv for feature extraction on first use Ethanfel 2026-03-27 19:27:27 +01:00
9b1cb71b2a fix: remove MMDiTWrapper import and dead code paths from factory.py Ethanfel 2026-03-27 19:12:40 +01:00
807f00417f docs: README with installation and usage instructions Ethanfel 2026-03-27 18:15:17 +01:00
618e7de64b feat: PrismAudioTextOnly node with correct T5-Gemma encoding Ethanfel 2026-03-27 18:09:11 +01:00
3d62688e8c feat: PrismAudioSampler node with correct metadata format and peak normalization Ethanfel 2026-03-27 18:07:33 +01:00
7c54ee8482 feat: PrismAudioFeatureExtractor node with subprocess bridge and conda env Ethanfel 2026-03-27 18:06:10 +01:00
3f35aa39f2 feat: PrismAudioFeatureLoader node for pre-computed .npz files Ethanfel 2026-03-27 18:04:32 +01:00
1043f4bacb feat: PrismAudioModelLoader node with auto-download and adaptive VRAM Ethanfel 2026-03-27 18:02:47 +01:00
8b634923dd fix: remove unused tqdm import from sampling.py Ethanfel 2026-03-27 18:01:29 +01:00
87bea21d49 feat: extract prismaudio_core inference with callback-enabled sampling Ethanfel 2026-03-27 17:59:37 +01:00
30e85f0f99 fix: resolve critical bugs and quality issues in prismaudio_core/models Ethanfel 2026-03-27 17:56:02 +01:00
6e1186d5bd fix: clean up dead code paths and debug artifacts in prismaudio_core/models Ethanfel 2026-03-27 17:49:57 +01:00
84c81e0e55 feat: extract prismaudio_core model modules (DiT, conditioners, VAE, diffusion) Ethanfel 2026-03-27 17:31:22 +01:00
b60ff4111b feat: extract prismaudio_core config and model factory Ethanfel 2026-03-27 17:05:57 +01:00
baa80de194 feat: project scaffolding with shared utils and node registration Ethanfel 2026-03-27 16:59:21 +01:00
c9364c4ec2 docs: initial design and implementation plan Ethanfel 2026-03-27 16:57:15 +01:00
