ComfyUI-SelVA

Author	SHA1	Message	Date
Ethanfel	e1a2f0ed7d	feat: add inject_mode (suffix/prefix) to TI pipeline Observation: n4_baseline loss barely moved (1.025→0.965 over 3000 steps), token_norm grew linearly without plateau — generator likely ignores last-K CLIP positions (EOS/padding zone) where suffix injects. Fix: add inject_mode parameter throughout the pipeline: - "suffix": replace last K positions (original behavior, model may ignore) - "prefix": replace positions 1:1+K right after BOS — highest attention weight in CLIP, much stronger gradient signal expected Changes: - selva_textual_inversion_trainer.py: _inject_tokens() helper centralises the torch.cat construction for both modes; used in training loop and eval; inject_mode stored in checkpoint files - selva_textual_inversion_loader.py: reads inject_mode from checkpoint, includes in TEXTUAL_INVERSION bundle - selva_sampler.py: uses _inject_tokens() via bundle's inject_mode field - selva_ti_scheduler.py: inject_mode in _PARAM_DEFAULTS, config, and _train_inner call - ti_sweep_1.json: updated with prefix_inject group (n4, n8, n4+warm); n4_baseline marked completed; suffix experiments retained for comparison Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 23:31:52 +02:00
Ethanfel	e56ece9c1c	feat: add SelVA Textual Inversion Trainer and Loader nodes Learns K CLIP token embeddings ([K, 1024]) with all model weights frozen, keeping generated latents on the decoder's natural manifold — avoids the quality degradation that affects LoRA on BJ's audio dataset. - selva_textual_inversion_trainer.py: trains learned_tokens via AdamW, injects into last K positions of 77-token CLIP embedding, checkpoints with eval audio + spectral metrics - selva_textual_inversion_loader.py: loads .pt bundle, returns TEXTUAL_INVERSION dict for sampler - selva_sampler.py: optional textual_inversion input; injects into both text_clip and neg_text_clip before preprocess_conditions - __init__.py: registers both new nodes Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 23:01:44 +02:00
Ethanfel	9a47508d2d	fix: lower RMS normalization target from -23/-20 to -27 dBFS Training clips at -23 LUFS measure -25 to -31 dBFS RMS (avg ~-27). Normalizing output to -23 dBFS was 4-8 dB too loud, causing saturation on clips with high crest factor and peaks near 0 dBFS. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 17:19:20 +02:00
Ethanfel	8717af2728	fix: prevent saturation from RMS normalization clipping peaks RMS normalize to target then scale back if peaks exceed 1.0, preserving dynamics instead of hard-clipping transients. Eval sample target updated to -23 dBFS to match training data. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 12:29:29 +02:00
Ethanfel	78e9838a83	fix: replace peak normalization with RMS normalization at -20 dBFS Peak norm was slamming output to full scale regardless of content level, making generated audio several times louder than training clips. RMS norm to -20 dBFS matches typical processed audio level. Sampler exposes target_lufs (-40 to -6, default -20) for user control. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 12:06:48 +02:00
Ethanfel	3dd6badfd9	fix: guarantee offload cleanup on exception with try/finally Both nodes moved models to GPU before work then back to CPU after. Any exception (OOM, cancellation, bad input) would skip the cleanup, leaving models on GPU permanently until ComfyUI restarts. Wrap the entire work block in try/finally so offload_to_cpu cleanup always runs regardless of how the node exits. Also removes the unused `mode` variable in SelvaSampler. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 08:40:39 +02:00
Ethanfel	8bb2fb7015	fix: extend OOM catch to decode/vocode, add (masked) to sync log line - selva_sampler: wrap decode+vocode in their own OOM catch — previously OOM during mel decode or vocoding gave a raw CUDA traceback instead of the actionable hint - selva_feature_extractor: sync frames log line now shows (masked) when a mask is active, matching the CLIP log line Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 08:38:59 +02:00
Ethanfel	bd53744e2d	feat: comprehensive node improvements Model Loader: - bf16 support check — auto-falls back to fp16 on unsupported GPUs - DESCRIPTION and OUTPUT_TOOLTIPS Feature Extractor: - Store variant in features dict and .npz cache - Progress bar (3 steps: CLIP encode, T5 encode, sync encode) - Expand cache hash to 32 hex chars - DESCRIPTION and OUTPUT_TOOLTIPS Sampler: - Variant mismatch validation against extracted features - Cancellation support via throw_exception_if_processing_interrupted() - OOM catch with actionable error message - normalize toggle (optional BOOLEAN, default true) for peak normalization - Remove empty optional: {} block - DESCRIPTION and OUTPUT_TOOLTIPS Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-04 18:16:03 +02:00
Ethanfel	429810db5b	docs: improve tooltips on all three SelVA nodes Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-04 18:10:05 +02:00
Ethanfel	ff26d0b87d	fix: bug sweep and improvements - nodes/__init__.py: fix [PrismAudio] leftover label in error print - selva_feature_extractor: hash beginning, middle and end of video tensor instead of just first 1MB, avoiding collisions on videos with same opening frames - selva_sampler: derive SequenceConfig from model template via dataclasses.replace instead of hardcoding sampling_rate/spectrogram_frame_rate per mode Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-04 18:04:35 +02:00
Ethanfel	83b1da9520	chore: remove all PrismAudio code from main branch - Delete prismaudio_core/, data_utils/, scripts/, docs/plans/ - Delete PrismAudio nodes (feature_extractor, feature_loader, model_loader, sampler, text_only) - Delete PrismAudio workflows (video_to_audio, text_to_audio) - Clean nodes/utils.py: rename PRISMAUDIO_CATEGORY → SELVA_CATEGORY, remove unused helpers - Strip PrismAudio-only deps from requirements.txt Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-04 17:58:31 +02:00
Ethanfel	40388ba6de	fix: negative_prompt inline (multiline:false) + VAE filename v1-44.pth not v1-44k.pth - SelvaSampler: multiline:false puts negative_prompt inline above sliders - SelvaModelLoader: VAE filenames in download_utils are v1-16.pth/v1-44.pth, not v1-{mode}.pth (mode includes the 'k' suffix) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-04 16:35:17 +02:00
Ethanfel	789e09535d	fix: SelvaSampler — negative_prompt above settings Move negative_prompt to required inputs, right after prompt, so it appears above duration/steps/cfg/seed in the ComfyUI node layout. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-04 16:31:53 +02:00
Ethanfel	e3a3384727	fix: SelvaSampler input order — prompt required, negative_prompt optional ComfyUI renders required inputs above optional ones. Moving negative_prompt to optional puts prompt first (natural order) and negative_prompt at the bottom where it belongs as a power-user input. Also guards against negative_prompt=None when not connected. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-04 16:27:07 +02:00
Ethanfel	27b4424e1a	feat: prompt entered once in SelvaFeatureExtractor, reused by SelvaSampler SelvaFeatureExtractor now stores the prompt in SELVA_FEATURES (both in the returned dict and the .npz cache). SelvaSampler's prompt is now optional — when left empty it falls back to the prompt stored in features. A non-empty override can still be passed when CLIP text should differ from the sync text. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-04 16:22:59 +02:00
Ethanfel	6474e2816c	fix: two bugs in SelVA nodes - selva_feature_extractor: cache hash now includes resolved duration; same video + different duration override no longer returns stale features - selva_sampler: MPS-safe noise generation (torch.Generator on CPU then move to device, same pattern as PrismAudioSampler) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-04 15:39:57 +02:00
Ethanfel	b59b657b6f	feat: SelvaSampler — flow matching ODE with CFG and negative prompts Calls update_seq_lengths with actual feature dimensions (not seq_cfg) to avoid rounding assertion mismatches. Progress bar tracks each Euler step. Supports negative prompts for steering, normalizes output to [-1,1]. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-04 15:31:18 +02:00
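
Note on the normalization commits (78e9838a83, 8717af2728, 9a47508d2d): the messages describe normalizing to an RMS target and then scaling back if peaks exceed 1.0, rather than hard-clipping. The following is a minimal sketch of that idea using only the standard library. The names `rms_normalize`, `rms_dbfs`, and `dbfs_to_linear` are hypothetical and are not taken from the repository; the actual nodes operate on tensors and may differ in detail.

```python
import math


def dbfs_to_linear(db):
    """Convert a dBFS level to a linear amplitude (0 dBFS == 1.0)."""
    return 10.0 ** (db / 20.0)


def rms_dbfs(samples):
    """RMS level of a non-silent signal, in dBFS."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms)


def rms_normalize(samples, target_dbfs=-27.0):
    """Scale samples to the target RMS level, then back off if peaks exceed 1.0.

    Hypothetical helper: -27 dBFS is the target named in commit 9a47508d2d.
    """
    if not samples:
        return []
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        # Silence: nothing to scale.
        return list(samples)
    gain = dbfs_to_linear(target_dbfs) / rms
    out = [s * gain for s in samples]
    peak = max(abs(s) for s in out)
    if peak > 1.0:
        # Scale the whole signal down instead of clipping transients,
        # so the dynamics are preserved.
        out = [s / peak for s in out]
    return out


# A steady sine lands on the target; a lone impulse (high crest factor)
# is scaled back so its peak stays at full scale.
sine = [0.5 * math.sin(2.0 * math.pi * 440.0 * n / 16000.0) for n in range(16000)]
impulse = [0.0] * 999 + [1.0]
sine_out = rms_normalize(sine)
impulse_out = rms_normalize(impulse)
```

With a high crest factor the peak guard wins, so the output RMS ends up below the target; that is the trade-off the commit accepts in exchange for avoiding saturation.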

17 Commits
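
Note on commit e1a2f0ed7d: the message describes two injection modes for K learned tokens in a 77-position CLIP sequence, "suffix" (replace the last K positions) and "prefix" (replace positions 1:1+K, right after BOS). The sketch below illustrates only the index arithmetic, using plain Python lists in place of tensors. The function name `inject_tokens` and its signature are assumptions for illustration; the repository's `_inject_tokens()` helper is described as building the result with `torch.cat` and is not reproduced here.

```python
def inject_tokens(clip_seq, learned, inject_mode="suffix"):
    """Return a copy of clip_seq with `learned` spliced in, same length as the input.

    Hypothetical stand-in for the helper described in the commit message:
    - "suffix": replace the last K positions
    - "prefix": replace positions 1..K, keeping position 0 (BOS) intact
    """
    n = len(clip_seq)
    k = len(learned)
    if k == 0:
        return list(clip_seq)
    if k > n - 1:
        raise ValueError("too many learned tokens for this sequence length")
    if inject_mode == "suffix":
        return list(clip_seq[: n - k]) + list(learned)
    if inject_mode == "prefix":
        return list(clip_seq[:1]) + list(learned) + list(clip_seq[1 + k:])
    raise ValueError(f"unknown inject_mode: {inject_mode!r}")


# Labels stand in for embedding rows so the positions are easy to inspect.
seq = ["BOS"] + [f"t{i}" for i in range(1, 77)]
learned = ["L0", "L1", "L2", "L3"]
suffix_out = inject_tokens(seq, learned, "suffix")
prefix_out = inject_tokens(seq, learned, "prefix")
```

Both modes keep the sequence at 77 positions and leave BOS untouched; they differ only in which K slots are overwritten.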