Learns K CLIP token embeddings ([K, 1024]) with all model weights frozen,
keeping generated latents on the decoder's natural manifold — avoids the
quality degradation that affects LoRA on BJ's audio dataset.
- selva_textual_inversion_trainer.py: trains learned_tokens via AdamW,
injects into last K positions of 77-token CLIP embedding, checkpoints
with eval audio + spectral metrics
- selva_textual_inversion_loader.py: loads .pt bundle, returns
TEXTUAL_INVERSION dict for sampler
- selva_sampler.py: optional textual_inversion input; injects into both
text_clip and neg_text_clip before preprocess_conditions
- __init__.py: registers both new nodes
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Training clips at -23 LUFS measure -25 to -31 dBFS RMS (avg ~-27).
Normalizing output to -23 dBFS was 4-8 dB too loud, causing saturation
on clips with high crest factor and peaks near 0 dBFS.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
RMS normalize to target then scale back if peaks exceed 1.0,
preserving dynamics instead of hard-clipping transients.
Eval sample target updated to -23 dBFS to match training data.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Peak norm was slamming output to full scale regardless of content level,
making generated audio several times louder than training clips.
RMS norm to -20 dBFS matches typical processed audio level.
Sampler exposes target_lufs (-40 to -6, default -20) for user control.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Both nodes moved models to GPU before work then back to CPU after.
Any exception (OOM, cancellation, bad input) would skip the cleanup,
leaving models on GPU permanently until ComfyUI restarts.
Wrap the entire work block in try/finally so offload_to_cpu cleanup
always runs regardless of how the node exits. Also removes the unused
`mode` variable in SelvaSampler.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- selva_sampler: wrap decode+vocode in their own OOM catch — previously
OOM during mel decode or vocoding gave a raw CUDA traceback instead
of the actionable hint
- selva_feature_extractor: sync frames log line now shows (masked) when
a mask is active, matching the CLIP log line
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Model Loader:
- bf16 support check — auto-falls back to fp16 on unsupported GPUs
- DESCRIPTION and OUTPUT_TOOLTIPS
Feature Extractor:
- Store variant in features dict and .npz cache
- Progress bar (3 steps: CLIP encode, T5 encode, sync encode)
- Expand cache hash to 32 hex chars
- DESCRIPTION and OUTPUT_TOOLTIPS
Sampler:
- Variant mismatch validation against extracted features
- Cancellation support via throw_exception_if_processing_interrupted()
- OOM catch with actionable error message
- normalize toggle (optional BOOLEAN, default true) for peak normalization
- Remove empty optional: {} block
- DESCRIPTION and OUTPUT_TOOLTIPS
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- nodes/__init__.py: fix [PrismAudio] leftover label in error print
- selva_feature_extractor: hash beginning, middle and end of video tensor
instead of just first 1MB, avoiding collisions on videos with same opening frames
- selva_sampler: derive SequenceConfig from model template via dataclasses.replace
instead of hardcoding sampling_rate/spectrogram_frame_rate per mode
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- SelvaSampler: multiline:false puts negative_prompt inline above sliders
- SelvaModelLoader: VAE filenames in download_utils are v1-16.pth/v1-44.pth,
not v1-{mode}.pth (mode includes the 'k' suffix)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Move negative_prompt to required inputs, right after prompt, so it appears
above duration/steps/cfg/seed in the ComfyUI node layout.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ComfyUI renders required inputs above optional ones. Moving negative_prompt
to optional puts prompt first (natural order) and negative_prompt at the
bottom where it belongs as a power-user input. Also guards against
negative_prompt=None when not connected.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
SelvaFeatureExtractor now stores the prompt in SELVA_FEATURES (both in the
returned dict and the .npz cache). SelvaSampler's prompt is now optional —
when left empty it falls back to the prompt stored in features. A non-empty
override can still be passed when CLIP text should differ from the sync text.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- selva_feature_extractor: cache hash now includes resolved duration;
same video + different duration override no longer returns stale features
- selva_sampler: MPS-safe noise generation (torch.Generator on CPU then
move to device, same pattern as PrismAudioSampler)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Calls update_seq_lengths with actual feature dimensions (not seq_cfg) to
avoid rounding assertion mismatches. Progress bar tracks each Euler step.
Supports negative prompts for steering, normalizes output to [-1,1].
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>