ComfyUI-SelVA

Author	SHA1	Message	Date
Ethanfel	16b3eb11cc	fix: pass max_size=800 to progress bar preview (was 85px wide). The third element in ComfyUI's preview tuple is max_size in pixels, not JPEG quality, so passing 85 capped the live loss curve at 85×40px. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 22:48:56 +02:00
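The commit treats the third tuple element as a pixel limit. Assume the preview is shrunk to fit a max_size × max_size box and keeps its aspect ratio. Under that assumption, the hypothetical `preview_size` helper below shows why 85 produced a thumbnail while 800 leaves a typical chart alone. It only models the arithmetic and does not call ComfyUI.

```python
from typing import Tuple


def preview_size(width: int, height: int, max_size: int) -> Tuple[int, int]:
    """Size of a width x height image after shrinking it to fit max_size x max_size.

    Hypothetical helper: the aspect ratio is kept and the longer side becomes
    max_size; images already inside the box are left unchanged. Real resizers
    may round the shorter side differently by a pixel.
    """
    longest = max(width, height)
    if longest <= max_size:
        return width, height  # already fits, nothing to shrink
    scale = max_size / longest
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For example, an 800-pixel-wide loss chart passed with max_size=85 comes out 85 pixels wide. With max_size=800 it is untouched.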
Ethanfel	004ea63f62	fix: fall back to soundfile for torchaudio.save when torchcodec is unavailable. Same torchcodec/FFmpeg issue as on the load path, now hit when saving the eval sample. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 22:44:04 +02:00
Ethanfel	afb3242eca	fix: disable inference_mode entirely for training via inference_mode(False). torch.enable_grad() alone is insufficient: operations on inference tensors (created inside ComfyUI's outer inference_mode context) still produce inference tensors under enable_grad, which breaks autograd. inference_mode(False) exits the inference context, so the deepcopy, apply_lora, and the training loop run in a fully clean autograd context. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 22:40:50 +02:00
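The fix in afb3242eca can be reproduced in miniature. The sketch below simulates ComfyUI's outer context with `torch.inference_mode()`. Inside it, a nested `torch.inference_mode(False)` block leaves inference mode and re-enables grad mode. An inference tensor created in the outer block is cloned to get a normal tensor that autograd may save, which is presumably the role the commit's deepcopy plays for the model weights. The function name and the toy weight are hypothetical stand-ins for the real LoRA training step.

```python
import torch


def train_step_inside_comfy():
    """Run one toy backward pass from inside an outer inference_mode block."""
    # Stand-in for the outer context that ComfyUI runs nodes in.
    with torch.inference_mode():
        frozen = torch.full((3,), 2.0)  # inference tensor, like preloaded data
        with torch.inference_mode(False):
            # Inference mode is off here and grad mode is back on.
            flags = (torch.is_inference_mode_enabled(), torch.is_grad_enabled())
            x = frozen.clone()  # clone outside inference mode -> normal tensor
            w = torch.nn.Parameter(torch.ones(3))  # stand-in for a LoRA weight
            loss = (w * x).sum()
            loss.backward()  # d(sum(w * x))/dw = x
            return w.grad, flags, frozen.is_inference(), x.is_inference()
```

Using `frozen` directly in `w * x` here would fail instead, because autograd refuses to save inference tensors for backward.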
Ethanfel	849f31e2a6	fix: create LoRA params inside torch.enable_grad() to escape inference_mode. torch.enable_grad() re-enables grad tracking, but nn.Parameters created while torch.inference_mode() is active are inference tensors that cannot enter autograd regardless. Splitting the body into _train_inner() and calling it inside enable_grad() ensures the deepcopy, apply_lora, and the training loop all run with a clean autograd context. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 22:36:28 +02:00
Ethanfel	505d445eb3	fix: wrap training loop in torch.enable_grad(). ComfyUI executes all nodes inside torch.no_grad(), which prevents gradient tracking and makes loss.backward() fail; torch.enable_grad() overrides it. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 22:32:00 +02:00
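The assumption in 505d445eb3 holds for plain `no_grad`: `torch.enable_grad()` turns recording back on inside it. The later commits (849f31e2a6, afb3242eca) found that this is not enough once the outer context is `inference_mode`. A minimal sketch of the `no_grad` case, using a toy weight as a stand-in:

```python
import torch

# Stand-in for a trainable weight created before the node runs.
w = torch.ones(3, requires_grad=True)

with torch.no_grad():
    # Inside no_grad nothing is recorded, so this result has no graph.
    detached = (w * 3).sum()
    with torch.enable_grad():
        # enable_grad switches recording back on for this inner block.
        loss = (w * 3).sum()
        loss.backward()  # d(sum(3 * w))/dw = 3 for every element
```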
Ethanfel	ad57432803	fix: pad/trim latent to exact latent_seq_len after VAE encoding. STFT hop-size rounding can leave the latent off by ±1 frame from the expected sequence length. Pad or trim it to seq_cfg.latent_seq_len after the transpose so the generator.forward assertion passes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 22:12:20 +02:00
Ethanfel	43f732f904	fix: transpose VAE latent from [B,C,T] to [B,T,C] before the generator. The VAE encoder returns channels-first [B, latent_dim, T]; the generator expects time-first [B, T, latent_dim] (the same convention as decode, which already does .transpose(1,2)). Fixes the normalize() size mismatch. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 22:08:00 +02:00
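Together, 43f732f904 and ad57432803 describe two steps: transpose the latent to time-first, then force the time axis to latent_seq_len. The sketch below assumes the latent is a `[B, C, T]` tensor. `to_generator_latent` is a hypothetical helper, and zero-padding missing frames at the end is also an assumption; the repository may pad differently.

```python
import torch
import torch.nn.functional as F


def to_generator_latent(latent_bct: torch.Tensor, latent_seq_len: int) -> torch.Tensor:
    """Convert a VAE latent [B, C, T] into a fixed-length [B, T, C] tensor.

    Hypothetical helper; zero-padding at the end is an assumption.
    """
    latent = latent_bct.transpose(1, 2)  # [B, C, T] -> [B, T, C]
    t = latent.shape[1]
    if t > latent_seq_len:
        latent = latent[:, :latent_seq_len]  # trim the extra frame(s)
    elif t < latent_seq_len:
        # F.pad pairs run from the last dim backwards: (C_lo, C_hi, T_lo, T_hi).
        latent = F.pad(latent, (0, 0, 0, latent_seq_len - t))
    return latent
```

Doing the length fix after the transpose means only the time axis is ever touched, whichever side of the rounding the encoder lands on.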
Ethanfel	6b9adf0816	fix: fall back to soundfile when torchcodec FFmpeg libs are missing. Recent torchaudio releases default to torchcodec as the audio backend, which requires FFmpeg shared libraries. Falls back to soundfile in environments where torchcodec cannot load (e.g. containerised ComfyUI without a system FFmpeg). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 22:03:57 +02:00
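The fallback here, and the matching save-path fix in 004ea63f62, share one pattern: try the torchaudio call, and on failure use soundfile with the axes swapped. torchaudio works channels-first, while `soundfile.read(..., always_2d=True)` and `soundfile.write` use `[frames, channels]`. In this sketch the backends are injected as callables so it runs without either library, and nested lists stand in for tensors and arrays. Catching any exception as "backend unavailable" is an assumption.

```python
from typing import Any, Callable, List, Sequence, Tuple

Rows = Sequence[Sequence[float]]


def _transpose(rows: Rows) -> List[List[float]]:
    """Swap the two axes of a nested list ([a][b] -> [b][a])."""
    return [list(col) for col in zip(*rows)]


def load_with_fallback(path: str,
                       primary: Callable[[str], Tuple[Any, int]],
                       fallback: Callable[[str], Tuple[Rows, int]]) -> Tuple[Any, int]:
    """Load audio as ([channels][frames], sample_rate).

    primary: a torchaudio.load-style loader (channels-first).
    fallback: a soundfile.read(path, always_2d=True)-style loader (frames-first).
    """
    try:
        return primary(path)
    except Exception as exc:  # assumption: any backend error means "unavailable"
        print(f"primary audio loader failed ({exc!r}); using fallback")
        frames, sample_rate = fallback(path)
        return _transpose(frames), sample_rate


def save_with_fallback(path: str, waveform: Rows, sample_rate: int,
                       primary: Callable[[str, Rows, int], None],
                       fallback: Callable[[str, Rows, int], None]) -> None:
    """Save channels-first audio, falling back to a frames-first writer.

    primary: a torchaudio.save(path, waveform, sample_rate)-style writer.
    fallback: a soundfile.write(path, data, sample_rate)-style writer.
    """
    try:
        primary(path, waveform, sample_rate)
    except Exception as exc:
        print(f"primary audio writer failed ({exc!r}); using fallback")
        fallback(path, _transpose(waveform), sample_rate)
```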
Ethanfel	52434a053a	fix: keep VAE in float32 for mel/stft; print full traceback on clip load failure. torch.stft requires float32 input, so casting vae_utils to bf16 caused silent failures during dataset pre-loading. Also adds traceback.print_exc() so future clip-load errors are visible in the ComfyUI log. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 21:57:20 +02:00
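Both halves of this commit can be sketched. The first helper upcasts whatever dtype it receives to float32 before `torch.stft`, so a bf16 model elsewhere does not leak into the spectrogram path. The second prints the full stack trace instead of swallowing a failed clip. Both helpers are hypothetical, and the `n_fft` / `hop_length` defaults are illustrative rather than SelVA's real settings.

```python
import traceback
from typing import Any, Callable, Optional

import torch


def stft_magnitude(wave: torch.Tensor, n_fft: int = 1024, hop_length: int = 256) -> torch.Tensor:
    """Magnitude STFT that always runs in float32 (hypothetical helper)."""
    x = wave.float()  # bf16/fp16 -> float32; no-op if already float32
    window = torch.hann_window(n_fft, device=x.device)
    spec = torch.stft(x, n_fft=n_fft, hop_length=hop_length, window=window,
                      return_complex=True)
    return spec.abs()  # shape: [n_fft // 2 + 1, 1 + len(wave) // hop_length]


def try_load_clip(load_fn: Callable[[str], Any], path: str) -> Optional[Any]:
    """Call load_fn(path); on error print the full traceback and skip the clip."""
    try:
        return load_fn(path)
    except Exception:
        traceback.print_exc()  # full stack trace in the console log
        return None
```

The frame count in the comment (with the default `center=True`) is where an off-by-one between the audio length and the expected latent length can come from, the same issue ad57432803 addresses.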
Ethanfel	56c8d5d6b4	feat: save eval audio sample alongside each checkpoint. Every save_every steps, run a quick 8-step no-CFG inference pass on a random training clip and save the decoded waveform as sample_stepXXXXX.wav next to the checkpoint. Reuses the sampler's existing generator.unnormalize + feature_utils.decode + vocode pipeline. Failure is non-fatal (logged and skipped). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 21:47:02 +02:00
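The checkpoint-side logic this commit describes fits in a small function. The sketch assumes the `XXXXX` in the filename is a five-digit zero-padded step. It also assumes the sampler and the WAV writer can be passed in as callables: `generate(clip, num_steps)` stands in for the unnormalize, decode, and vocode pipeline, and `write(path, audio)` for the file writer. All names are hypothetical.

```python
import random
from pathlib import Path
from typing import Any, Callable, Optional, Sequence


def maybe_save_eval_sample(step: int, save_every: int, out_dir: Path,
                           clips: Sequence[str],
                           generate: Callable[[str, int], Any],
                           write: Callable[[Path, Any], None],
                           num_steps: int = 8) -> Optional[Path]:
    """Every save_every steps, render one short sample; never raise."""
    if step <= 0 or save_every <= 0 or step % save_every != 0:
        return None
    path = out_dir / f"sample_step{step:05d}.wav"  # assumed 5-digit padding
    try:
        clip = random.choice(clips)        # random training clip
        audio = generate(clip, num_steps)  # e.g. 8 sampler steps, no CFG
        write(path, audio)
        return path
    except Exception as exc:  # non-fatal: log and keep training
        print(f"eval sample at step {step} skipped: {exc!r}")
        return None
```

Keeping the try/except inside the helper means a decode or vocoder failure costs one sample, not the training run.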
Ethanfel	b430953602	feat: live loss curve preview during training. - Send the updated loss curve to the ComfyUI frontend every 50 steps via pbar_train.update_absolute() with a JPEG preview tuple (the same mechanism as KSampler's denoising previews). - Fix x-axis step labels for resumed runs (previously they always started at 0; they now show start_step + offset). - Split _draw_loss_curve (returns a PIL Image) from _pil_to_tensor (converts it for the ComfyUI IMAGE output). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 17:11:38 +02:00
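The resumed-run label fix amounts to offsetting each logged point by start_step. This is a minimal sketch under two assumptions: one loss value is logged per 50-step block, and each value is labelled with the step at which its block ends. The helper names are hypothetical.

```python
from typing import List, Tuple

LOG_EVERY = 50  # one loss point / preview per 50 steps, as in the commit


def curve_points(start_step: int, losses: List[float],
                 log_every: int = LOG_EVERY) -> List[Tuple[int, float]]:
    """Pair each logged loss with its absolute training step.

    Assumption: losses[i] covers steps start_step + i*log_every + 1 through
    start_step + (i + 1)*log_every, and is labelled with the last of them.
    """
    return [(start_step + (i + 1) * log_every, loss) for i, loss in enumerate(losses)]


def should_send_preview(step: int, log_every: int = LOG_EVERY) -> bool:
    """True on the steps where a fresh curve would be pushed to the frontend."""
    return step > 0 and step % log_every == 0
```

With start_step=0 this reduces to the old behaviour. A run resumed at step 1000 now labels its first point 1050 instead of 50.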
Ethanfel	57cd3dd4b4	fix: use load_lora for resume and remove redundant inference_mode wrapper. - Resume now calls load_lora() instead of load_state_dict() directly, which gives proper warnings for missing/unexpected LoRA keys. - Remove the redundant `with torch.inference_mode():` around encode_audio (already decorated with @inference_mode); the dist.mode().clone() pattern is now clearer. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 17:09:35 +02:00
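The missing/unexpected-key warnings mentioned here follow the same idea as PyTorch's `load_state_dict(strict=False)`, which returns `missing_keys` and `unexpected_keys`. The sketch below is a stdlib-only version of that comparison. `diff_lora_keys` and the example key names are hypothetical; this is not the repository's load_lora.

```python
from typing import Iterable, List, Tuple


def diff_lora_keys(expected: Iterable[str],
                   loaded: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Compare model LoRA keys with checkpoint keys and warn about mismatches."""
    expected_set, loaded_set = set(expected), set(loaded)
    missing = sorted(expected_set - loaded_set)     # in model, not in file
    unexpected = sorted(loaded_set - expected_set)  # in file, not in model
    for key in missing:
        print(f"warning: LoRA key missing from checkpoint: {key}")
    for key in unexpected:
        print(f"warning: unexpected LoRA key in checkpoint: {key}")
    return missing, unexpected
```

A checkpoint saved with a different rank or target-module list would typically show up here instead of failing silently or raising deep inside `load_state_dict`.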
Ethanfel	f206a1b38c	feat: add SelVA LoRA Trainer ComfyUI node. Runs the full training loop inside ComfyUI. Reuses the already-loaded CLIP model from the inference model for text encoding and loads only a minimal VAE encoder separately (freed after dataset pre-loading). Outputs: - SELVA_MODEL with LoRA applied (ready to connect directly to the Sampler) - adapter_path STRING (for the SelVA LoRA Loader in future sessions) - loss_curve IMAGE (PIL-rendered line chart of training loss per 50 steps). Progress is shown via a ComfyUI ProgressBar in two phases (dataset loading, then training steps). Resume is supported via the resume_path input. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 17:07:38 +02:00
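For readers new to ComfyUI custom nodes, the outputs listed in this commit map onto the standard node-class attributes: `INPUT_TYPES`, `RETURN_TYPES`, `RETURN_NAMES`, `FUNCTION`, `CATEGORY`, and the module-level `NODE_CLASS_MAPPINGS`. The skeleton below is a sketch only. Apart from `resume_path` and the three outputs, the input names, defaults, and class name are hypothetical, and the training body is omitted.

```python
class SelvaLoraTrainerSketch:
    """Skeleton of a ComfyUI node exposing the three outputs named above.

    Hypothetical: input names other than resume_path are illustrative.
    """

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "model": ("SELVA_MODEL",),
                "dataset_dir": ("STRING", {"default": ""}),
                "steps": ("INT", {"default": 2000, "min": 1}),
            },
            "optional": {
                "resume_path": ("STRING", {"default": ""}),
            },
        }

    RETURN_TYPES = ("SELVA_MODEL", "STRING", "IMAGE")
    RETURN_NAMES = ("model", "adapter_path", "loss_curve")
    FUNCTION = "train"  # ComfyUI calls this method by name
    CATEGORY = "SelVA"

    def train(self, model, dataset_dir, steps, resume_path=""):
        # Would return (model_with_lora, adapter_path, loss_curve_image).
        raise NotImplementedError("training loop omitted in this sketch")


NODE_CLASS_MAPPINGS = {"SelvaLoraTrainerSketch": SelvaLoraTrainerSketch}
```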

13 Commits