ComfyUI-SelVA

Author	SHA1	Message	Date
Ethanfel	52434a053a	fix: keep VAE in float32 for mel/stft; print full traceback on clip load failure torch.stft requires float32 input — casting vae_utils to bf16 caused silent failures during dataset pre-loading. Also adds traceback.print_exc() so future clip-load errors are visible in the ComfyUI log. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 21:57:20 +02:00
Ethanfel	56c8d5d6b4	feat: save eval audio sample alongside each checkpoint At every save_every steps, run a quick 8-step no-CFG inference pass on a random training clip and save the decoded waveform as sample_stepXXXXX.wav next to the checkpoint. Uses the existing generator.unnormalize + feature_utils.decode + vocode pipeline from the sampler. Failure is non-fatal (logged and skipped). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 21:47:02 +02:00
Ethanfel	b430953602	feat: live loss curve preview during training - Send updated loss curve to ComfyUI frontend every 50 steps via pbar_train.update_absolute() with a JPEG preview tuple — same mechanism as KSampler's denoising previews. - Fix x-axis step labels for resumed runs (previously always started at 0; now correctly shows start_step + offset). - Split _draw_loss_curve (returns PIL Image) from _pil_to_tensor (converts for ComfyUI IMAGE output). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 17:11:38 +02:00
Ethanfel	57cd3dd4b4	fix: use load_lora for resume and remove redundant inference_mode wrapper - Resume now calls load_lora() instead of load_state_dict() directly, giving proper warnings for missing/unexpected LoRA keys. - Remove redundant `with torch.inference_mode():` around encode_audio (already @inference_mode decorated); dist.mode().clone() pattern is now clearer. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 17:09:35 +02:00
Ethanfel	f206a1b38c	feat: add SelVA LoRA Trainer ComfyUI node Runs the full training loop inside ComfyUI. Reuses the already-loaded CLIP model from the inference model for text encoding; loads only a minimal VAE encoder separately (freed after dataset pre-loading). Outputs: - SELVA_MODEL with LoRA applied (ready to connect directly to Sampler) - adapter_path STRING (for SelVA LoRA Loader in future sessions) - loss_curve IMAGE (PIL-rendered line chart of training loss per 50 steps) Progress is shown via ComfyUI ProgressBar (two phases: dataset loading, then training steps). Resume is supported via resume_path input. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 17:07:38 +02:00
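
Note on b430953602 (x-axis labels for resumed runs): the message says labels now show start_step + offset instead of always starting at 0. A minimal sketch of that arithmetic follows. The function name, the `log_every` parameter, and the choice to place the first point at `start_step + log_every` are assumptions for illustration, not taken from the repository.

```python
def loss_curve_steps(num_points, start_step=0, log_every=50):
    """Return the global training step for each recorded loss value.

    Hypothetical helper: assumes one loss value is recorded every
    `log_every` steps and that `start_step` is the step a resumed
    run continues from (0 for a fresh run).
    """
    # Point i was recorded after (i + 1) * log_every steps of this session,
    # so its global step is the resume offset plus that count.
    return [start_step + (i + 1) * log_every for i in range(num_points)]


if __name__ == "__main__":
    print(loss_curve_steps(3))                   # fresh run
    print(loss_curve_steps(3, start_step=1000))  # resumed run
```

Keeping the offset in a single helper means the plotted axis and any logged step numbers cannot drift apart after a resume.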

5 Commits
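
Note on 52434a053a and 56c8d5d6b4 (non-fatal failures with a visible traceback): both messages describe catching an error, reporting it in the log, and carrying on. A minimal sketch of that pattern follows, using only the standard-library `traceback.print_exc()` named in the commit. `load_clips` and its `load_one` callback are hypothetical names, and the repository's real loader may be structured differently.

```python
import traceback


def load_clips(paths, load_one):
    """Load every clip, skipping (but loudly reporting) the ones that fail.

    `load_one` is a caller-supplied function that loads a single clip;
    it stands in for whatever the real dataset loader does.
    """
    loaded, failed = [], []
    for path in paths:
        try:
            loaded.append(load_one(path))
        except Exception:
            # Print the full stack trace so the root cause shows up in the
            # log instead of the clip silently disappearing from the dataset.
            print(f"[warn] failed to load {path}; skipping")
            traceback.print_exc()
            failed.append(path)
    return loaded, failed
```

Returning the failed paths alongside the loaded clips lets the caller decide whether a partially loaded dataset is acceptable or the run should stop.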