Commit Graph

15 Commits

Author SHA1 Message Date
Ethanfel 3f67de694c feat: save loss_raw.png and loss_smoothed.png to output_dir
Raw curve is drawn in light blue, with the EMA-smoothed (beta=0.9) overlay in
darker blue. Both are saved as PNGs at the end of training. The node's IMAGE
output now returns the smoothed version, and the live preview also uses the
smoothed overlay.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 23:15:48 +02:00
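The smoothed overlay above is an exponential moving average with beta=0.9. A minimal sketch of that smoothing, assuming the plain (non-bias-corrected) EMA seeded with the first raw value; the commit only states beta, so the seeding choice and the name `ema_smooth` are assumptions.

```python
def ema_smooth(values, beta=0.9):
    """Exponential moving average: s_t = beta * s_(t-1) + (1 - beta) * x_t.

    Seeded with the first raw value (an assumption; the trainer may seed differently).
    """
    smoothed = []
    s = None
    for x in values:
        s = x if s is None else beta * s + (1.0 - beta) * x
        smoothed.append(s)
    return smoothed


raw_losses = [1.0, 0.0, 0.0, 0.0]
smoothed_losses = ema_smooth(raw_losses)  # decays geometrically toward 0
```

With beta=0.9 each smoothed point carries roughly the last ten raw points' worth of history, which is why the smoothed curve lags noisy spikes but shows the trend clearly.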
Ethanfel 4806daa4ca chore: lower default warmup_steps from 500 to 100
500 warmup steps is 25% of a 2000-step run, which is too long. 100 steps lets
the full learning rate kick in much earlier without sacrificing stability.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 22:51:27 +02:00
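The commit does not say what shape the warmup ramp has. Assuming a linear ramp from near zero up to the base learning rate (a common choice, not confirmed by the log), the arithmetic behind the change looks like this; `warmup_lr`, `warmup_fraction`, and `base_lr` are hypothetical names.

```python
def warmup_lr(step, base_lr, warmup_steps):
    """Linear warmup (assumed): ramps up to base_lr over warmup_steps, then holds."""
    if warmup_steps <= 0 or step >= warmup_steps:
        return base_lr
    return base_lr * (step + 1) / warmup_steps


def warmup_fraction(warmup_steps, total_steps):
    """Share of the run spent below the full learning rate."""
    return warmup_steps / total_steps


old_share = warmup_fraction(500, 2000)  # the old default
new_share = warmup_fraction(100, 2000)  # the new default
```

Under this assumption, at step 250 a 500-step warmup is still only at about half the base rate, while a 100-step warmup reached it 150 steps earlier.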
Ethanfel 16b3eb11cc fix: pass max_size=800 to progress bar preview (was 85px wide)
The third element in ComfyUI's preview tuple is max_size in pixels, not
JPEG quality. Passing 85 was capping the live loss curve at 85×40px.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 22:48:56 +02:00
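Why a max_size of 85 yielded an 85×40 preview: assuming the server fits the image into a max_size by max_size box while preserving aspect ratio (which is what PIL's ImageOps.contain does), a wide chart is bounded by its width. The plain-Python sketch below reproduces only that arithmetic. `fit_within` is a hypothetical helper, and the 850×400 chart size is an assumed value picked to match the 85×40 result, since the real chart dimensions are not given.

```python
def fit_within(width, height, max_size):
    """Scale (width, height) to fit a max_size x max_size box, keeping aspect ratio."""
    if width >= height:
        return max_size, round(height / width * max_size)
    return round(width / height * max_size), max_size


# Hypothetical chart size; the real loss-curve dimensions are not stated.
CHART_W, CHART_H = 850, 400

# Preview tuple layout per the commit: (format, image, max_size).
# 85 was intended as JPEG quality but is read as max_size.
buggy = fit_within(CHART_W, CHART_H, 85)
fixed = fit_within(CHART_W, CHART_H, 800)
```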
Ethanfel 004ea63f62 fix: fall back to soundfile for torchaudio.save when torchcodec unavailable
Same torchcodec/FFmpeg issue as the load path, now on the eval sample save.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 22:44:04 +02:00
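On the save side, the fallback has one extra detail: torchaudio.save takes a channels-first [channels, samples] tensor, while soundfile.write expects a [samples, channels] array. A minimal sketch, assuming the waveform is a torch tensor; `save_wav` is a hypothetical name, and the exception types caught are an assumption about how a missing torchcodec/FFmpeg surfaces.

```python
def save_wav(path, waveform, sample_rate):
    """Save a [channels, samples] tensor, falling back to soundfile if torchaudio fails.

    Returns the name of the backend that wrote the file.
    """
    try:
        import torchaudio
        torchaudio.save(path, waveform, sample_rate)
        return "torchaudio"
    except (ImportError, OSError, RuntimeError):
        # Assumed failure modes: torchaudio missing, or torchcodec unable to load FFmpeg.
        import soundfile as sf
        # numpy has no bfloat16, so upcast to float32 before converting.
        # torchaudio layout is [channels, samples]; soundfile wants [samples, channels].
        data = waveform.detach().cpu().float().numpy().T
        sf.write(path, data, sample_rate)
        return "soundfile"
```

Importing inside the function keeps the module importable even where neither backend is installed; the failure only surfaces when a sample is actually saved.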
Ethanfel afb3242eca fix: disable inference_mode entirely for training via inference_mode(False)
torch.enable_grad() alone is insufficient: while ComfyUI's outer inference_mode
context is active, every newly created tensor is an inference tensor, even
inside enable_grad, and inference tensors break autograd. inference_mode(False)
exits the inference context, so the deepcopy, apply_lora, and training loop
run with a fully clean autograd context.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 22:40:50 +02:00
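A minimal sketch of the pattern this commit describes, with an outer `torch.inference_mode()` standing in for ComfyUI's execution context. `_train_inner` here is a hypothetical stand-in (a single parameter fit with SGD), not the trainer's real deepcopy/apply_lora/loop; it only shows that inside `inference_mode(False)` grad mode is back on and new tensors are ordinary autograd tensors.

```python
import torch


def _train_inner(steps=20, lr=0.1):
    # Hypothetical stand-in for deepcopy + apply_lora + the training loop.
    assert not torch.is_inference_mode_enabled()
    assert torch.is_grad_enabled()
    weight = torch.nn.Parameter(torch.zeros(3))  # a normal (non-inference) tensor
    target = torch.tensor([1.0, 2.0, 3.0])
    opt = torch.optim.SGD([weight], lr=lr)
    first_loss = None
    for _ in range(steps):
        loss = ((weight - target) ** 2).mean()
        if first_loss is None:
            first_loss = loss.item()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return first_loss, loss.item(), weight.detach().clone()


def train_node():
    # Leave inference mode entirely; disabling it also turns grad mode back on.
    with torch.inference_mode(False):
        return _train_inner()


# Stand-in for the inference_mode context ComfyUI wraps around node execution.
with torch.inference_mode():
    first, last, trained = train_node()
```

Tensors that already existed as inference tensors before entering `inference_mode(False)` keep that status, so anything the training loop must differentiate through should be created (or copied) inside the block.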
Ethanfel 849f31e2a6 fix: create LoRA params inside torch.enable_grad() to escape inference_mode
torch.enable_grad() re-enables grad tracking, but nn.Parameters created while
torch.inference_mode() is active are inference tensors that can't enter autograd
regardless. Splitting into _train_inner() and calling it inside enable_grad()
ensures the deepcopy, apply_lora, and the training loop all run with a clean
autograd context.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 22:36:28 +02:00
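The claim in this commit can be checked directly with `Tensor.is_inference()`. In the sketch below (variable names are illustrative), `a` and `b` should both come out as inference tensors, because `enable_grad` only flips grad mode and leaves inference mode active, while `c`, created under `inference_mode(False)`, should not. That is why wrapping `_train_inner()` in `enable_grad()` was not enough.

```python
import torch

with torch.inference_mode():            # stands in for ComfyUI's outer context
    a = torch.zeros(3)
    with torch.enable_grad():           # grad mode on, inference mode still on
        b = torch.zeros(3)
        grad_on = torch.is_grad_enabled()
    with torch.inference_mode(False):   # inference mode actually exited
        c = torch.zeros(3)

flags = (a.is_inference(), b.is_inference(), c.is_inference())
```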
Ethanfel 505d445eb3 fix: wrap training loop in torch.enable_grad()
ComfyUI executes all nodes inside torch.no_grad(), which prevents gradient
tracking and makes loss.backward() fail. torch.enable_grad() overrides it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 22:32:00 +02:00
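For a plain `torch.no_grad()` context, `torch.enable_grad()` does restore graph recording, as this minimal sketch shows. The later commits in this log found that ComfyUI's outer context is actually `inference_mode`, which `enable_grad` does not lift.

```python
import torch

x = torch.ones(2, requires_grad=True)

with torch.no_grad():
    y = (x * 3).sum()        # not recorded: y has no grad_fn
    with torch.enable_grad():
        z = (x * 3).sum()    # recorded again inside enable_grad

z.backward()                 # works; y.backward() would raise
```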
Ethanfel ad57432803 fix: pad/trim latent to exact latent_seq_len after VAE encoding
STFT hop-size rounding can produce one latent frame more or fewer than the
expected sequence length. Pad or trim to seq_cfg.latent_seq_len after the
transpose so the generator.forward assertion passes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 22:12:20 +02:00
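A minimal sketch of the pad/trim step on a time-first [B, T, C] latent. `fit_latent_length` is a hypothetical name, `target_len` plays the role of seq_cfg.latent_seq_len, and zero-padding at the end is an assumption (the commit does not say how the missing frame is filled).

```python
import torch
import torch.nn.functional as F


def fit_latent_length(latent, target_len):
    """Pad or trim a [B, T, C] latent along T to exactly target_len frames.

    Zero padding at the end is an assumption; the trainer may pad differently.
    """
    cur_len = latent.shape[1]
    if cur_len > target_len:
        return latent[:, :target_len, :]
    if cur_len < target_len:
        # F.pad pairs start from the last dim: (C_left, C_right, T_left, T_right).
        return F.pad(latent, (0, 0, 0, target_len - cur_len))
    return latent
```

Doing this after the transpose means the time axis is always dim 1, so the same slice and pad spec work for every batch.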
Ethanfel 43f732f904 fix: transpose VAE latent from [B,C,T] to [B,T,C] before generator
VAE encoder returns channels-first [B, latent_dim, T]; the generator
expects time-first [B, T, latent_dim] (the same convention as decode, which
already does .transpose(1,2)). Fixes the normalize() size mismatch.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 22:08:00 +02:00
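Why the layout matters for normalize(): if the normalization statistics are per-channel vectors of length latent_dim (an assumption; the log does not show normalize()), they broadcast against the last dimension, which must therefore be latent_dim. The sizes and the `normalize` body below are illustrative only.

```python
import torch

B, LATENT_DIM, T = 1, 20, 250  # illustrative sizes, not the model's real ones

# Hypothetical per-channel statistics, one value per latent channel.
mean = torch.zeros(LATENT_DIM)
std = torch.ones(LATENT_DIM)


def normalize(x):
    # Broadcasts over the LAST dimension, so x must be [B, T, latent_dim].
    return (x - mean) / std


latent_bct = torch.randn(B, LATENT_DIM, T)  # VAE encoder output: channels-first
latent_btc = latent_bct.transpose(1, 2)     # generator layout: time-first

out = normalize(latent_btc)                 # works: last dim is latent_dim
```

Calling `normalize(latent_bct)` instead raises a broadcasting RuntimeError, since the trailing sizes (250 vs 20) do not match.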
Ethanfel 6b9adf0816 fix: fall back to soundfile when torchcodec FFmpeg libs are missing
Recent torchaudio defaults to torchcodec as the audio backend, which requires
FFmpeg shared libraries. Falls back to soundfile for envs where torchcodec
can't load (e.g. containerised ComfyUI without system FFmpeg).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 22:03:57 +02:00
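The load-path fallback is a try-primary-then-fallback pattern. A minimal, library-agnostic sketch: in the trainer the primary would be torchaudio.load and the fallback a soundfile-based reader, but here both are passed in as callables. `load_with_fallback` is a hypothetical helper, and the exception types treated as "backend unavailable" are an assumption.

```python
import sys


def load_with_fallback(path, primary, fallback,
                       errors=(ImportError, OSError, RuntimeError)):
    """Try the primary audio loader; on a backend error, use the fallback.

    Returns (result, backend_name). Errors outside `errors` propagate unchanged.
    """
    try:
        return primary(path), "primary"
    except errors as exc:
        print(f"primary audio backend failed ({exc!r}); falling back",
              file=sys.stderr)
        return fallback(path), "fallback"
```

Keeping the caught exception types narrow means genuine bugs (for example a bad path computation raising TypeError) still surface instead of being masked by the fallback.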
Ethanfel 52434a053a fix: keep VAE in float32 for mel/stft; print full traceback on clip load failure
torch.stft requires float32 input, so casting vae_utils to bf16 caused silent
failures during dataset pre-loading. Also adds traceback.print_exc() so future
clip-load errors are visible in the ComfyUI log.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 21:57:20 +02:00
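Both halves of this commit in one sketch: run the STFT on a float32 copy of the waveform, and report clip-load failures with a full traceback instead of swallowing them. The commit's actual fix keeps vae_utils in float32; upcasting at the call site, as below, is shown only to illustrate the dtype requirement. `stft_float32`, `preload_clips`, and the n_fft/hop_length values are hypothetical.

```python
import traceback

import torch


def stft_float32(wave, n_fft=1024, hop_length=256):
    """Run torch.stft on a 1-D waveform, upcasting bf16/fp16 input to float32 first."""
    x = wave.float()
    window = torch.hann_window(n_fft, device=x.device)
    return torch.stft(x, n_fft, hop_length=hop_length, window=window,
                      return_complex=True)


def preload_clips(paths, load_clip):
    """Load each clip; print the full traceback for failures instead of hiding them."""
    clips = []
    for path in paths:
        try:
            clips.append(load_clip(path))
        except Exception:
            print(f"failed to load clip {path}:")
            traceback.print_exc()
    return clips
```

With the default center=True, a 16000-sample input at hop 256 gives 1 + 16000 // 256 = 63 frames of n_fft // 2 + 1 = 513 bins.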
Ethanfel 56c8d5d6b4 feat: save eval audio sample alongside each checkpoint
Every save_every steps, run a quick 8-step no-CFG inference pass on
a random training clip and save the decoded waveform as
sample_stepXXXXX.wav next to the checkpoint. Uses the existing
generator.unnormalize + feature_utils.decode + vocode pipeline from
the sampler. Failure is non-fatal (logged and skipped).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 21:47:02 +02:00
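A sketch of the checkpoint-time hook described above, with the sampler and decoder abstracted behind a callable. `checkpoint_hook`, `save_checkpoint`, and `run_eval_sample` are hypothetical names, and counting steps from 1 is an assumption; the five-digit zero padding follows the `sample_stepXXXXX.wav` pattern in the message.

```python
import random


def checkpoint_hook(step, save_every, clips, save_checkpoint, run_eval_sample,
                    log=print):
    """Call at the end of each training step (hypothetical hook).

    Returns the sample filename on success, or None if nothing was written.
    """
    if step % save_every != 0:
        return None
    save_checkpoint(step)
    wav_name = f"sample_step{step:05d}.wav"
    try:
        clip = random.choice(clips)
        run_eval_sample(clip, wav_name)  # e.g. 8-step, no-CFG inference + decode
    except Exception as exc:             # non-fatal: training keeps going
        log(f"eval sample at step {step} failed, skipping: {exc!r}")
        return None
    return wav_name
```

The checkpoint is written before the eval pass, so a failing sample never costs a checkpoint.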
Ethanfel b430953602 feat: live loss curve preview during training
- Send the updated loss curve to the ComfyUI frontend every 50 steps via
  pbar_train.update_absolute() with a JPEG preview tuple, the same
  mechanism KSampler uses for its denoising previews.
- Fix x-axis step labels for resumed runs (previously always started
  at 0; now correctly shows start_step + offset).
- Split _draw_loss_curve (returns PIL Image) from _pil_to_tensor
  (converts for ComfyUI IMAGE output).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 17:11:38 +02:00
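A sketch of the resumed-run x-axis fix and the preview cadence, assuming one loss point is logged at the end of each 50-step window and that `start_step` is the step the run resumed from. `loss_curve_steps` and `should_send_preview` are hypothetical names, not functions from the trainer.

```python
def loss_curve_steps(start_step, num_points, log_every=50):
    """Global step label for each logged loss point, offset by the resume step."""
    return [start_step + (i + 1) * log_every for i in range(num_points)]


def should_send_preview(step, every=50):
    """True on the steps where a new loss-curve preview is pushed to the frontend."""
    return step > 0 and step % every == 0
```

Before the fix, a run resumed at step 1000 would have been labeled as if it started at 0 (50, 100, 150); with the offset it reads 1050, 1100, 1150.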
Ethanfel 57cd3dd4b4 fix: use load_lora for resume and remove redundant inference_mode wrapper
- Resume now calls load_lora() instead of load_state_dict() directly,
  giving proper warnings for missing/unexpected LoRA keys.
- Remove redundant `with torch.inference_mode():` around encode_audio
  (already decorated with @inference_mode); the dist.mode().clone()
  pattern is now clearer.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 17:09:35 +02:00
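The missing/unexpected key warnings amount to a set difference between the keys the model expects and the keys the checkpoint provides. PyTorch's `Module.load_state_dict(..., strict=False)` returns the same two lists; the plain-Python sketch below only illustrates the check, since load_lora's internals are not shown in this log. `report_lora_keys` and the key names are hypothetical.

```python
import warnings


def report_lora_keys(model_lora_keys, checkpoint_keys):
    """Warn about LoRA keys that are missing from, or unexpected in, a checkpoint."""
    expected = set(model_lora_keys)
    found = set(checkpoint_keys)
    missing = sorted(expected - found)
    unexpected = sorted(found - expected)
    if missing:
        warnings.warn(f"missing LoRA keys: {missing}")
    if unexpected:
        warnings.warn(f"unexpected LoRA keys: {unexpected}")
    return missing, unexpected
```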
Ethanfel f206a1b38c feat: add SelVA LoRA Trainer ComfyUI node
Runs the full training loop inside ComfyUI. Reuses the already-loaded
CLIP model from the inference model for text encoding; loads only a
minimal VAE encoder separately (freed after dataset pre-loading).

Outputs:
- SELVA_MODEL with LoRA applied (ready to connect directly to Sampler)
- adapter_path STRING (for SelVA LoRA Loader in future sessions)
- loss_curve IMAGE (PIL-rendered line chart of the training loss, one point per 50 steps)

Progress is shown via ComfyUI ProgressBar (two phases: dataset loading,
then training steps). Resume is supported via resume_path input.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 17:07:38 +02:00
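For readers unfamiliar with ComfyUI nodes, the three outputs listed in this commit map onto a node class's RETURN_TYPES. The outline below follows the usual custom-node conventions (INPUT_TYPES, RETURN_TYPES, RETURN_NAMES, FUNCTION, CATEGORY). The class name, the "model" input/output name, the inputs other than resume_path, warmup_steps, save_every and output_dir, and every default except warmup_steps=100 are assumptions; the training itself is omitted.

```python
class SelVALoraTrainerSketch:
    """Hypothetical outline of a ComfyUI node with the trainer's three outputs."""

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "model": ("SELVA_MODEL",),
                # Illustrative defaults except warmup_steps (100, per commit 4806daa4ca).
                "steps": ("INT", {"default": 2000, "min": 1}),
                "warmup_steps": ("INT", {"default": 100, "min": 0}),
                "save_every": ("INT", {"default": 500, "min": 1}),
                "output_dir": ("STRING", {"default": ""}),
            },
            "optional": {
                "resume_path": ("STRING", {"default": ""}),
            },
        }

    RETURN_TYPES = ("SELVA_MODEL", "STRING", "IMAGE")
    RETURN_NAMES = ("model", "adapter_path", "loss_curve")
    FUNCTION = "train"
    CATEGORY = "SelVA"

    def train(self, model, steps, warmup_steps, save_every, output_dir,
              resume_path=""):
        raise NotImplementedError("training loop omitted in this sketch")
```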