Commit Graph

3 Commits

Author SHA1 Message Date
Ethanfel 3d9221c248 fix: three bugs in scheduler and trainer
- trainer: raise ValueError early when remaining steps < log_interval (50)
  instead of UnboundLocalError on smoothed_img/final_path at return
- trainer: record None in grad_norm_history instead of a silent 0.0 when
  grad_accum > log_interval and no optimizer step fired in the interval
- trainer: include start_step in _train_inner return dict
- scheduler: use start_step from result dict for min_loss_step and
  loss_at_steps (fixes wrong step labels on resumed experiments)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 13:11:25 +02:00
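
The fixes in 3d9221c248 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the repository's code: the helper names `check_remaining_steps`, `interval_grad_norm`, and `step_labels` are hypothetical, and the layout of logged points at `start_step + k * log_interval` is inferred from the commit message rather than confirmed by it.

```python
from typing import List, Optional


def check_remaining_steps(total_steps: int, start_step: int, log_interval: int = 50) -> int:
    """Fail fast if a (possibly resumed) run is too short to log a single point."""
    remaining = total_steps - start_step
    if remaining < log_interval:
        # Raising here avoids a later failure on variables that are only
        # assigned inside the logging branch.
        raise ValueError(
            f"remaining steps ({remaining}) < log_interval ({log_interval}); "
            "no loss point would be logged"
        )
    return remaining


def interval_grad_norm(norms: List[float]) -> Optional[float]:
    """Average the grad norms collected during one log interval.

    Returns None (not 0.0) when no optimizer step fired in the interval,
    which can happen when gradient accumulation spans more steps than
    the log interval.
    """
    if not norms:
        return None
    return sum(norms) / len(norms)


def step_labels(start_step: int, n_points: int, log_interval: int = 50) -> List[int]:
    """Absolute step for each logged point, offset by start_step (assumed layout)."""
    return [start_step + (i + 1) * log_interval for i in range(n_points)]
```

Keeping None in the history lets a consumer tell "no measurement" apart from a genuinely zero gradient norm, and offsetting labels by `start_step` keeps a resumed run's points on the same x-axis as the original run.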
Ethanfel 2d200395af feat: add grad norm logging and richer experiment summary output
trainer:
- Track gradient norm before clipping at each optimizer step
- Log avg grad_norm per log_interval alongside loss in console output
- Include grad_norm_history in _train_inner return dict

scheduler:
- Add system block to summary (GPU name, VRAM, torch/CUDA version)
- Include full loss_history and grad_norm_history arrays in each
  experiment result (50-step resolution, not just save_every checkpoints)
- Add loss_std_last_quarter stability metric (std dev of raw loss over
  last 25% of steps; a high value indicates unstable training)
- Add log_interval field so consumers know the x-axis resolution

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 13:06:39 +02:00
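
One way the loss_std_last_quarter metric from 2d200395af could be computed is sketched below. The commit does not say whether the population or sample standard deviation is used, nor how very short histories are handled, so both choices here (population std dev, None for fewer than two points) are assumptions, as is computing the metric from the logged history rather than from every raw step.

```python
import statistics
from typing import List, Optional


def loss_std_last_quarter(loss_history: List[float]) -> Optional[float]:
    """Std dev of the loss over the last 25% of logged points."""
    n_tail = max(1, len(loss_history) // 4)
    tail = loss_history[-n_tail:]
    if len(tail) < 2:
        return None  # too few points for a meaningful spread (assumed behaviour)
    return statistics.pstdev(tail)
```

A flat tail yields 0.0, while a tail that still oscillates yields a larger value, which is what makes the metric usable as a quick stability check across experiments.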
Ethanfel 3ec380a27e feat: add SelVA LoRA Scheduler node for automated experiment sweeps
- Extract _prepare_dataset() from SelvaLoraTrainer.train() as a module-level
  function so the dataset can be encoded once and reused across experiments
- Change _train_inner() return value from tuple to dict (adds loss_history,
  meta, completed; train() unpacks it for ComfyUI, so node outputs are unchanged)
- New SelvaLoraScheduler node: reads a JSON sweep file, runs N experiments
  sequentially, writes experiment_summary.json (updated after each run) and
  loss_comparison.png with all smoothed curves overlaid on the same axes
- Register SelvaLoraScheduler in nodes/__init__.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 13:03:21 +02:00
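
The sweep behaviour described in 3ec380a27e (run N experiments sequentially and rewrite experiment_summary.json after each one) might look roughly like the sketch below. The names `run_sweep`, `write_summary`, and `run_one` are hypothetical, the summary layout is invented for illustration, and the temp-file-plus-rename write is a design choice of this sketch rather than something the commit states.

```python
import json
import os
import tempfile
from typing import Callable, Dict, List


def write_summary(path: str, summary: Dict) -> None:
    """Write JSON through a temp file and rename, so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(summary, f, indent=2)
    os.replace(tmp, path)


def run_sweep(
    experiments: List[Dict],
    run_one: Callable[[Dict], Dict],
    summary_path: str,
) -> Dict:
    """Run experiments in order, persisting the summary after each run."""
    summary: Dict = {"experiments": []}
    for config in experiments:
        # run_one stands in for a training call that returns a result dict
        # (for example with loss_history, meta, and completed keys).
        result = run_one(config)
        summary["experiments"].append({"config": config, **result})
        write_summary(summary_path, summary)
    return summary
```

Rewriting the summary after every run means an interrupted sweep still leaves a usable record of the experiments that finished.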