ComfyUI-SelVA

Author	SHA1	Message	Date
Ethanfel	264dc49d42	feat: skip_current.flag to cancel experiment and move to next. To skip the running experiment at the next log interval (every 50 steps), create the flag file in the sweep output_root (touch /path/to/experiment/skip_current.flag). Scheduler marks it as 'skipped' in the summary and continues. Skipped experiments are NOT resumed on restart (unlike failed ones). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 13:09:01 +02:00
Ethanfel	2861327016	feat: spectral metrics per eval sample in experiment summary. Computes hf_energy_ratio (>4kHz), spectral_centroid_hz, and spectral_rolloff_hz at each save_every checkpoint. Logged to console and stored in experiment_summary.json under results.spectral_metrics[step]. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 12:44:43 +02:00
Ethanfel	786a57c424	feat: sweep resume + 5 additional experiments (LR, target, extended). Scheduler: on re-run, reads existing experiment_summary.json and skips already-completed experiments, so it is safe to stop and restart mid-sweep. tier1_thorough: adds g5 (lr 3e-5/3e-4), g6 (full target attn.qkv+linear1 at r16 and r64), and g4_full_r64_6k (6000-step extended run), 17 total. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 00:59:16 +02:00
Ethanfel	3d9221c248	fix: three bugs in scheduler and trainer. trainer: raise ValueError early when remaining steps < log_interval (50) instead of UnboundLocalError on smoothed_img/final_path at return; trainer: use None in grad_norm_history instead of silent 0.0 when grad_accum > log_interval and no optimizer step fired in the interval; trainer: include start_step in _train_inner return dict; scheduler: use start_step from result dict for min_loss_step and loss_at_steps (fixes wrong step labels on resumed experiments). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-06 13:11:25 +02:00
Ethanfel	2d200395af	feat: add grad norm logging and richer experiment summary output. trainer: track gradient norm before clipping at each optimizer step; log avg grad_norm per log_interval alongside loss in console output; include grad_norm_history in _train_inner return dict. scheduler: add system block to summary (GPU name, VRAM, torch/CUDA version); include full loss_history and grad_norm_history arrays in each experiment result (50-step resolution, not just save_every checkpoints); add loss_std_last_quarter stability metric (std dev of raw loss over last 25% of steps; a high value indicates unstable training); add log_interval field so consumers know the x-axis resolution. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-06 13:06:39 +02:00
Ethanfel	3ec380a27e	feat: add SelVA LoRA Scheduler node for automated experiment sweeps. Extract _prepare_dataset() from SelvaLoraTrainer.train() as a module-level function so the dataset can be encoded once and reused across experiments. Change _train_inner() return value from tuple to dict (adds loss_history, meta, completed; train() unpacks for ComfyUI, so no change to node outputs). New SelvaLoraScheduler node: reads a JSON sweep file, runs N experiments sequentially, writes experiment_summary.json (updated after each run) and loss_comparison.png with all smoothed curves overlaid on the same axes. Register SelvaLoraScheduler in nodes/__init__.py. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-06 13:03:21 +02:00
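
Commit 2861327016 names three spectral metrics but the log does not show how they are computed. The following is a minimal sketch of one plausible way to compute them, not the repository's code. The function name `spectral_metrics`, the power-spectrum weighting, and the 85% rolloff threshold are all assumptions made for illustration; only the metric names and the 4 kHz cutoff come from the commit message.

```python
import numpy as np


def spectral_metrics(audio, sample_rate, hf_cutoff_hz=4000.0, rolloff_pct=0.85):
    """Hypothetical helper: summarize the spectrum of a mono waveform.

    The power-spectrum weighting and the 85% rolloff threshold are
    assumptions, not values taken from the commit log.
    """
    audio = np.asarray(audio, dtype=np.float64)
    power = np.abs(np.fft.rfft(audio)) ** 2               # one-sided power spectrum
    freqs = np.fft.rfftfreq(audio.size, d=1.0 / sample_rate)
    total = power.sum()
    if total <= 0.0:
        # Silent input: every metric is reported as zero.
        return {
            "hf_energy_ratio": 0.0,
            "spectral_centroid_hz": 0.0,
            "spectral_rolloff_hz": 0.0,
        }

    # Fraction of total energy above the cutoff frequency.
    hf_ratio = power[freqs > hf_cutoff_hz].sum() / total
    # Power-weighted mean frequency.
    centroid = (freqs * power).sum() / total
    # Lowest frequency below which rolloff_pct of the energy lies.
    cumulative = np.cumsum(power)
    idx = min(int(np.searchsorted(cumulative, rolloff_pct * total)), freqs.size - 1)

    return {
        "hf_energy_ratio": float(hf_ratio),
        "spectral_centroid_hz": float(centroid),
        "spectral_rolloff_hz": float(freqs[idx]),
    }


# Example input: one second of a 1 kHz tone sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
metrics = spectral_metrics(np.sin(2 * np.pi * 1000.0 * t), sr)
```

A falling hf_energy_ratio across checkpoints would suggest the generated audio is losing high-frequency detail, which is presumably why the metrics are recorded at each save_every step.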

6 Commits
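
Commit 2d200395af describes loss_std_last_quarter as the standard deviation of the raw loss over the last 25% of steps. A minimal sketch of that calculation is shown below. The function name, the use of the population standard deviation, and the handling of very short histories are assumptions; the log does not specify them.

```python
import statistics


def loss_std_last_quarter(raw_losses):
    """Hypothetical helper: std dev of the raw loss over the final 25% of steps.

    Returns None for an empty history. Population std dev (pstdev) is an
    assumption; the commit log does not say which estimator is used.
    """
    n = len(raw_losses)
    if n == 0:
        return None
    tail_len = max(1, n // 4)           # last quarter, at least one value
    tail = list(raw_losses[-tail_len:])
    if len(tail) < 2:
        return 0.0                      # a single value has no spread
    return statistics.pstdev(tail)


# Example: a loss curve that is flat at the end versus one that oscillates.
stable = loss_std_last_quarter([0.9, 0.7, 0.5, 0.4, 0.3, 0.3, 0.3, 0.3])
noisy = loss_std_last_quarter([0.9, 0.7, 0.5, 0.4, 0.3, 0.3, 0.1, 0.5])
```

Here `noisy` is larger than `stable`, matching the commit's note that a high value indicates unstable training.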