Create the flag file in the sweep output_root to skip the running
experiment at the next log interval (every 50 steps):
touch /path/to/experiment/skip_current.flag
Scheduler marks it as 'skipped' in the summary and continues.
Skipped experiments are NOT resumed on restart (unlike failed ones).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Computes hf_energy_ratio (>4kHz), spectral_centroid_hz, spectral_rolloff_hz
at each save_every checkpoint. Logged to console and stored in
experiment_summary.json under results.spectral_metrics[step].
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Scheduler: on re-run, reads existing experiment_summary.json and skips
already-completed experiments — safe to stop and restart mid-sweep.
tier1_thorough: adds g5 (lr 3e-5/3e-4), g6 (full target attn.qkv+linear1
at r16 and r64), and g4_full_r64_6k (6000-step extended run) — 17 total.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- trainer: raise ValueError early when remaining steps < log_interval (50)
instead of UnboundLocalError on smoothed_img/final_path at return
- trainer: use None in grad_norm_history instead of silent 0.0 when
grad_accum > log_interval and no optimizer step fired in the interval
- trainer: include start_step in _train_inner return dict
- scheduler: use start_step from result dict for min_loss_step and
loss_at_steps (fixes wrong step labels on resumed experiments)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
trainer:
- Track gradient norm before clipping at each optimizer step
- Log avg grad_norm per log_interval alongside loss in console output
- Include grad_norm_history in _train_inner return dict
scheduler:
- Add system block to summary (GPU name, VRAM, torch/CUDA version)
- Include full loss_history and grad_norm_history arrays in each
experiment result (50-step resolution, not just save_every checkpoints)
- Add loss_std_last_quarter stability metric (std dev of raw loss over
last 25% of steps — high value indicates unstable training)
- Add log_interval field so consumers know the x-axis resolution
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Extract _prepare_dataset() from SelvaLoraTrainer.train() as a module-level
function so the dataset can be encoded once and reused across experiments
- Change _train_inner() return value from tuple to dict (adds loss_history,
meta, completed; train() unpacks for ComfyUI — no change to node outputs)
- New SelvaLoraScheduler node: reads a JSON sweep file, runs N experiments
sequentially, writes experiment_summary.json (updated after each run) and
loss_comparison.png with all smoothed curves overlaid on the same axes
- Register SelvaLoraScheduler in nodes/__init__.py
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>