feat: add cosine LR decay schedule to trainer and scheduler

- Add lr_schedule param (constant|cosine) to SelvaLoraTrainer
- After warmup, cosine decays LR from its initial value to ~0,
  preventing the oscillation observed at steps 6000-8000 with a flat
  lr=2e-4
- Wire lr_schedule through scheduler _PARAM_DEFAULTS and _train_inner call
- Add g5_r128_lr_2e4_cosine and g5_r128_lr_3e4_cosine to r128_sweet_spot sweep
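
For reference, the constant|cosine behavior described above can be sketched as follows. This is a minimal illustration, not the code in SelvaLoraTrainer: the helper name lr_at_step, its parameters, and the linear warmup shape are all assumptions made for the example.

```python
import math


def lr_at_step(step, total_steps, base_lr, warmup_steps=0, schedule="cosine"):
    """Hypothetical helper: learning rate to use at a given optimizer step.

    Assumes linear warmup (the commit only says decay starts "after warmup")
    followed by either a flat LR or a half-cosine decay to 0.
    """
    if schedule not in ("constant", "cosine"):
        raise ValueError(f"unknown lr_schedule: {schedule!r}")

    # Assumed linear warmup: ramp from base_lr / warmup_steps up to base_lr.
    if warmup_steps > 0 and step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps

    if schedule == "constant":
        return base_lr

    # Cosine decay: progress runs from 0 at the end of warmup to 1 at the
    # final step, so the multiplier goes from 1 down to 0.
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

With schedule="constant" the helper returns base_lr for every step after warmup, which corresponds to the flat lr=2e-4 run; with schedule="cosine" the LR is already at half its initial value midway through the decay phase and approaches 0 at the last step.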

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 13:25:01 +02:00
parent 58e1985af2
commit 1be07a80d2
3 changed files with 41 additions and 5 deletions
+15
@@ -82,6 +82,21 @@
"description": "Rank 256 + LR=3e-4. Best rank + best LR candidate combined.",
"rank": 256,
"lr": 3e-4
},
{
"id": "g5_r128_lr_2e4_cosine",
"group": "cosine",
"description": "LR=2e-4 + cosine decay. Fixes the oscillation observed at steps 6000-8000 by decaying LR to ~0 instead of staying flat.",
"lr": 2e-4,
"lr_schedule": "cosine"
},
{
"id": "g5_r128_lr_3e4_cosine",
"group": "cosine",
"description": "LR=3e-4 + cosine decay. Higher LR with decay — should reach lower loss faster then lock in.",
"lr": 3e-4,
"lr_schedule": "cosine"
}
]