feat: add cosine LR decay schedule to trainer and scheduler

- Add lr_schedule param (constant|cosine) to SelvaLoraTrainer
- Cosine decays LR from initial value to ~0 after warmup, preventing the
  oscillation observed at steps 6000-8000 with lr=2e-4 flat
- Wire lr_schedule through scheduler _PARAM_DEFAULTS and _train_inner call
- Add g5_r128_lr_2e4_cosine and g5_r128_lr_3e4_cosine to r128_sweet_spot sweep

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
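
For readers who want the schedule spelled out, here is a minimal sketch of a constant-or-cosine learning rate with warmup. It is not the SelvaLoraTrainer code: the function name `lr_at_step`, its parameters, and the linear shape of the warmup are assumptions made for illustration. Only the `constant|cosine` choice and the decay from the initial value to ~0 after warmup come from the commit message.

```python
import math


def lr_at_step(step, total_steps, base_lr, warmup_steps=0, schedule="cosine"):
    """Return the learning rate for a 0-indexed optimizer step.

    Hypothetical helper: the name, the signature, and the linear warmup are
    illustrative assumptions, not the trainer's actual implementation.
    """
    if schedule not in ("constant", "cosine"):
        raise ValueError(f"unknown lr_schedule: {schedule!r}")

    # Assumed linear warmup: ramp from base_lr / warmup_steps up to base_lr.
    if warmup_steps > 0 and step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps

    if schedule == "constant":
        return base_lr

    # Cosine decay over the steps that remain after warmup.
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

With `schedule="constant"` the rate stays at `base_lr` once warmup ends, which matches the flat lr=2e-4 setting the commit message associates with the oscillation. With `schedule="cosine"` the rate shrinks smoothly towards zero, so late updates get progressively smaller.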
@@ -82,6 +82,21 @@
      "description": "Rank 256 + LR=3e-4. Best rank + best LR candidate combined.",
      "rank": 256,
      "lr": 3e-4
    },

    {
      "id": "g5_r128_lr_2e4_cosine",
      "group": "cosine",
      "description": "LR=2e-4 + cosine decay. Fixes the oscillation observed at step 6000–8000 by decaying LR to ~0 instead of staying flat.",
      "lr": 2e-4,
      "lr_schedule": "cosine"
    },
    {
      "id": "g5_r128_lr_3e4_cosine",
      "group": "cosine",
      "description": "LR=3e-4 + cosine decay. Higher LR with decay — should reach lower loss faster then lock in.",
      "lr": 3e-4,
      "lr_schedule": "cosine"
    }

  ]