chore: lower default warmup_steps from 500 to 100
500 warmup steps is 25% of the default 2000-step run, which is too long. 100 steps (5% of the run) lets the full lr kick in much earlier without sacrificing stability.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
+1
-1
@@ -159,7 +159,7 @@ def main():
         help="Module name suffixes to wrap with LoRA. Also try 'linear1'.")
     parser.add_argument("--lr", type=float, default=1e-4)
     parser.add_argument("--steps", type=int, default=2000)
-    parser.add_argument("--warmup_steps",type=int, default=500)
+    parser.add_argument("--warmup_steps",type=int, default=100)
     parser.add_argument("--grad_accum", type=int, default=4, help="Gradient accumulation steps")
     parser.add_argument("--save_every", type=int, default=500)
     parser.add_argument("--resume", default=None,
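
The arithmetic in the commit message can be checked with a small sketch. It assumes a linear warmup that ramps from 0 to the base lr and then holds constant; the diff does not show which schedule the script actually uses, and the helper names `warmup_fraction` and `lr_at_step` are hypothetical, not taken from the repository.

```python
def warmup_fraction(warmup_steps: int, total_steps: int) -> float:
    """Share of the run spent below the full learning rate."""
    return warmup_steps / total_steps


def lr_at_step(step: int, base_lr: float, warmup_steps: int) -> float:
    """Assumed linear warmup: ramp from 0 to base_lr, then hold constant."""
    if warmup_steps <= 0:
        return base_lr
    return base_lr * min(1.0, step / warmup_steps)


if __name__ == "__main__":
    total_steps = 2000  # default of --steps
    base_lr = 1e-4      # default of --lr
    for warmup in (500, 100):  # old and new defaults of --warmup_steps
        frac = warmup_fraction(warmup, total_steps)
        lr = lr_at_step(100, base_lr, warmup)
        print(f"warmup_steps={warmup}: {frac:.0%} of the run, lr at step 100 = {lr:.1e}")
```

Under this assumed schedule, the old default spends 25% of the run warming up and is still at one fifth of the base lr at step 100, while the new default spends 5% and has reached the full lr by then.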