chore: lower default warmup_steps from 500 to 100

500 warmup steps is 25% of the default 2000-step run, which is too long.
100 steps (5% of the run) lets the full lr kick in much earlier without
sacrificing stability.
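
For illustration only, the arithmetic can be sketched as below. This
assumes a linear warmup from 0 to the base lr followed by a constant lr;
the schedule the training script actually uses is not shown in this diff,
and `warmup_lr` is a hypothetical helper, not a function from the script.

```python
def warmup_lr(step: int, base_lr: float = 1e-4, warmup_steps: int = 100) -> float:
    """Linear warmup to base_lr, then constant (assumed schedule shape)."""
    if warmup_steps <= 0:
        # No warmup requested: use the full lr from the first step.
        return base_lr
    return base_lr * min(1.0, (step + 1) / warmup_steps)


total_steps = 2000  # default of --steps
for warmup in (500, 100):
    share = warmup / total_steps  # fraction of the run spent warming up
    lr_at_100 = warmup_lr(99, warmup_steps=warmup)  # lr on the 100th step
    print(f"warmup_steps={warmup}: {share:.0%} of run, lr on step 100 = {lr_at_100:.1e}")
```

Under these assumptions, the 100th step already runs at the full 1e-4 with
the new default, while the old default would still be at a fifth of it.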

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 22:51:27 +02:00
parent 16b3eb11cc
commit 4806daa4ca
3 changed files with 4 additions and 4 deletions
+1 -1
@@ -159,7 +159,7 @@ def main():
help="Module name suffixes to wrap with LoRA. Also try 'linear1'.")
parser.add_argument("--lr", type=float, default=1e-4)
parser.add_argument("--steps", type=int, default=2000)
-parser.add_argument("--warmup_steps",type=int, default=500)
+parser.add_argument("--warmup_steps",type=int, default=100)
parser.add_argument("--grad_accum", type=int, default=4, help="Gradient accumulation steps")
parser.add_argument("--save_every", type=int, default=500)
parser.add_argument("--resume", default=None,