fix(ti): lower default lr/batch, add lr_batch sweep group

n4_baseline showed token_norm growing linearly with no plateau, a classic
sign that lr is too high relative to the parameter count. With only K×1024
params, the gradient signal per param is already high-magnitude, so a high
lr causes overshoot rather than convergence.
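
One way to check for the missing plateau is to compare recent token_norm
values against the preceding window. This is a minimal sketch, not the
check used for n4_baseline: the helper name `has_plateaued`, the window
size and the tolerance are all assumptions, and the two series are
synthetic stand-ins for logged norms.

```python
from statistics import mean


def has_plateaued(norms, window=5, rel_tol=0.01):
    """Return True if the mean of the last `window` values is within
    `rel_tol` (relative) of the mean of the `window` values before them.

    Hypothetical helper: window and rel_tol are illustrative defaults.
    """
    if len(norms) < 2 * window:
        return False  # not enough history to judge
    recent = mean(norms[-window:])
    previous = mean(norms[-2 * window:-window])
    return abs(recent - previous) <= rel_tol * max(abs(previous), 1e-12)


# Synthetic series: one grows linearly, one saturates towards 1.0.
linear = [0.1 * i for i in range(1, 41)]
saturating = [1.0 - 0.5 ** i for i in range(1, 41)]

print(has_plateaued(linear))
print(has_plateaued(saturating))
```

A run whose logged norms look like `linear` would be flagged as not yet
converged; one that looks like `saturating` would pass.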

- Default lr: 1e-3 → 2e-4 (matches LoRA working regime)
- Default batch_size: 16 → 4 (more varied per-step gradients, should help token_norm saturate)
- ti_sweep_1.json: add lr_batch group (lr_low_b4, lr_mid_b8,
  lr_low_b4_prefix, lr_2e3), restructure with clearer groups,
  annotate n4_baseline as completed with findings
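
A sweep entry can be thought of as a set of overrides layered on the new
defaults. The sketch below assumes that model; the actual schema of
ti_sweep_1.json is not shown here, so `resolve_run` is a hypothetical
helper and the lr value given for lr_mid_b8 is a placeholder, not the
value in the file. Only the default keys visible in the diff are
included.

```python
# Defaults as they stand after this commit (only the keys visible in the diff).
PARAM_DEFAULTS = {
    "n_tokens": 4,
    "lr": 2e-4,
    "steps": 3000,
    "batch_size": 4,
    "warmup_steps": 100,
    "seed": 42,
    "save_every": 1000,
}


def resolve_run(overrides, defaults=PARAM_DEFAULTS):
    """Merge per-run overrides onto the defaults, rejecting unknown keys."""
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"unknown parameter(s): {sorted(unknown)}")
    return {**defaults, **overrides}


# Hypothetical sweep group; run names come from the commit message,
# override values are illustrative placeholders.
lr_batch_group = {
    "lr_low_b4": {},  # assumed to match the new defaults
    "lr_mid_b8": {"lr": 5e-4, "batch_size": 8},  # lr is a placeholder
}

for name, overrides in lr_batch_group.items():
    params = resolve_run(overrides)
    print(name, params["lr"], params["batch_size"])
```

Rejecting unknown keys up front is a design choice that catches typos in
a sweep file before a long run starts.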

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 23:42:22 +02:00
parent 92535deab2
commit f9d092158a
3 changed files with 42 additions and 31 deletions
+2 -2
@@ -75,9 +75,9 @@ def _get_system_info() -> dict:
_PARAM_DEFAULTS = {
"n_tokens": 4,
-"lr": 1e-3,
+"lr": 2e-4,
"steps": 3000,
-"batch_size": 16,
+"batch_size": 4,
"warmup_steps": 100,
"seed": 42,
"save_every": 1000,