f9d092158a
n4_baseline showed token_norm growing linearly without plateau — classic sign of lr too high relative to parameter count. With only K×1024 params, gradient signal per param is already high-magnitude; high lr causes overshoot rather than convergence. - Default lr: 1e-3 → 2e-4 (matches LoRA working regime) - Default batch_size: 16 → 4 (more diverse gradients, helps norm saturate) - ti_sweep_1.json: add lr_batch group (lr_low_b4, lr_mid_b8, lr_low_b4_prefix, lr_2e3), restructure with clearer groups, annotate n4_baseline as completed with findings Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>