fix(ti): lower default lr/batch, add lr_batch sweep group

n4_baseline showed token_norm growing linearly without a plateau, a classic sign that lr is too high relative to the parameter count. With only K×1024 params, the gradient signal per param is already high-magnitude, so a high lr causes overshoot rather than convergence.

- Default lr: 1e-3 → 2e-4 (matches LoRA working regime)
- Default batch_size: 16 → 4 (more diverse gradients, helps norm saturate)
- ti_sweep_1.json: add lr_batch group (lr_low_b4, lr_mid_b8, lr_low_b4_prefix, lr_2e3), restructure with clearer groups, annotate n4_baseline as completed with findings

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
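
For reviewers who want to check the "no plateau" observation on later runs, here is a minimal sketch. It assumes token_norm is logged once per step as a plain list of floats. The helper `has_plateaued` and its `window` and `rel_tol` parameters are hypothetical and not part of this commit.

```python
from statistics import mean


def has_plateaued(norms, window=200, rel_tol=0.01):
    """Return True if the mean of the last `window` values differs from the
    mean of the preceding `window` values by at most `rel_tol` (relative).

    Hypothetical helper: the window size and tolerance are illustrative
    assumptions, not values taken from the training code.
    """
    if len(norms) < 2 * window:
        return False  # not enough history to compare two full windows
    prev = mean(norms[-2 * window:-window])
    last = mean(norms[-window:])
    return abs(last - prev) <= rel_tol * abs(prev)


# Synthetic curves for illustration only (not real run data).
linear_growth = [0.01 * i for i in range(1000)]      # keeps rising
saturating = [1.0 - 0.99 ** i for i in range(1000)]  # levels off near 1.0

print(has_plateaued(linear_growth))
print(has_plateaued(saturating))
```

A linearly growing curve like the one reported for n4_baseline fails this check, while a saturating curve passes it. The window and tolerance would need tuning against the actual logging cadence.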
@@ -75,9 +75,9 @@ def _get_system_info() -> dict:
 
 _PARAM_DEFAULTS = {
     "n_tokens": 4,
-    "lr": 1e-3,
+    "lr": 2e-4,
     "steps": 3000,
-    "batch_size": 16,
+    "batch_size": 4,
     "warmup_steps": 100,
     "seed": 42,
     "save_every": 1000,