Replaces single-sample steps with batched sampling via random.choices().
Tensors are stacked to [B, T, C] before the forward pass; t is now [B].
Default grad_accum lowered to 1, since real batching already gives more
stable gradients than single-sample steps.
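
A minimal sketch of the sampling and shapes described above, in pure
Python so it runs without torch. The toy dataset, the helper names
(sample_batch, shape_of), and treating t as one scalar per sample are
assumptions for illustration, not the training script's actual code;
the real code stacks tensors rather than nested lists.

```python
import random

def sample_batch(dataset, batch_size, rng=random):
    """Draw batch_size samples. random.choices() samples WITH replacement,
    so the same item can appear more than once in a batch."""
    return rng.choices(dataset, k=batch_size)

def shape_of(nested):
    """Return the shape of a non-empty rectangular nested list."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0]
    return shape

# Hypothetical toy dataset: 10 samples, each a [T, C] = [4, 3] nested list.
T, C = 4, 3
dataset = [[[float(i)] * C for _ in range(T)] for i in range(10)]

rng = random.Random(0)                  # seeded only to make the demo repeatable
B = 8
batch = sample_batch(dataset, B, rng)   # B samples of [T, C] -> [B, T, C]
t = [rng.random() for _ in range(B)]    # assumed: one t value per sample -> [B]

print(shape_of(batch), len(t))
```

Because sampling is with replacement, B may exceed the dataset size,
which matters for small fine-tuning sets.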
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
500 warmup steps is 25% of a 2000-step run, which is too long. 100 steps
(5% of the run) lets the full lr kick in much earlier without sacrificing
stability.
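
To make the difference concrete, here is a small sketch that assumes a
linear warmup (the shape of the actual schedule is not stated here, and
warmup_scale is a hypothetical helper). It compares the lr multiplier
at a few steps for both settings.

```python
def warmup_scale(step, warmup_steps):
    """Linear warmup multiplier: 0.0 at step 0, 1.0 from warmup_steps on.
    Assumed shape; the real scheduler may differ."""
    if warmup_steps <= 0:
        return 1.0
    return min(1.0, step / warmup_steps)

total_steps = 2000
for warmup in (500, 100):
    fraction = warmup / total_steps             # share of the run spent warming up
    scales = [warmup_scale(s, warmup) for s in (50, 100, 500)]
    print(f"warmup={warmup}: {fraction:.0%} of run, scale at 50/100/500 = {scales}")
```

Under this assumption, the old setting is still at a fifth of the full
lr at step 100, where the new setting has already reached it.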
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Step checkpoints now save optimizer state, scheduler state, and step
number alongside the LoRA weights. Pass --resume path/to/adapter_stepXXXXX.pt
to continue training from that checkpoint. --steps always means total steps,
so resuming from 1000 with --steps 2000 trains 1000 more steps.
adapter_final.pt format is unchanged (state_dict + meta only) so
SelvaLoraLoader remains compatible.
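
A rough sketch of the two checkpoint layouts and the --steps arithmetic,
using plain dicts in place of real state. The key names "optimizer",
"scheduler", and "step" and all three helper functions are hypothetical;
only the final format (state_dict + meta) is taken from the description
above.

```python
def step_checkpoint(lora_state, optimizer_state, scheduler_state, step):
    """Resumable checkpoint: LoRA weights plus the training state.
    Key names other than "state_dict" are assumed for illustration."""
    return {
        "state_dict": lora_state,
        "optimizer": optimizer_state,
        "scheduler": scheduler_state,
        "step": step,
    }

def final_checkpoint(lora_state, meta):
    """Final adapter: state_dict + meta only, so existing loaders still work."""
    return {"state_dict": lora_state, "meta": meta}

def remaining_steps(total_steps, resumed_step):
    """--steps is the total, so only the difference is left to train."""
    return max(0, total_steps - resumed_step)

ckpt = step_checkpoint({"lora_A": [0.1]}, {"lr": 1e-4}, {"last_epoch": 1000}, 1000)
print(remaining_steps(2000, ckpt["step"]))
```

Clamping at zero means resuming with --steps at or below the saved step
trains nothing further in this sketch; the real script may handle that
case differently.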
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>