Observation: n4_baseline loss barely moved (1.025→0.965 over 3000 steps) and
token_norm grew linearly without plateauing. The generator likely ignores the
last K CLIP positions (the EOS/padding zone), which is where suffix mode injects.
Fix: add an inject_mode parameter throughout the pipeline:
- "suffix": replace the last K positions (original behavior; the model may
ignore them)
- "prefix": replace positions 1:1+K, right after BOS, where CLIP attention
weight is highest; a much stronger gradient signal is expected
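A minimal sketch of the two modes, to make the replaced positions concrete.
Plain Python lists stand in for the embedding tensors here (the real helper
builds the sequence with torch.cat); the function name inject_tokens_sketch
and the toy token labels are illustrative assumptions, not the trainer's code.

```python
def inject_tokens_sketch(seq, learned, inject_mode="suffix"):
    """Return a copy of seq with len(learned) positions replaced.

    seq     -- per-position embeddings (position 0 is assumed to be BOS)
    learned -- the K learned token embeddings
    """
    k = len(learned)
    if k >= len(seq):
        raise ValueError("K must leave at least the BOS position intact")
    if inject_mode == "suffix":
        # Overwrite the last K positions (EOS/padding zone).
        return seq[: len(seq) - k] + list(learned)
    if inject_mode == "prefix":
        # Keep BOS at position 0, overwrite positions 1:1+K.
        return seq[:1] + list(learned) + seq[1 + k :]
    raise ValueError(f"unknown inject_mode: {inject_mode!r}")


# Toy 8-position sequence; labels stand in for embedding vectors.
seq = ["BOS", "a", "b", "c", "EOS", "PAD", "PAD", "PAD"]
learned = ["v0", "v1"]

suffix_out = inject_tokens_sketch(seq, learned, "suffix")
# ['BOS', 'a', 'b', 'c', 'EOS', 'PAD', 'v0', 'v1']
prefix_out = inject_tokens_sketch(seq, learned, "prefix")
# ['BOS', 'v0', 'v1', 'c', 'EOS', 'PAD', 'PAD', 'PAD']
```

Both modes keep the sequence length fixed, so downstream shapes are unchanged;
only the positions that carry the learned tokens differ.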
Changes:
- selva_textual_inversion_trainer.py: new _inject_tokens() helper centralises
the torch.cat construction for both modes; used in the training loop and in
eval; inject_mode is now stored in checkpoint files
- selva_textual_inversion_loader.py: reads inject_mode from the checkpoint and
includes it in the TEXTUAL_INVERSION bundle
- selva_sampler.py: calls _inject_tokens() with the bundle's inject_mode field
- selva_ti_scheduler.py: inject_mode added to _PARAM_DEFAULTS, the config, and
the _train_inner call
- ti_sweep_1.json: updated with prefix_inject group (n4, n8, n4+warm);
n4_baseline marked completed; suffix experiments retained for comparison
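A rough sketch of how inject_mode could travel from checkpoint to bundle.
The dict keys, the function name build_bundle_sketch, and the fallback to
"suffix" for checkpoints written before this change are all assumptions made
for illustration; the actual loader may behave differently.

```python
VALID_INJECT_MODES = ("suffix", "prefix")


def build_bundle_sketch(checkpoint):
    """Build a bundle dict from a checkpoint dict (hypothetical layout)."""
    # Assumption: older checkpoints have no inject_mode key, so fall back
    # to "suffix", which the commit describes as the original behavior.
    inject_mode = checkpoint.get("inject_mode", "suffix")
    if inject_mode not in VALID_INJECT_MODES:
        raise ValueError(f"unknown inject_mode: {inject_mode!r}")
    return {
        "embeddings": checkpoint["embeddings"],
        "inject_mode": inject_mode,
    }


old_ckpt = {"embeddings": [[0.0, 0.0]]}
new_ckpt = {"embeddings": [[0.0, 0.0]], "inject_mode": "prefix"}

old_bundle = build_bundle_sketch(old_ckpt)  # inject_mode falls back to "suffix"
new_bundle = build_bundle_sketch(new_ckpt)  # inject_mode read as "prefix"
```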
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
First TI sweep covering the three most impactful axes:
- token_count group: n_tokens 4 / 8 / 16 (capacity vs overfitting)
- learning_rate group: 5e-4 / 1e-3 / 2e-3 with n_tokens=4
- warm_init group: n4 and n8 seeded from 'mechanical impact sound design'
7 experiments total, 3000 steps each, same data_dir as LoRA sweeps.
n4_baseline (lr=1e-3, random init) is the primary reference point.
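One reading that reconciles the three groups (3 + 3 + 2 entries) with the
total of 7: the n_tokens=4, lr=1e-3, random-init run appears in both the
token_count and learning_rate groups and is counted once. The sketch below
enumerates that reading. The field names, the function name
build_sweep_sketch, and the use of lr=1e-3 for the n8, n16, and warm_init runs
are assumptions; ti_sweep_1.json is the source of truth.

```python
WARM_PROMPT = "mechanical impact sound design"


def build_sweep_sketch():
    """Enumerate the sweep, merging runs that share an identical config."""
    # Assumed baseline: n4, lr=1e-3, random init, 3000 steps.
    base = {"n_tokens": 4, "lr": 1e-3, "init": "random", "steps": 3000}
    groups = {
        "token_count": [{"n_tokens": n} for n in (4, 8, 16)],
        "learning_rate": [{"lr": lr} for lr in (5e-4, 1e-3, 2e-3)],
        "warm_init": [{"n_tokens": n, "init": WARM_PROMPT} for n in (4, 8)],
    }
    unique = {}
    for overrides in groups.values():
        for override in overrides:
            cfg = {**base, **override}
            key = (cfg["n_tokens"], cfg["lr"], cfg["init"])
            # The baseline config shows up in two groups; keep one copy.
            unique.setdefault(key, cfg)
    return list(unique.values())


experiments = build_sweep_sketch()
n_experiments = len(experiments)  # 7
```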
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>