Commit Graph

15 Commits

Author SHA1 Message Date
Ethanfel e16480b4c9 feat: add PiSSA/rsLoRA support to scheduler and PiSSA sweep experiment
Thread init_mode and use_rslora through the scheduler's config parsing,
experiment record, and _train_inner call. Default alpha changed to 2*rank
to match trainer. Add pissa_sweep.json with 7 experiments ablating PiSSA
init vs standard, rsLoRA scaling, and learning rate variations at rank 128.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-09 22:07:27 +02:00
Ethanfel becb38c27e fix: use soundfile for WAV/FLAC/OGG to bypass torchcodec/FFmpeg dependency
torchaudio was defaulting to the torchcodec backend which requires FFmpeg
shared libraries not present in the ComfyUI venv, silently skipping every
clip and producing an empty dataset.

Also add experiments/vocoder_finetune.json for the BJ vocoder LoRA run
(lr=3e-4, rank=128, 10k steps).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 15:16:22 +02:00
Ethanfel b89167cfae fix(ti-trainer): clamp token norm to CLIP manifold to prevent buzz artifacts
Diagnosis: learned tokens grew to norm ~3.2 while real CLIP content tokens
sit at ~1.0. Model never trained on embeddings that large — activates buzz
artifact instead of semantic style shift.

Fix: measure mean token norm from content positions (1–20) of dataset CLIP
embeddings at startup, clamp learned_tokens per-token after every optimizer
step to max 1.5× that reference (50% headroom). Token norm is now logged
as current/limit for easy monitoring.

ti_sweep_1.json: rebuild around norm_clamp group — n4_clamped (primary
diagnostic), prefix_clamped, n8_prefix_clamped, warm_clamped.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 23:54:23 +02:00
Ethanfel f9d092158a fix(ti): lower default lr/batch, add lr_batch sweep group
n4_baseline showed token_norm growing linearly without plateau — classic
sign of lr too high relative to parameter count. With only K×1024 params,
gradient signal per param is already high-magnitude; high lr causes
overshoot rather than convergence.

- Default lr: 1e-3 → 2e-4 (matches LoRA working regime)
- Default batch_size: 16 → 4 (more diverse gradients, helps norm saturate)
- ti_sweep_1.json: add lr_batch group (lr_low_b4, lr_mid_b8,
  lr_low_b4_prefix, lr_2e3), restructure with clearer groups,
  annotate n4_baseline as completed with findings

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 23:42:22 +02:00
Ethanfel e1a2f0ed7d feat: add inject_mode (suffix/prefix) to TI pipeline
Observation: n4_baseline loss barely moved (1.025→0.965 over 3000 steps),
token_norm grew linearly without plateau — generator likely ignores last-K
CLIP positions (EOS/padding zone) where suffix injects.

Fix: add inject_mode parameter throughout the pipeline:
- "suffix": replace last K positions (original behavior, model may ignore)
- "prefix": replace positions 1:1+K right after BOS — highest attention
  weight in CLIP, much stronger gradient signal expected

Changes:
- selva_textual_inversion_trainer.py: _inject_tokens() helper centralises
  the torch.cat construction for both modes; used in training loop and eval;
  inject_mode stored in checkpoint files
- selva_textual_inversion_loader.py: reads inject_mode from checkpoint,
  includes in TEXTUAL_INVERSION bundle
- selva_sampler.py: uses _inject_tokens() via bundle's inject_mode field
- selva_ti_scheduler.py: inject_mode in _PARAM_DEFAULTS, config, and
  _train_inner call
- ti_sweep_1.json: updated with prefix_inject group (n4, n8, n4+warm);
  n4_baseline marked completed; suffix experiments retained for comparison

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 23:31:52 +02:00
Ethanfel c0d95ce356 feat: add ti_sweep_1 experiment file
First TI sweep covering the three most impactful axes:
- token_count group: n_tokens 4 / 8 / 16 (capacity vs overfitting)
- learning_rate group: 5e-4 / 1e-3 / 2e-3 with n_tokens=4
- warm_init group: n4 and n8 seeded from 'mechanical impact sound design'

7 experiments total, 3000 steps each, same data_dir as LoRA sweeps.
n4_baseline (lr=1e-3, random init) is the primary reference point.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 23:14:31 +02:00
Ethanfel c8e6b91f67 feat: add alpha_scale_sweep to fix LoRA noise contamination
Previous sweep used alpha=rank (scale=1.0) which at rank 128/256 drowned
base model priors — spectral flatness went from 0.013 (baseline) to 0.094.
This sweep tests alpha dramatically below rank across r16/r32/r128 to find
the scale where LoRA nudges rather than overwrites.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 17:55:05 +02:00
Ethanfel dbfa7b23fe feat: add eval_r128_candidates.json
Evaluates top 5 adapters from r128_sweet_spot: baseline, lr_5e4_r128,
lr_3e4_r256, lr_3e4_r128, curriculum_lr_3e4 final + step 6000 checkpoint
(before regression) for spectral comparison.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 17:28:28 +02:00
Ethanfel 1be07a80d2 feat: add cosine LR decay schedule to trainer and scheduler
- Add lr_schedule param (constant|cosine) to SelvaLoraTrainer
- Cosine decays LR from initial value to ~0 after warmup, preventing
  the oscillation observed at steps 6000-8000 with lr=2e-4 flat
- Wire lr_schedule through scheduler _PARAM_DEFAULTS and _train_inner call
- Add g5_r128_lr_2e4_cosine and g5_r128_lr_3e4_cosine to r128_sweet_spot sweep

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 13:25:01 +02:00
Ethanfel 94610b8943 feat: r128_sweet_spot sweep — noise-free LR search + rank 256
9 experiments targeting loss 0.25-0.35 without LoRA+ noise.
Tests higher base LR (2e-4/3e-4/5e-4), curriculum combos, conservative
LoRA+ ratio=4, and rank 256 baseline + lr=3e-4.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 10:46:08 +02:00
Ethanfel a7923d5fb7 feat: r64_overnight sweep — focused rank-64 ablation at 8000 steps
15 experiments across rank (64/128), alpha, regularisation, LR, target
layers, and combined stacks. Based on tier1_thorough early results
confirming rank 64 sounds best perceptually.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 01:32:23 +02:00
Ethanfel 786a57c424 feat: sweep resume + 5 additional experiments (LR, target, extended)
Scheduler: on re-run, reads existing experiment_summary.json and skips
already-completed experiments — safe to stop and restart mid-sweep.

tier1_thorough: adds g5 (lr 3e-5/3e-4), g6 (full target attn.qkv+linear1
at r16 and r64), and g4_full_r64_6k (6000-step extended run) — 17 total.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 00:59:16 +02:00
Ethanfel 0682a536cb fix: point data_dir to features/ subdir where .npz and audio live
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 00:45:32 +02:00
Ethanfel 0000878e76 feat: thorough overnight sweep + dataset browser updates
- Dataset browser: audio/features now resolve through features/ subdir
- tier1_sweep.json: update data_dir to BJ dataset path
- tier1_thorough.json: 12-experiment overnight sweep across 4 groups
  (rank 16/32/64, alpha scaling, LoRA+/dropout/curriculum isolation,
  full Tier 1 stack at r16 and r64) — output to BJ/experiment/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 00:38:19 +02:00
Ethanfel f1e2bbd55b feat: add first experiment sweep file for Tier 1 ablation
6 experiments: baseline, LoRA+ (ratio=16), dropout 0.05, dropout 0.1,
curriculum sampling, and all three combined. bf16 batch 16, 2000 steps,
seed 42. data_dir placeholder needs to be updated before running.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 13:15:06 +02:00