11 Commits

Author SHA1 Message Date
Ethanfel e5110b88e1 feat: auto input_sr — detect bandwidth and pick the best value
New "auto" option (now the default) on the Sampler's input_sr. detect_input_sr
finds the spectral cutoff cliff (steepest drop) and its dB confidence: effective
cutoff = that cliff if confident, else sr/2 — one rule that covers band-limited
(→ matched input_sr), full-band (→ 24000), and genuine low-rate files
(→ their rate). Rounds DOWN to the nearest supported Nyquist to avoid feeding
the model an empty band. Logs its decision. Falls back to 24000 when unsure.
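
The rule above can be sketched as follows. This is a minimal illustration, not the code of detect_input_sr: the per-band dB input, the helper names, and the 30 dB confidence threshold are all assumptions, and the supported rates are the 8/12/16/24 kHz inputs named in the initial release.

```python
# Supported input rates, taken from the initial release message (8/12/16/24 kHz).
SUPPORTED_INPUT_SR = (8000, 12000, 16000, 24000)


def steepest_drop(band_hz, band_db):
    """Return (cutoff_hz, drop_db) for the largest level drop between adjacent bands.

    band_hz[i] is the upper edge of band i, band_db[i] its level in dB.
    Both inputs are hypothetical; the real detector works on a spectrum.
    """
    best_hz, best_drop = band_hz[-1], 0.0
    for i in range(len(band_db) - 1):
        drop = band_db[i] - band_db[i + 1]
        if drop > best_drop:
            best_hz, best_drop = band_hz[i], drop
    return best_hz, best_drop


def pick_input_sr(sr, cutoff_hz, drop_db, min_drop_db=30.0):
    """Trust the cliff if it is confident enough, otherwise use sr / 2."""
    # min_drop_db is an assumed threshold, not a value from the commit.
    effective = cutoff_hz if drop_db >= min_drop_db else sr / 2
    # Round DOWN: the largest supported rate whose Nyquist fits under the cutoff.
    fitting = [r for r in SUPPORTED_INPUT_SR if r / 2 <= effective]
    return max(fitting) if fitting else min(SUPPORTED_INPUT_SR)
```

Under these assumptions, a 48 kHz file with a confident cliff at 8 kHz maps to 16000, a full-band 48 kHz file maps to 24000, and a genuine 8 kHz file maps to 8000.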

Tests cover sharp 4/6/8/12 kHz cutoffs, full-band, genuine-8kHz, silence, stereo.
Verified end-to-end on the real model (clip band-limited at 8 kHz -> auto picks 16000).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-17 12:46:02 +02:00
Ethanfel 8d4cd71723 docs: design for auto input_sr (bandwidth auto-detect)
Cliff/edge spectral detector with a confidence score; effective cutoff =
cliff if confident else sr/2 (covers band-limited, full-band, and genuine
low-rate files in one rule). Round down to the nearest supported Nyquist.
Adds an "auto" dropdown option (default). Validated empirically before design.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-17 12:42:50 +02:00
Ethanfel 0e036b34d9 fix(ui): string-valued input_sr combo so the arrows cycle correctly
Integer-valued combos break ComfyUI's arrow navigation: indexOf(value) fails
on the int/serialized-value mismatch, returns -1, and the widget snaps to
index 0 (8000) every click. Use string options ("8000".."24000"); the code
already does int(input_sr) everywhere, so behavior is unchanged. Updated the
example workflows' widgets_values to strings to match.
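
A rough sketch of the string-valued combo follows. The class name and the index_of helper are hypothetical; index_of only imitates the strict matching of JavaScript's indexOf so that the failure mode can be shown in Python.

```python
# String options, as described in the commit; the four rates come from the
# initial release message.
INPUT_SR_OPTIONS = ["8000", "12000", "16000", "24000"]


class SamplerSketch:
    """Hypothetical node showing only the combo declaration."""

    @classmethod
    def INPUT_TYPES(cls):
        # A list as the first tuple element declares a combo (dropdown) input.
        return {"required": {"input_sr": (INPUT_SR_OPTIONS,)}}


def index_of(options, value):
    """Imitate JavaScript indexOf: strict type-and-value match, -1 when absent."""
    for i, opt in enumerate(options):
        if type(opt) is type(value) and opt == value:
            return i
    return -1


def to_rate(input_sr):
    """The node code converts with int(), so string options change nothing downstream."""
    return int(input_sr)
```

With integer options and a string value, index_of returns -1, which is the mismatch that made the widget snap back to index 0; with string options the lookup succeeds.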

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-17 12:22:19 +02:00
Ethanfel 94178a4851 fix(perf): default TF32 off; off = true fp32 (matmul + cuDNN conv)
Reported as "darker", but a fixed-seed spectral A/B shows TF32 is tonally
neutral (centroid 564→565 Hz, HF>8k 0.00825→0.00833) — the perceived change
is the seed=0 random-noise confound, not TF32. Still, TF32 is only ~1.15x faster and
not bit-exact, so default it OFF for reference-fp32 output and let compile
(~2.1x, op fusion) be the headline speedup. apply_tf32 now also toggles
cuDNN conv-TF32 (PyTorch leaves it on by default), so off is genuinely fp32.
Docs updated with the seed-confound A/B guidance.
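
Toggling both switches together might look like the sketch below. The function name and the optional backends parameter are assumptions made so the example can be exercised without a GPU; the two attributes it sets are PyTorch's matmul and cuDNN TF32 flags.

```python
def apply_tf32_sketch(enabled, backends=None):
    """Set both TF32 switches so that 'off' means fp32 for matmul and conv."""
    if backends is None:
        # Imported lazily so the sketch can be read and tested without PyTorch.
        import torch
        backends = torch.backends
    backends.cuda.matmul.allow_tf32 = enabled  # TF32 for matrix multiplies
    backends.cudnn.allow_tf32 = enabled        # TF32 for cuDNN convolutions
    return backends
```

Calling apply_tf32_sketch(False) before inference would then leave no TF32 path enabled, which is the "genuinely fp32" behavior the message describes.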

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-17 10:47:39 +02:00
Ethanfel 104cd4bf5f feat: equal-quality speed options (TF32 + torch.compile)
Add two inference speedups to the Model Loader, validated to leave the
output perceptually identical (deviation at the fp32 rounding floor):

- tf32 (default on): TF32 matmul on Ampere+ (~1.15x).
- compile (opt-in): torch.compile the UNet (~2.1x). Stacks with TF32 to
  ~2.5x (measured 4.3s -> 1.7s on a 12s clip).

torch.compile needs a static shape (the model's adaptive-avg-pool can't trace
dynamic shapes), so the sampler pads every chunk to chunk_seconds — clips of
any length reuse one compiled graph (no per-length recompiles; verified an 8s
clip after a 12s clip ran in 0.9s with no recompile).
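
The padding idea can be sketched in plain Python as below. The helper names are hypothetical, zero-padding on the right is an assumption (the commit does not say how the pad is filled), and real code would operate on tensors rather than lists.

```python
def pad_to_chunk(samples, sample_rate, chunk_seconds):
    """Right-pad with zeros to exactly chunk_seconds; return (padded, valid_length)."""
    target = int(round(sample_rate * chunk_seconds))
    if len(samples) > target:
        raise ValueError("chunk is longer than chunk_seconds")
    padded = list(samples) + [0.0] * (target - len(samples))
    return padded, len(samples)


def trim_output(output, valid_length):
    """Drop the padded tail after the model has run on the fixed-size chunk."""
    return output[:valid_length]
```

Because every chunk reaches the model with the same length, a compiled graph sees a single static shape regardless of clip duration.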

Researched + profiled first: CFG-batching, channel/chunk batching, and
channels_last gave ~0 gain because the GPU is already compute-bound at batch 1.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 17:16:21 +02:00
Ethanfel 9a901adcc5 fix(video): double preview on the upload loader
ComfyUI core (frontend 1.42.x) natively renders a node's `ui.gifs` output
as a media preview, so our own JS preview widget produced a second one.
Return the preview under a custom `universr_videos` key that core ignores;
our web extension is now the only thing that renders it — single preview.
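
As an illustration of the key change, a node's return value might be shaped like this. The function name and the preview fields (filename, subfolder, type) are assumptions; only the `universr_videos` key and the avoidance of `ui.gifs` come from the commit.

```python
def build_preview_result(filename, video_ref, audio, subfolder="", folder_type="input"):
    """Return node outputs plus a preview payload under a custom UI key."""
    # Hypothetical preview fields; the extension decides how to render them.
    preview = {"filename": filename, "subfolder": subfolder, "type": folder_type}
    return {
        # A custom key: core does not render it, so only the web extension does.
        "ui": {"universr_videos": [preview]},
        "result": (video_ref, audio),
    }
```

The web extension would then read `universr_videos` from the executed-node message and draw the single preview itself.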

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 13:46:29 +02:00
Ethanfel fd5922b1cd feat(video): add path-string loader variant
UniverSR Load Video Audio (Path) mirrors FoleyTuneVideoLoader: takes an
absolute video_path (for files outside input/) and outputs the same
(UNIVERSR_VIDEO, AUDIO). Shared load body factored into _load_video_audio;
registered for the inline preview (post-run) in the web extension.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 13:29:16 +02:00
Ethanfel 8972fed805 refactor(video): single Foley-style upload loader with inline preview
Replace the path+dropdown loader (and its non-rendering ui.gifs) with one
node mirroring FoleyTuneVideoLoaderUpload: a `video` upload widget with
drag-drop and an inline video preview, shipped via web/js/UniverSRVideo.js
(adapted from FoleyTuneVideo.js) + WEB_DIRECTORY.

The loader now outputs (UNIVERSR_VIDEO, AUDIO) so you can super-resolve the
audio and remux it. Updated the example workflow output order and README.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 13:25:06 +02:00
Ethanfel 5acaffab92 feat: video loader + combiner nodes
Adapted from HunyuanVideo-FoleyTune for the audio-SR workflow:

- UniverSR Load Video Audio: extract a video's audio track via ffmpeg
  (WAV pipe + soundfile, no torchcodec) and carry a UNIVERSR_VIDEO
  reference forward, with an inline video preview.
- UniverSR Video Combiner: mux the enhanced audio back onto the source
  video without re-encoding video (-c:v copy), trim-aware, with output
  auto-increment and preview.
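
The two ffmpeg invocations described above could be assembled roughly as follows. This is a sketch under assumptions: the function names are hypothetical, the AAC audio codec and `-shortest` are illustrative choices, and trim handling and output auto-increment are left out.

```python
def extract_audio_cmd(video_path):
    """Decode the audio track to WAV on stdout, skipping the video stream."""
    return ["ffmpeg", "-v", "error", "-i", video_path, "-vn", "-f", "wav", "pipe:1"]


def mux_cmd(video_path, audio_path, out_path):
    """Put new audio on the source video while copying the video stream as-is."""
    return [
        "ffmpeg", "-y",
        "-i", video_path,
        "-i", audio_path,
        "-map", "0:v:0",   # video from the first input
        "-map", "1:a:0",   # audio from the second input
        "-c:v", "copy",    # no video re-encode
        "-c:a", "aac",     # assumed audio codec
        "-shortest",       # assumed: stop at the shorter stream
        out_path,
    ]
```

Either list can be passed to subprocess.run; for extraction, the captured stdout bytes would be wrapped in io.BytesIO and read with soundfile.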

Both registered alongside the SR nodes; ffmpeg + soundfile required only
for these. Adds README docs and an example video workflow.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 13:08:28 +02:00
Ethanfel 12cbc415cf docs: full node documentation in README
Comprehensive README: features, install, model auto-download, a
parameter reference for both nodes, an input_sr guide (SR vs BWE),
recommended settings, chunking, how-it-works, and troubleshooting.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 13:02:10 +02:00
Ethanfel 5f29b225b7 Initial release: ComfyUI-UniverSR
ComfyUI nodes for UniverSR (ICASSP 2026) — vocoder-free audio
super-resolution (8/12/16/24 kHz → 48 kHz) via flow matching.

- UniverSR Model Loader: presets auto-download to models/universr,
  plus local dir / raw .pth (from_local) loading, with caching.
- UniverSR Super-Resolution: chunked overlap-add for long audio,
  per-channel stereo, seed control with global-RNG isolation,
  wet/dry blend, and an optional before/after spectrogram.
- Vendors the universr inference package under vendor/ (prefers an
  installed copy); only extra dep beyond ComfyUI's stack is torchdiffeq.
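
Chunked overlap-add for long audio can be sketched as below. The linear crossfade and the function name are assumptions; the release notes do not state which window the sampler uses, and real code would work on tensors.

```python
def overlap_add(chunks, overlap):
    """Join processed chunks, linearly crossfading `overlap` samples at each seam.

    Each chunk is a list of samples at least `overlap` long; consecutive
    chunks are assumed to cover the same audio in their overlapping region.
    """
    out = list(chunks[0])
    for chunk in chunks[1:]:
        for i in range(overlap):
            w = (i + 1) / (overlap + 1)      # fade-in weight for the new chunk
            j = len(out) - overlap + i       # matching position in the output tail
            out[j] = (1.0 - w) * out[j] + w * chunk[i]
        out.extend(chunk[overlap:])
    return out
```

With an overlap of zero this reduces to plain concatenation; a larger overlap trades extra compute for smoother seams between chunks.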

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 12:59:42 +02:00