ComfyUI-UniverSR

Ethanfel/ComfyUI-UniverSR

Fork 0

Commit Graph

Author	SHA1	Message	Date
Ethanfel	104cd4bf5f	feat: equal-quality speed options (TF32 + torch.compile) Add two opt-in inference speedups to the Model Loader, validated to leave the output perceptually identical (deviation at the fp32 rounding floor): - tf32 (default on): TF32 matmul on Ampere+ (~1.15x). - compile (opt-in): torch.compile the UNet (~2.1x). Stacks with TF32 to ~2.5x (measured 4.3s -> 1.7s on a 12s clip). torch.compile needs a static shape (the model's adaptive-avg-pool can't trace dynamic shapes), so the sampler pads every chunk to chunk_seconds — clips of any length reuse one compiled graph (no per-length recompiles; verified an 8s clip after a 12s clip ran in 0.9s with no recompile). Researched + profiled first: CFG-batching, channel/chunk batching, and channels_last gave ~0 gain because the GPU is already compute-bound at batch 1. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-16 17:16:21 +02:00
Ethanfel	5f29b225b7	Initial release: ComfyUI-UniverSR ComfyUI nodes for UniverSR (ICASSP 2026) — vocoder-free audio super-resolution (8/12/16/24 kHz → 48 kHz) via flow matching. - UniverSR Model Loader: presets auto-download to models/universr, plus local dir / raw .pth (from_local) loading, with caching. - UniverSR Super-Resolution: chunked overlap-add for long audio, per-channel stereo, seed control with global-RNG isolation, wet/dry blend, and an optional before/after spectrogram. - Vendors the universr inference package under vendor/ (prefers an installed copy); only extra dep beyond ComfyUI's stack is torchdiffeq. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-01 12:59:42 +02:00

Author

SHA1

Message

Date

Ethanfel

104cd4bf5f

feat: equal-quality speed options (TF32 + torch.compile)

Add two opt-in inference speedups to the Model Loader, validated to leave the
output perceptually identical (deviation at the fp32 rounding floor):

- tf32 (default on): TF32 matmul on Ampere+ (~1.15x).
- compile (opt-in): torch.compile the UNet (~2.1x). Stacks with TF32 to
  ~2.5x (measured 4.3s -> 1.7s on a 12s clip).

torch.compile needs a static shape (the model's adaptive-avg-pool can't trace
dynamic shapes), so the sampler pads every chunk to chunk_seconds — clips of
any length reuse one compiled graph (no per-length recompiles; verified an 8s
clip after a 12s clip ran in 0.9s with no recompile).

Researched + profiled first: CFG-batching, channel/chunk batching, and
channels_last gave ~0 gain because the GPU is already compute-bound at batch 1.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-16 17:16:21 +02:00

Ethanfel

5f29b225b7

Initial release: ComfyUI-UniverSR

ComfyUI nodes for UniverSR (ICASSP 2026) — vocoder-free audio
super-resolution (8/12/16/24 kHz → 48 kHz) via flow matching.

- UniverSR Model Loader: presets auto-download to models/universr,
  plus local dir / raw .pth (from_local) loading, with caching.
- UniverSR Super-Resolution: chunked overlap-add for long audio,
  per-channel stereo, seed control with global-RNG isolation,
  wet/dry blend, and an optional before/after spectrogram.
- Vendors the universr inference package under vendor/ (prefers an
  installed copy); only extra dep beyond ComfyUI's stack is torchdiffeq.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-01 12:59:42 +02:00

2 Commits