feat: video loader + combiner nodes

Adapted from HunyuanVideo-FoleyTune for the audio-SR workflow:

- UniverSR Load Video Audio: extract a video's audio track via ffmpeg
  (WAV pipe + soundfile, no torchcodec) and carry a UNIVERSR_VIDEO
  reference forward, with an inline video preview.
- UniverSR Video Combiner: mux the enhanced audio back onto the source
  video without re-encoding video (-c:v copy), trim-aware, with output
  auto-increment and preview.

Both registered alongside the SR nodes; ffmpeg + soundfile required only
for these. Adds README docs and an example video workflow.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 13:08:28 +02:00
parent 12cbc415cf
commit 5acaffab92
5 changed files with 446 additions and 5 deletions
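
The Video Combiner described in this commit muxes new audio onto the source video with `-c:v copy` and reapplies the loader's trim. As a rough illustration only (this is not the code added by the commit), a trim-aware ffmpeg command could be assembled along these lines. `build_mux_command` and its parameter names are hypothetical; only standard ffmpeg options are used.

```python
from typing import List


def build_mux_command(
    video_path: str,
    audio_path: str,
    output_path: str,
    audio_codec: str = "aac",
    start_time: float = 0.0,
    duration: float = 0.0,
) -> List[str]:
    """Build an ffmpeg argv list that copies the video stream and replaces the audio.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    cmd = ["ffmpeg", "-y"]  # -y: overwrite the output file without prompting

    # Input options placed before the first -i apply to the video input only,
    # so the same trim used when extracting audio is applied to the video.
    if start_time > 0:
        cmd += ["-ss", f"{start_time:.3f}"]
    if duration > 0:
        cmd += ["-t", f"{duration:.3f}"]

    cmd += ["-i", video_path, "-i", audio_path]
    cmd += [
        "-map", "0:v:0",      # first video stream from the source video
        "-map", "1:a:0",      # first audio stream from the enhanced audio file
        "-c:v", "copy",       # do not re-encode the video
        "-c:a", audio_codec,  # encode the audio with the chosen codec
        "-shortest",          # stop at the end of the shorter stream
        output_path,
    ]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. With stream copy, ffmpeg can only cut the video at keyframes, so a trimmed start may not be frame-exact.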
@@ -27,6 +27,8 @@ muffled or band‑limited audio gets believable "air" and detail back.
 - [Nodes](#nodes)
  - [UniverSR Model Loader](#universr-model-loader)
  - [UniverSR Super-Resolution](#universr-super-resolution)
+  - [UniverSR Load Video Audio](#universr-load-video-audio)
+  - [UniverSR Video Combiner](#universr-video-combiner)
 - [Choosing `input_sr`](#choosing-input_sr-the-one-setting-that-matters-most)
 - [Recommended settings](#recommended-settings)
 - [Long audio & chunking](#long-audio--chunking)
@@ -47,6 +49,8 @@ muffled or band‑limited audio gets believable "air" and detail back.
 - 🎛️ **Wet/dry blend** — full SR, or dial it back to gently brighten already-48 kHz audio (BWE).
 - 🎲 **Seed control** with **global-RNG isolation** (won't perturb other nodes' randomness).
 - 📊 Optional **before/after spectrogram** image output.
+- 🎬 **Video in / out** — extract a video's audio, super-resolve it, and remux it back onto the
+  original video (no video re-encode); extraction and remuxing use `ffmpeg`.
 - 📦 **Self-contained** — the UniverSR inference code is vendored; the only extra dependency beyond
  ComfyUI's stack is `torchdiffeq`.

@@ -74,6 +78,10 @@ that typically needs installing.) The `universr` package itself is **vendored**
 `pip`-installed copy is found it is preferred, otherwise the bundled one is used, so no `git+` install
 is required.

+The **video** nodes additionally need **`ffmpeg`** on your `PATH` (`apt install ffmpeg` /
+`brew install ffmpeg` / `conda install -c conda-forge ffmpeg`) and `soundfile` (in `requirements.txt`).
+The audio SR nodes work without either.
+
 > **GPU recommended.** Inference runs on CUDA if available and falls back to CPU (much slower).

 ---
@@ -142,6 +150,46 @@ Runs the super-resolution. Outputs: **`AUDIO`** (48 kHz) and **`IMAGE`** (spectr
 | `unload_model` | bool | `false` | — | Free the model from VRAM after this run. |
 | `show_spectrogram` | bool | `true` | — | Also output a before/after spectrogram comparison image. |

+### UniverSR Load Video Audio
+
+Extracts a video's audio track via `ffmpeg`, at its native sample rate and channel count, and keeps a
+reference to the source video for remuxing. Outputs **`AUDIO`** and **`UNIVERSR_VIDEO`**, and previews
+the video inline.
+
+| Input | Type | Default | Description |
+|---|---|---|---|
+| `video_path` | string | `""` | Absolute path to a video. Takes priority over `video`. |
+| `video` *(opt.)* | choice | — | Pick a file from ComfyUI's `input/` folder (used when `video_path` is empty). |
+| `start_time` *(opt.)* | float | `0.0` | Trim start, seconds. |
+| `duration` *(opt.)* | float | `0.0` | Trim length, seconds (`0` = to end). |
+
+### UniverSR Video Combiner
+
+Muxes an `AUDIO` track onto the source video **without re-encoding the video** (`-c:v copy`) and saves
+the result. If the loader trimmed the clip, the same trim is applied to the video so A/V stay aligned.
+
+| Input | Type | Default | Description |
+|---|---|---|---|
+| `video` | UNIVERSR_VIDEO | — | From **UniverSR Load Video Audio**. |
+| `audio` | AUDIO | — | The enhanced 48 kHz audio. |
+| `filename_prefix` | string | `UniverSR` | Output name prefix (auto-incremented). |
+| `audio_codec` *(opt.)* | choice | `aac` | `aac` / `flac` / `pcm_s16le` / `libopus` / `libmp3lame`. |
+| `save_output` *(opt.)* | bool | `true` | Save to `output/` (else `temp/`). |
+
+Output: `output_path` (string) and an inline video preview.
+
+#### Video workflow
+
+```
+UniverSR Load Video Audio ──┬─ audio ─► UniverSR Super-Resolution ─ audio ─┐
+                            │                                              ▼
+                            └────────────── video ──────────────► UniverSR Video Combiner ─► .mp4
+                                              UniverSR Model Loader ─► (Super-Resolution)
+```
+
+Load the video → super-resolve its audio (set `input_sr` to the content bandwidth) → feed the enhanced
+audio **and** the `video` reference into the combiner. Ready-made graph:
+[`example_workflows/universr_video.json`](example_workflows/universr_video.json).
+
 ---

 ## Choosing `input_sr` (the one setting that matters most)