Add EMA-VFI (CVPR 2023) frame interpolation support

Integrate EMA-VFI alongside existing BIM-VFI with three new ComfyUI nodes: Load EMA-VFI Model, EMA-VFI Interpolate, and EMA-VFI Segment Interpolate.

Architecture files vendored from MCG-NJU/EMA-VFI with device-awareness fixes (removed hardcoded .cuda() calls), warp cache management, and relative imports. InputPadder extended to support EMA-VFI's replicate center-symmetric padding. Auto-installs timm dependency on first load.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-12 22:30:06 +01:00
parent 0133f61d47
commit 1de086569c
11 changed files with 1334 additions and 18 deletions
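
The commit message notes that InputPadder was extended to support replicate, center-symmetric padding. As a rough illustration of the idea (not the code from this commit), the sketch below computes how much padding would go on each side so that both dimensions become a multiple of a divisor, with the extra pixels split as evenly as possible. The helper name `center_pad_amounts` and the default divisor of 32 are assumptions made for this example.

```python
def center_pad_amounts(height: int, width: int, divisor: int = 32) -> tuple[int, int, int, int]:
    """Return (left, right, top, bottom) padding for center-symmetric padding.

    Hypothetical helper: the name and the default divisor are illustrative
    assumptions, not values taken from the commit.
    """
    if divisor <= 0:
        raise ValueError("divisor must be positive")

    # Pixels needed to reach the next multiple of `divisor` (0 if already aligned).
    pad_h = (-height) % divisor
    pad_w = (-width) % divisor

    # Split each total evenly; when it is odd, the extra pixel goes
    # to the right / bottom side.
    left = pad_w // 2
    top = pad_h // 2
    return left, pad_w - left, top, pad_h - top


if __name__ == "__main__":
    # A 1080p frame needs 8 extra rows to reach 1088, split 4 + 4.
    print(center_pad_amounts(1080, 1920))
```

The returned order (left, right, top, bottom) matches what `torch.nn.functional.pad` expects for the last two dimensions of a tensor, so the tuple could be passed to it with `mode="replicate"`.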
@@ -1,10 +1,12 @@
-# ComfyUI BIM-VFI
+# ComfyUI BIM-VFI + EMA-VFI

-ComfyUI custom nodes for video frame interpolation using [BiM-VFI](https://github.com/KAIST-VICLab/BiM-VFI) (CVPR 2025). Designed for long videos with thousands of frames — processes them without running out of VRAM.
+ComfyUI custom nodes for video frame interpolation using [BiM-VFI](https://github.com/KAIST-VICLab/BiM-VFI) (CVPR 2025) and [EMA-VFI](https://github.com/MCG-NJU/EMA-VFI) (CVPR 2023). Designed for long videos with thousands of frames — processes them without running out of VRAM.

 ## Nodes

-### Load BIM-VFI Model
+### BIM-VFI
+
+#### Load BIM-VFI Model

 Loads the BiM-VFI checkpoint. Auto-downloads from Google Drive on first use to `ComfyUI/models/bim-vfi/`.

@@ -14,7 +16,7 @@ Loads the BiM-VFI checkpoint. Auto-downloads from Google Drive on first use to `
 | **auto_pyr_level** | Auto-select pyramid level by resolution (<540p=3, 540p=5, 1080p=6, 4K=7) |
 | **pyr_level** | Manual pyramid level (3-7), only used when auto is off |

-### BIM-VFI Interpolate
+#### BIM-VFI Interpolate

 Interpolates frames from an image batch.

@@ -24,12 +26,47 @@ Interpolates frames from an image batch.
 | **model** | Model from the loader node |
 | **multiplier** | 2x, 4x, or 8x frame rate (recursive 2x passes) |
 | **batch_size** | Frame pairs processed simultaneously (higher = faster, more VRAM) |
-| **chunk_size** | Process in segments of N input frames (0 = disabled). Bounds memory for very long videos. Result is identical to processing all at once |
+| **chunk_size** | Process in segments of N input frames (0 = disabled). Bounds VRAM for very long videos. Result is identical to processing all at once |
 | **keep_device** | Keep model on GPU between pairs (faster, ~200MB constant VRAM) |
 | **all_on_gpu** | Keep all intermediate frames on GPU (fast, needs large VRAM) |
 | **clear_cache_after_n_frames** | Clear CUDA cache every N pairs to prevent VRAM buildup |

-**Output frame count:** 2x = 2N-1, 4x = 4N-3, 8x = 8N-7
+#### BIM-VFI Segment Interpolate
+
+Same as Interpolate but processes a single segment of the input. Chain multiple instances with Save nodes between them to bound peak RAM. The model pass-through output forces sequential execution.
+
+#### BIM-VFI Concat Videos
+
+Concatenates segment video files into a single video using ffmpeg. Connect from the last Segment Interpolate's model output to ensure it runs after all segments are saved.
+
+### EMA-VFI
+
+#### Load EMA-VFI Model
+
+Loads an EMA-VFI checkpoint. Auto-downloads from Google Drive on first use to `ComfyUI/models/ema-vfi/`. Variant (large/small) and timestep support are auto-detected from the filename.
+
+| Input | Description |
+|-------|-------------|
+| **model_path** | Checkpoint file from `models/ema-vfi/` |
+| **tta** | Test-time augmentation: flip input and average with unflipped result (~2x slower, slightly better quality) |
+
+Available checkpoints:
+| Checkpoint | Variant | Params | Arbitrary timestep |
+|-----------|---------|--------|-------------------|
+| `ours_t.pkl` | Large | ~65M | Yes |
+| `ours.pkl` | Large | ~65M | No (fixed 0.5) |
+| `ours_small_t.pkl` | Small | ~14M | Yes |
+| `ours_small.pkl` | Small | ~14M | No (fixed 0.5) |
+
+#### EMA-VFI Interpolate
+
+Interpolates frames from an image batch. Same controls as BIM-VFI Interpolate.
+
+#### EMA-VFI Segment Interpolate
+
+Same as EMA-VFI Interpolate but processes a single segment. Same pattern as BIM-VFI Segment Interpolate.
+
+**Output frame count (both models):** 2x = 2N-1, 4x = 4N-3, 8x = 8N-7

 ## Installation

@@ -40,7 +77,7 @@ cd ComfyUI/custom_nodes
 git clone https://github.com/your-user/Comfyui-BIM-VFI.git
 ```

-Dependencies (`gdown`, `cupy`) are auto-installed on first load. The correct `cupy` variant is detected from your PyTorch CUDA version.
+Dependencies (`gdown`, `cupy`, `timm`) are auto-installed on first load. The correct `cupy` variant is detected from your PyTorch CUDA version.

 > **Warning:** `cupy` is a large package (~800MB) and compilation/installation can take several minutes. The first ComfyUI startup after installing this node may appear to hang while `cupy` installs in the background. Check the console log for progress. If auto-install fails (e.g. missing build tools in Docker), install manually with:
 > ```bash
@@ -57,7 +94,8 @@ python install.py
 ### Requirements

 - PyTorch with CUDA
-- `cupy` (matching your CUDA version)
+- `cupy` (matching your CUDA version, for BIM-VFI)
+- `timm` (for EMA-VFI)
 - `gdown` (for model auto-download)

 ## VRAM Guide
@@ -71,9 +109,9 @@ python install.py

 ## Acknowledgments

-This project wraps the official [BiM-VFI](https://github.com/KAIST-VICLab/BiM-VFI) implementation by the [KAIST VIC Lab](https://github.com/KAIST-VICLab). The model architecture files in `bim_vfi_arch/` are vendored from their repository with minimal modifications (relative imports, inference-only paths).
+This project wraps the official [BiM-VFI](https://github.com/KAIST-VICLab/BiM-VFI) implementation by the [KAIST VIC Lab](https://github.com/KAIST-VICLab) and the official [EMA-VFI](https://github.com/MCG-NJU/EMA-VFI) implementation by MCG-NJU. Architecture files in `bim_vfi_arch/` and `ema_vfi_arch/` are vendored from their respective repositories with minimal modifications (relative imports, device-awareness fixes, inference-only paths).

-**Paper:**
+**BiM-VFI:**
 > Wonyong Seo, Jihyong Oh, and Munchurl Kim.
 > "BiM-VFI: Bidirectional Motion Field-Guided Frame Interpolation for Video with Non-uniform Motions."
 > *IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)*, 2025.
@@ -88,6 +126,23 @@ This project wraps the official [BiM-VFI](https://github.com/KAIST-VICLab/BiM-VF
 }
 ```

+**EMA-VFI:**
+> Guozhen Zhang, Yuhan Zhu, Haonan Wang, Youxin Chen, Gangshan Wu, and Limin Wang.
+> "Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation."
+> *IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)*, 2023.
+> [[arXiv]](https://arxiv.org/abs/2303.00440) [[GitHub]](https://github.com/MCG-NJU/EMA-VFI)
+
+```bibtex
+@inproceedings{zhang2023emavfi,
+  title={Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation},
+  author={Zhang, Guozhen and Zhu, Yuhan and Wang, Haonan and Chen, Youxin and Wu, Gangshan and Wang, Limin},
+  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
+  year={2023}
+}
+```
+
 ## License

 The BiM-VFI model weights and architecture code are provided by KAIST VIC Lab for **research and education purposes only**. Commercial use requires permission from the principal investigator (Prof. Munchurl Kim, mkimee@kaist.ac.kr). See the [original repository](https://github.com/KAIST-VICLab/BiM-VFI) for details.
+
+The EMA-VFI model weights and architecture code are released under the [Apache 2.0 License](https://github.com/MCG-NJU/EMA-VFI/blob/main/LICENSE). See the [original repository](https://github.com/MCG-NJU/EMA-VFI) for details.
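
The output frame counts listed in the README changes above (2x = 2N-1, 4x = 4N-3, 8x = 8N-7) follow from the multiplier being applied as recursive 2x passes: each pass inserts one new frame between every adjacent pair, turning N frames into 2N-1. A minimal sketch of that arithmetic follows; `output_frame_count` is a hypothetical helper written for illustration and is not part of the node code.

```python
def output_frame_count(num_input_frames: int, multiplier: int) -> int:
    """Frame count after interpolating with recursive 2x passes.

    Hypothetical helper for illustration only.
    """
    if multiplier not in (2, 4, 8):
        raise ValueError("multiplier must be 2, 4, or 8")
    if num_input_frames < 1:
        raise ValueError("need at least one input frame")

    # 2x -> 1 pass, 4x -> 2 passes, 8x -> 3 passes.
    passes = multiplier.bit_length() - 1

    count = num_input_frames
    for _ in range(passes):
        # One new frame between each adjacent pair: N frames become 2N - 1.
        count = 2 * count - 1
    return count


if __name__ == "__main__":
    for m in (2, 4, 8):
        print(m, output_frame_count(10, m))
```

In closed form this is `multiplier * (N - 1) + 1`, which agrees with the three cases in the README.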