ComfyUI-Tween/README.md

# ComfyUI BIM-VFI + EMA-VFI + SGM-VFI

ComfyUI custom nodes for video frame interpolation using [BiM-VFI](https://github.com/KAIST-VICLab/BiM-VFI) (CVPR 2025), [EMA-VFI](https://github.com/MCG-NJU/EMA-VFI) (CVPR 2023), and [SGM-VFI](https://github.com/MCG-NJU/SGM-VFI) (CVPR 2024). Designed for long videos with thousands of frames — processes them without running out of VRAM.

## Which model should I use?

| | BIM-VFI | EMA-VFI | SGM-VFI |
|---|---------|---------|---------|
| **Best for** | General-purpose, non-uniform motion | Fast inference, light VRAM | Large motion, occlusion-heavy scenes |
| **Quality** | Highest overall | Good | Best on large motion |
| **Speed** | Moderate | Fastest | Slowest |
| **VRAM** | ~2 GB/pair | ~1.5 GB/pair | ~3 GB/pair |
| **Params** | ~17M | ~14–65M | ~15M + GMFlow |
| **Arbitrary timestep** | Yes | Yes (with `_t` checkpoint) | No (fixed 0.5) |
| **Paper** | CVPR 2025 | CVPR 2023 | CVPR 2024 |
| **License** | Research only | Apache 2.0 | Apache 2.0 |

**TL;DR:** Start with **BIM-VFI** for best quality. Use **EMA-VFI** if you need speed or lower VRAM. Use **SGM-VFI** if your video has large camera motion or fast-moving objects that the others struggle with.

## Nodes

### BIM-VFI

#### Load BIM-VFI Model

Loads the BiM-VFI checkpoint. Auto-downloads from Google Drive on first use to `ComfyUI/models/bim-vfi/`.

| Input | Description |
|-------|-------------|
| **model_path** | Checkpoint file from `models/bim-vfi/` |
| **auto_pyr_level** | Auto-select pyramid level by resolution (&lt;540p=3, 540p=5, 1080p=6, 4K=7) |
| **pyr_level** | Manual pyramid level (3-7), only used when auto is off |

#### BIM-VFI Interpolate

Interpolates frames from an image batch.

| Input | Description |
|-------|-------------|
| **images** | Input image batch |
| **model** | Model from the loader node |
| **multiplier** | 2x, 4x, or 8x frame rate (recursive 2x passes) |
| **batch_size** | Frame pairs processed simultaneously (higher = faster, more VRAM) |
| **chunk_size** | Process in segments of N input frames (0 = disabled). Bounds VRAM for very long videos. Result is identical to processing all at once |
| **keep_device** | Keep model on GPU between pairs (faster, ~200MB constant VRAM) |
| **all_on_gpu** | Keep all intermediate frames on GPU (fast, needs large VRAM) |
| **clear_cache_after_n_frames** | Clear CUDA cache every N pairs to prevent VRAM buildup |

#### BIM-VFI Segment Interpolate

Same as Interpolate but processes a single segment of the input. Chain multiple instances with Save nodes between them to bound peak RAM. The model pass-through output forces sequential execution.

### Tween Concat Videos

Concatenates segment video files into a single video using ffmpeg. Connect from any Segment Interpolate's model output to ensure it runs after all segments are saved. Works with all three models.

### EMA-VFI

#### Load EMA-VFI Model

Loads an EMA-VFI checkpoint. Auto-downloads from Google Drive on first use to `ComfyUI/models/ema-vfi/`. Variant (large/small) and timestep support are auto-detected from the filename.

| Input | Description |
|-------|-------------|
| **model_path** | Checkpoint file from `models/ema-vfi/` |
| **tta** | Test-time augmentation: flip input and average with unflipped result (~2x slower, slightly better quality) |

Available checkpoints:
| Checkpoint | Variant | Params | Arbitrary timestep |
|-----------|---------|--------|-------------------|
| `ours_t.pkl` | Large | ~65M | Yes |
| `ours.pkl` | Large | ~65M | No (fixed 0.5) |
| `ours_small_t.pkl` | Small | ~14M | Yes |
| `ours_small.pkl` | Small | ~14M | No (fixed 0.5) |

#### EMA-VFI Interpolate

Interpolates frames from an image batch. Same controls as BIM-VFI Interpolate.

#### EMA-VFI Segment Interpolate

Same as EMA-VFI Interpolate but processes a single segment. Same pattern as BIM-VFI Segment Interpolate.

### SGM-VFI

#### Load SGM-VFI Model

Loads an SGM-VFI checkpoint. Auto-downloads from Google Drive on first use to `ComfyUI/models/sgm-vfi/`. Variant (base/small) is auto-detected from the filename (default is small).

| Input | Description |
|-------|-------------|
| **model_path** | Checkpoint file from `models/sgm-vfi/` |
| **tta** | Test-time augmentation: flip input and average with unflipped result (~2x slower, slightly better quality) |
| **num_key_points** | Sparsity of global matching (0.0 = global everywhere, 0.5 = default balance, higher = faster) |

Available checkpoints:
| Checkpoint | Variant | Params |
|-----------|---------|--------|
| `ours-1-2-points.pth` | Small | ~15M + GMFlow |

#### SGM-VFI Interpolate

Interpolates frames from an image batch. Same controls as BIM-VFI Interpolate.

#### SGM-VFI Segment Interpolate

Same as SGM-VFI Interpolate but processes a single segment. Same pattern as BIM-VFI Segment Interpolate.

**Output frame count (all models):** 2x = 2N-1, 4x = 4N-3, 8x = 8N-7

## Installation

Clone into your ComfyUI `custom_nodes/` directory:

```bash
cd ComfyUI/custom_nodes
git clone https://github.com/your-user/ComfyUI-Tween.git
```

Dependencies (`gdown`, `cupy`, `timm`) are auto-installed on first load. The correct `cupy` variant is detected from your PyTorch CUDA version.

> **Warning:** `cupy` is a large package (~800MB) and compilation/installation can take several minutes. The first ComfyUI startup after installing this node may appear to hang while `cupy` installs in the background. Check the console log for progress. If auto-install fails (e.g. missing build tools in Docker), install manually with:
> ```bash
> pip install cupy-cuda12x  # replace 12 with your CUDA major version
> ```

To install manually:

```bash
cd ComfyUI-Tween
python install.py
```

### Requirements

- PyTorch with CUDA
- `cupy` (matching your CUDA version, for BIM-VFI and SGM-VFI)
- `timm` (for EMA-VFI and SGM-VFI)
- `gdown` (for model auto-download)

## VRAM Guide

| VRAM | Recommended settings |
|------|---------------------|
| 8 GB | batch_size=1, chunk_size=500 |
| 24 GB | batch_size=2-4, chunk_size=1000 |
| 48 GB+ | batch_size=4-16, all_on_gpu=true |
| 96 GB+ | batch_size=8-16, all_on_gpu=true, chunk_size=0 |

## Acknowledgments

This project wraps the official [BiM-VFI](https://github.com/KAIST-VICLab/BiM-VFI) implementation by the [KAIST VIC Lab](https://github.com/KAIST-VICLab), the official [EMA-VFI](https://github.com/MCG-NJU/EMA-VFI) implementation by MCG-NJU, and the official [SGM-VFI](https://github.com/MCG-NJU/SGM-VFI) implementation by MCG-NJU. Architecture files in `bim_vfi_arch/`, `ema_vfi_arch/`, and `sgm_vfi_arch/` are vendored from their respective repositories with minimal modifications (relative imports, device-awareness fixes, inference-only paths).

**BiM-VFI:**
> Wonyong Seo, Jihyong Oh, and Munchurl Kim.
> "BiM-VFI: Bidirectional Motion Field-Guided Frame Interpolation for Video with Non-uniform Motions."
> *IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)*, 2025.
> [[arXiv]](https://arxiv.org/abs/2412.11365) [[Project Page]](https://kaist-viclab.github.io/BiM-VFI_site/) [[GitHub]](https://github.com/KAIST-VICLab/BiM-VFI)

```bibtex
@inproceedings{seo2025bimvfi,
  title={BiM-VFI: Bidirectional Motion Field-Guided Frame Interpolation for Video with Non-uniform Motions},
  author={Seo, Wonyong and Oh, Jihyong and Kim, Munchurl},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2025}
}
```

**EMA-VFI:**
> Guozhen Zhang, Yuhan Zhu, Haonan Wang, Youxin Chen, Gangshan Wu, and Limin Wang.
> "Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation."
> *IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)*, 2023.
> [[arXiv]](https://arxiv.org/abs/2303.00440) [[GitHub]](https://github.com/MCG-NJU/EMA-VFI)

```bibtex
@inproceedings{zhang2023emavfi,
  title={Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation},
  author={Zhang, Guozhen and Zhu, Yuhan and Wang, Haonan and Chen, Youxin and Wu, Gangshan and Wang, Limin},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2023}
}
```

**SGM-VFI:**
> Guozhen Zhang, Yuhan Zhu, Evan Zheran Liu, Haonan Wang, Mingzhen Sun, Gangshan Wu, and Limin Wang.
> "Sparse Global Matching for Video Frame Interpolation with Large Motion."
> *IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)*, 2024.
> [[arXiv]](https://arxiv.org/abs/2404.06913) [[GitHub]](https://github.com/MCG-NJU/SGM-VFI)

```bibtex
@inproceedings{zhang2024sgmvfi,
  title={Sparse Global Matching for Video Frame Interpolation with Large Motion},
  author={Zhang, Guozhen and Zhu, Yuhan and Liu, Evan Zheran and Wang, Haonan and Sun, Mingzhen and Wu, Gangshan and Wang, Limin},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2024}
}
```

## License

The BiM-VFI model weights and architecture code are provided by KAIST VIC Lab for **research and education purposes only**. Commercial use requires permission from the principal investigator (Prof. Munchurl Kim, mkimee@kaist.ac.kr). See the [original repository](https://github.com/KAIST-VICLab/BiM-VFI) for details.

The EMA-VFI model weights and architecture code are released under the [Apache 2.0 License](https://github.com/MCG-NJU/EMA-VFI/blob/main/LICENSE). See the [original repository](https://github.com/MCG-NJU/EMA-VFI) for details.

The SGM-VFI model weights and architecture code are released under the [Apache 2.0 License](https://github.com/MCG-NJU/SGM-VFI/blob/main/LICENSE). See the [original repository](https://github.com/MCG-NJU/SGM-VFI) for details.