fix: catch all exceptions when importing cupy, not just ImportError

An installed-but-broken cupy (e.g. incompatible with NumPy 2.5, which removed the 'bool8' alias) raises a TypeError during its own import, not an ImportError. The narrow `except ImportError` guard let that propagate and crashed the entire node import chain. Broaden the guard to `except Exception` in all three CUDA-kernel modules so any import-time failure disables cupy and falls back to the pure-PyTorch implementations.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
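
For illustration, the broadened guard described in this commit message could look roughly like the sketch below. The helper name `optional_import` and the `HAS_CUPY` flag are hypothetical and are not taken from the three CUDA-kernel modules; only the `except Exception` pattern comes from the message itself.

```python
import importlib
import logging

logger = logging.getLogger("Tween")


def optional_import(name):
    """Return the imported module, or None if importing it fails for any reason."""
    try:
        return importlib.import_module(name)
    except Exception as exc:
        # Broad on purpose: a broken install can raise TypeError, OSError,
        # AttributeError, etc. during its own import, not only ImportError.
        logger.warning("[Tween] %s unavailable (%s: %s)", name, type(exc).__name__, exc)
        return None


# Hypothetical module-level usage: callers check HAS_CUPY and take the
# pure-PyTorch path when it is False.
cupy = optional_import("cupy")
HAS_CUPY = cupy is not None
```

`except Exception` still lets `KeyboardInterrupt` and `SystemExit` through, so interrupting a slow import keeps working.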
docs: add cupy-fallback implementation plan
2026-06-27 19:51:28 +02:00 · 2026-04-11 10:27:52 +02:00 · 2026-04-11 10:13:30 +02:00 · 2026-04-11 02:20:20 +02:00 · 2026-04-11 02:12:03 +02:00 · 2026-04-11 02:11:30 +02:00
14 changed files with 1904 additions and 679 deletions
@@ -0,0 +1,20 @@
 name: Publish to Comfy registry
 on:
  workflow_dispatch:
  push:
    branches:
      - master
    paths:
      - "pyproject.toml"
 jobs:
  publish-node:
    name: Publish Custom Node to registry
    runs-on: ubuntu-latest
    steps:
      - name: Check out code
        uses: actions/checkout@v4
      - name: Publish Custom Node
        uses: Comfy-Org/publish-node-action@main
        with:
          personal_access_token: ${{ secrets.REGISTRY_ACCESS_TOKEN }}
@@ -1,40 +1,86 @@
-# ComfyUI BIM-VFI + EMA-VFI + SGM-VFI + GIMM-VFI
+# Tween — Video Frame Interpolation for ComfyUI
-ComfyUI custom nodes for video frame interpolation using [BiM-VFI](https://github.com/KAIST-VICLab/BiM-VFI) (CVPR 2025), [EMA-VFI](https://github.com/MCG-NJU/EMA-VFI) (CVPR 2023), [SGM-VFI](https://github.com/MCG-NJU/SGM-VFI) (CVPR 2024), and [GIMM-VFI](https://github.com/GSeanCDAT/GIMM-VFI) (NeurIPS 2024). Designed for long videos with thousands of frames — processes them without running out of VRAM.
+[![ComfyUI](https://img.shields.io/badge/ComfyUI-Custom_Node-0a7ef0)](https://registry.comfy.org/)
 [![Python 3.10+](https://img.shields.io/badge/Python-3.10+-3776AB?logo=python&logoColor=white)](https://www.python.org/)
 [![License](https://img.shields.io/badge/License-Apache_2.0-green.svg)](https://www.apache.org/licenses/LICENSE-2.0)
 [![Models](https://img.shields.io/badge/VFI_Models-4-8B5CF6)](#which-model-should-i-use)
 Four video frame interpolation models in one package — **BIM-VFI**, **EMA-VFI**, **SGM-VFI**, and **GIMM-VFI**. Designed for long videos with thousands of frames without running out of VRAM.
 <p align="center">
  <img src="assets/model-comparison.svg" alt="Model Comparison" width="720"/>
 </p>
 ## Installation
 Install from the [ComfyUI Registry](https://registry.comfy.org/) (recommended) or clone manually:
 ```bash
 cd ComfyUI/custom_nodes
 git clone https://github.com/Ethanfel/ComfyUI-Tween.git
 pip install -r requirements.txt
 ```
 All dependencies (`gdown`, `timm`, `omegaconf`, `easydict`, `yacs`, `einops`, `huggingface_hub`) are declared in `pyproject.toml` and `requirements.txt`, installed automatically by ComfyUI Manager or pip.
 ### cupy (required for BIM-VFI, SGM-VFI, GIMM-VFI)
 [cupy](https://cupy.dev/) provides GPU-accelerated optical flow warping. **EMA-VFI works without it.**
 1. Find your CUDA version:
   ```bash
   python -c "import torch; print(torch.version.cuda)"
   ```
 2. Install the matching package:
   | CUDA | Command |
   |------|---------|
   | 12.x | `pip install cupy-cuda12x` |
   | 11.x | `pip install cupy-cuda11x` |
 > Make sure to run pip in the same Python environment as ComfyUI. If cupy is missing, the Load node shows an error with your CUDA version and the exact install command.
 <details>
 <summary>cupy troubleshooting</summary>
 | Problem | Solution |
 |---------|----------|
 | `ModuleNotFoundError: No module named 'cupy'` | Install cupy using the steps above |
 | `cupy` installed but `ImportError` at runtime | CUDA version mismatch — uninstall and reinstall the correct version |
 | Install hangs or takes very long | cupy wheels are ~800 MB, be patient |
 | Docker / no build tools | Use the prebuilt wheel: `pip install cupy-cuda12x` (not bare `cupy` which compiles from source) |
 </details>
 ## Which model should I use?
 | | BIM-VFI | EMA-VFI | SGM-VFI | GIMM-VFI |
 |---|---------|---------|---------|----------|
-| **Best for** | General-purpose, non-uniform motion | Fast inference, light VRAM | Large motion, occlusion-heavy scenes | High multipliers (4x/8x) in a single pass |
+| **Best for** | General-purpose | Fast, low VRAM | Large motion | High multipliers (4x/8x) |
-| **Quality** | Highest overall | Good | Best on large motion | Good |
+| **Quality** | Highest | Good | Best on large motion | Good |
-| **Speed** | Moderate | Fastest | Slowest | Fast for 4x/8x (single pass) |
+| **Speed** | Moderate | Fastest | Slowest | Fast for 4x/8x |
 | **VRAM** | ~2 GB/pair | ~1.5 GB/pair | ~3 GB/pair | ~2.5 GB/pair |
-| **Params** | ~17M | ~14–65M | ~15M + GMFlow | ~80M (RAFT) / ~123M (FlowFormer) |
+| **Params** | ~17 M | ~14–65 M | ~15 M + GMFlow | ~80 M (RAFT) / ~123 M (FlowFormer) |
-| **Arbitrary timestep** | Yes | Yes (with `_t` checkpoint) | No (fixed 0.5) | Yes (native single-pass) |
+| **Arbitrary timestep** | Yes | Yes (`_t` checkpoint) | No (fixed 0.5) | Yes (native) |
-| **4x/8x mode** | Recursive 2x passes | Recursive 2x passes | Recursive 2x passes | Single forward pass (or recursive) |
+| **4x/8x** | Recursive passes | Recursive passes | Recursive passes | Single forward pass |
 | **Requires cupy** | Yes | No | Yes | Yes |
 | **Paper** | CVPR 2025 | CVPR 2023 | CVPR 2024 | NeurIPS 2024 |
 | **License** | Research only | Apache 2.0 | Apache 2.0 | Apache 2.0 |
-**TL;DR:** Start with **BIM-VFI** for best quality. Use **EMA-VFI** if you need speed or lower VRAM. Use **SGM-VFI** if your video has large camera motion or fast-moving objects that the others struggle with. Use **GIMM-VFI** when you want 4x or 8x interpolation without recursive passes — it generates all intermediate frames in a single forward pass per pair.
+**TL;DR:** Start with **BIM-VFI** for best quality. Use **EMA-VFI** for speed or if you can't install cupy. Use **SGM-VFI** for large camera motion. Use **GIMM-VFI** for 4x/8x without recursive passes.
 ## VRAM Guide
 | VRAM | Recommended settings |
 |------|----------------------|
 | 8 GB | `batch_size=1, chunk_size=500` |
 | 24 GB | `batch_size=2–4, chunk_size=1000` |
 | 48 GB+ | `batch_size=4–16, all_on_gpu=true` |
 | 96 GB+ | `batch_size=8–16, all_on_gpu=true, chunk_size=0` |
 ## Nodes
-### BIM-VFI
+All Interpolate nodes share a common set of controls:
 #### Load BIM-VFI Model
 Loads the BiM-VFI checkpoint. Auto-downloads from Google Drive on first use to `ComfyUI/models/bim-vfi/`.
 | Input | Description |
 |-------|-------------|
 | **model_path** | Checkpoint file from `models/bim-vfi/` |
 | **auto_pyr_level** | Auto-select pyramid level by resolution (&lt;540p=3, 540p=5, 1080p=6, 4K=7) |
 | **pyr_level** | Manual pyramid level (3-7), only used when auto is off |
 #### BIM-VFI Interpolate
 Interpolates frames from an image batch.
 | Input | Description |
 |-------|-------------|
@@ -42,152 +88,142 @@ Interpolates frames from an image batch.
 | **model** | Model from the loader node |
 | **multiplier** | 2x, 4x, or 8x frame rate (recursive 2x passes) |
 | **batch_size** | Frame pairs processed simultaneously (higher = faster, more VRAM) |
-| **chunk_size** | Process in segments of N input frames (0 = disabled). Bounds VRAM for very long videos. Result is identical to processing all at once |
+| **chunk_size** | Process in segments of N input frames (0 = disabled). Bounds VRAM for very long videos |
-| **keep_device** | Keep model on GPU between pairs (faster, ~200MB constant VRAM) |
+| **keep_device** | Keep model on GPU between pairs (faster, ~200 MB constant VRAM) |
 | **all_on_gpu** | Keep all intermediate frames on GPU (fast, needs large VRAM) |
 | **clear_cache_after_n_frames** | Clear CUDA cache every N pairs to prevent VRAM buildup |
 | **source_fps** | Input frame rate. Required when target_fps > 0 |
 | **target_fps** | Target output FPS. When > 0, overrides multiplier — auto-computes the optimal power-of-2 oversample then selects frames at exact target timestamps. 0 = use multiplier |
 | Output | Description |
 |--------|-------------|
 | **images** | Interpolated frames at the target FPS (or at the multiplied rate when target_fps = 0) |
 | **oversampled** | Full power-of-2 oversampled frames before target FPS selection. Same as `images` when target_fps = 0 |
 <details>
 <summary><strong>BIM-VFI</strong></summary>
 #### Load BIM-VFI Model
 Loads the BiM-VFI checkpoint. Auto-downloads from Google Drive on first use to `ComfyUI/models/bim-vfi/`.
 | Input | Description |
 |-------|-------------|
 | **model_path** | Checkpoint from `models/bim-vfi/` |
 | **auto_pyr_level** | Auto pyramid level by resolution (&lt;540p=3, 540p=5, 1080p=6, 4K=7) |
 | **pyr_level** | Manual pyramid level (3–7), used when auto is off |
 #### BIM-VFI Interpolate
 Common controls listed above.
 #### BIM-VFI Segment Interpolate
-Same as Interpolate but processes a single segment of the input. Chain multiple instances with Save nodes between them to bound peak RAM. The model pass-through output forces sequential execution.
+Processes a single segment of the input. Chain multiple instances with Save nodes between them to bound peak RAM. The model pass-through output forces sequential execution.
-### Tween Concat Videos
+</details>
-Concatenates segment video files into a single video using ffmpeg. Connect from any Segment Interpolate's model output to ensure it runs after all segments are saved. Works with all three models.
+<details>
-
+<summary><strong>EMA-VFI</strong></summary>
 ### EMA-VFI
 #### Load EMA-VFI Model
-Loads an EMA-VFI checkpoint. Auto-downloads from Google Drive on first use to `ComfyUI/models/ema-vfi/`. Variant (large/small) and timestep support are auto-detected from the filename.
+Auto-downloads from Google Drive to `ComfyUI/models/ema-vfi/`. Variant and timestep support are auto-detected from the filename.
 | Input | Description |
 |-------|-------------|
-| **model_path** | Checkpoint file from `models/ema-vfi/` |
+| **model_path** | Checkpoint from `models/ema-vfi/` |
-| **tta** | Test-time augmentation: flip input and average with unflipped result (~2x slower, slightly better quality) |
+| **tta** | Test-time augmentation (~2x slower, slightly better quality) |
 Available checkpoints:
 | Checkpoint | Variant | Params | Arbitrary timestep |
 |-----------|---------|--------|-------------------|
-| `ours_t.pkl` | Large | ~65M | Yes |
+| `ours_t.pkl` | Large | ~65 M | Yes |
-| `ours.pkl` | Large | ~65M | No (fixed 0.5) |
+| `ours.pkl` | Large | ~65 M | No (fixed 0.5) |
-| `ours_small_t.pkl` | Small | ~14M | Yes |
+| `ours_small_t.pkl` | Small | ~14 M | Yes |
-| `ours_small.pkl` | Small | ~14M | No (fixed 0.5) |
+| `ours_small.pkl` | Small | ~14 M | No (fixed 0.5) |
-#### EMA-VFI Interpolate
+#### EMA-VFI Interpolate / Segment Interpolate
-Interpolates frames from an image batch. Same controls as BIM-VFI Interpolate.
+Same controls as above.
-#### EMA-VFI Segment Interpolate
+</details>
-Same as EMA-VFI Interpolate but processes a single segment. Same pattern as BIM-VFI Segment Interpolate.
+<details>
-
+<summary><strong>SGM-VFI</strong></summary>
 ### SGM-VFI
 #### Load SGM-VFI Model
-Loads an SGM-VFI checkpoint. Auto-downloads from Google Drive on first use to `ComfyUI/models/sgm-vfi/`. Variant (base/small) is auto-detected from the filename (default is small).
+Auto-downloads from Google Drive to `ComfyUI/models/sgm-vfi/`. Requires cupy.
 | Input | Description |
 |-------|-------------|
-| **model_path** | Checkpoint file from `models/sgm-vfi/` |
+| **model_path** | Checkpoint from `models/sgm-vfi/` |
-| **tta** | Test-time augmentation: flip input and average with unflipped result (~2x slower, slightly better quality) |
+| **tta** | Test-time augmentation (~2x slower, slightly better quality) |
-| **num_key_points** | Sparsity of global matching (0.0 = global everywhere, 0.5 = default balance, higher = faster) |
+| **num_key_points** | Global matching sparsity (0.0 = global everywhere, 0.5 = default, higher = faster) |
 Available checkpoints:
 | Checkpoint | Variant | Params |
 |-----------|---------|--------|
-| `ours-1-2-points.pkl` | Small | ~15M + GMFlow |
+| `ours-1-2-points.pkl` | Small | ~15 M + GMFlow |
-#### SGM-VFI Interpolate
+#### SGM-VFI Interpolate / Segment Interpolate
-Interpolates frames from an image batch. Same controls as BIM-VFI Interpolate.
+Same controls as above.
-#### SGM-VFI Segment Interpolate
+</details>
-Same as SGM-VFI Interpolate but processes a single segment. Same pattern as BIM-VFI Segment Interpolate.
+<details>
-
+<summary><strong>GIMM-VFI</strong></summary>
 ### GIMM-VFI
 #### Load GIMM-VFI Model
-Loads a GIMM-VFI checkpoint. Auto-downloads from [HuggingFace](https://huggingface.co/Kijai/GIMM-VFI_safetensors) on first use to `ComfyUI/models/gimm-vfi/`. The matching flow estimator (RAFT or FlowFormer) is auto-detected and downloaded alongside the main model.
+Auto-downloads from [HuggingFace](https://huggingface.co/Kijai/GIMM-VFI_safetensors) to `ComfyUI/models/gimm-vfi/`. The matching flow estimator (RAFT or FlowFormer) is auto-detected and downloaded alongside.
 | Input | Description |
 |-------|-------------|
-| **model_path** | Checkpoint file from `models/gimm-vfi/` |
+| **model_path** | Checkpoint from `models/gimm-vfi/` |
-| **ds_factor** | Downscale factor for internal processing (1.0 = full res, 0.5 = half). Lower = less VRAM, faster, less quality. Try 0.5 for 4K inputs |
+| **ds_factor** | Downscale factor for internal processing (1.0 = full, 0.5 = half). Try 0.5 for 4K inputs |
 Available checkpoints:
 | Checkpoint | Variant | Params | Flow estimator (auto-downloaded) |
 |-----------|---------|--------|----------------------------------|
-| `gimmvfi_r_arb_lpips_fp32.safetensors` | RAFT | ~80M | `raft-things_fp32.safetensors` |
+| `gimmvfi_r_arb_lpips_fp32.safetensors` | RAFT | ~80 M | `raft-things_fp32.safetensors` |
-| `gimmvfi_f_arb_lpips_fp32.safetensors` | FlowFormer | ~123M | `flowformer_sintel_fp32.safetensors` |
+| `gimmvfi_f_arb_lpips_fp32.safetensors` | FlowFormer | ~123 M | `flowformer_sintel_fp32.safetensors` |
 #### GIMM-VFI Interpolate
-Interpolates frames from an image batch. Same controls as BIM-VFI Interpolate, plus:
+Common controls plus:
 | Input | Description |
 |-------|-------------|
-| **single_pass** | When enabled (default), generates all intermediate frames per pair in one forward pass using GIMM-VFI's arbitrary-timestep capability. No recursive 2x passes needed for 4x or 8x. Disable to use the standard recursive approach (same as BIM/EMA/SGM) |
+| **single_pass** | Generate all intermediate frames per pair in one forward pass (default on). No recursive 2x passes needed for 4x/8x. Disable to use the standard recursive approach |
 #### GIMM-VFI Segment Interpolate
-Same as GIMM-VFI Interpolate but processes a single segment. Same pattern as BIM-VFI Segment Interpolate.
+Same pattern as other Segment nodes.
-**Output frame count (all models):** 2x = 2N-1, 4x = 4N-3, 8x = 8N-7
+</details>
-## Installation
+### Tween Concat Videos
-Clone into your ComfyUI `custom_nodes/` directory:
+Concatenates segment video files into a single video using ffmpeg. Connect from any Segment Interpolate's model output to ensure it runs after all segments are saved. Works with all four models.
-```bash
+### Output frame count
 cd ComfyUI/custom_nodes
 git clone https://github.com/your-user/ComfyUI-Tween.git
 ```
-Dependencies (`gdown`, `cupy`, `timm`, `omegaconf`, `easydict`, `yacs`, `einops`, `huggingface_hub`) are auto-installed on first load. The correct `cupy` variant is detected from your PyTorch CUDA version.
+- **Multiplier mode:** 2x = 2N-1, 4x = 4N-3, 8x = 8N-7
-
+- **Target FPS mode:** `floor((N-1) / source_fps * target_fps) + 1` frames. Automatically oversamples to the nearest power-of-2 above the ratio, then selects frames at exact target timestamps. Downsampling (target < source) also works — frames are selected from the input with no model calls.
 > **Warning:** `cupy` is a large package (~800MB) and compilation/installation can take several minutes. The first ComfyUI startup after installing this node may appear to hang while `cupy` installs in the background. Check the console log for progress. If auto-install fails (e.g. missing build tools in Docker), install manually with:
 > ```bash
 > pip install cupy-cuda12x  # replace 12 with your CUDA major version
 > ```
 To install manually:
 ```bash
 cd ComfyUI-Tween
 python install.py
 ```
 ### Requirements
 - PyTorch with CUDA
 - `cupy` (matching your CUDA version, for BIM-VFI, SGM-VFI, and GIMM-VFI)
 - `timm` (for EMA-VFI and SGM-VFI)
 - `gdown` (for BIM-VFI/EMA-VFI/SGM-VFI model auto-download)
 - `omegaconf`, `easydict`, `yacs`, `einops` (for GIMM-VFI)
 - `huggingface_hub` (for GIMM-VFI model auto-download)
 ## VRAM Guide
 | VRAM | Recommended settings |
 |------|---------------------|
 | 8 GB | batch_size=1, chunk_size=500 |
 | 24 GB | batch_size=2-4, chunk_size=1000 |
 | 48 GB+ | batch_size=4-16, all_on_gpu=true |
 | 96 GB+ | batch_size=8-16, all_on_gpu=true, chunk_size=0 |
 ## Acknowledgments
-This project wraps the official [BiM-VFI](https://github.com/KAIST-VICLab/BiM-VFI) implementation by the [KAIST VIC Lab](https://github.com/KAIST-VICLab), the official [EMA-VFI](https://github.com/MCG-NJU/EMA-VFI) implementation by MCG-NJU, the official [SGM-VFI](https://github.com/MCG-NJU/SGM-VFI) implementation by MCG-NJU, and the [GIMM-VFI](https://github.com/GSeanCDAT/GIMM-VFI) implementation by S-Lab (NTU). GIMM-VFI architecture files in `gimm_vfi_arch/` are adapted from [kijai/ComfyUI-GIMM-VFI](https://github.com/kijai/ComfyUI-GIMM-VFI) with safetensors checkpoints from [Kijai/GIMM-VFI_safetensors](https://huggingface.co/Kijai/GIMM-VFI_safetensors). Architecture files in `bim_vfi_arch/`, `ema_vfi_arch/`, `sgm_vfi_arch/`, and `gimm_vfi_arch/` are vendored from their respective repositories with minimal modifications (relative imports, device-awareness fixes, inference-only paths).
+| Model | Authors | Venue | Links |
 |-------|---------|-------|-------|
 | **BIM-VFI** | Seo, Oh, Kim (KAIST VIC Lab) | CVPR 2025 | [Paper](https://arxiv.org/abs/2412.11365) · [Code](https://github.com/KAIST-VICLab/BiM-VFI) · [Project](https://kaist-viclab.github.io/BiM-VFI_site/) |
 | **EMA-VFI** | Zhang et al. (MCG-NJU) | CVPR 2023 | [Paper](https://arxiv.org/abs/2303.00440) · [Code](https://github.com/MCG-NJU/EMA-VFI) |
 | **SGM-VFI** | Zhang et al. (MCG-NJU) | CVPR 2024 | [Paper](https://arxiv.org/abs/2404.06913) · [Code](https://github.com/MCG-NJU/SGM-VFI) |
 | **GIMM-VFI** | Guo, Li, Loy (S-Lab NTU) | NeurIPS 2024 | [Paper](https://arxiv.org/abs/2407.08680) · [Code](https://github.com/GSeanCDAT/GIMM-VFI) |
-**BiM-VFI:**
+GIMM-VFI adaptation from [kijai/ComfyUI-GIMM-VFI](https://github.com/kijai/ComfyUI-GIMM-VFI) with checkpoints from [Kijai/GIMM-VFI_safetensors](https://huggingface.co/Kijai/GIMM-VFI_safetensors). Architecture files in `bim_vfi_arch/`, `ema_vfi_arch/`, `sgm_vfi_arch/`, and `gimm_vfi_arch/` are vendored from their respective repositories with minimal modifications.
-> Wonyong Seo, Jihyong Oh, and Munchurl Kim.
+
-> "BiM-VFI: Bidirectional Motion Field-Guided Frame Interpolation for Video with Non-uniform Motions."
+<details>
-> *IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)*, 2025.
+<summary>BibTeX citations</summary>
 > [[arXiv]](https://arxiv.org/abs/2412.11365) [[Project Page]](https://kaist-viclab.github.io/BiM-VFI_site/) [[GitHub]](https://github.com/KAIST-VICLab/BiM-VFI)
 ```bibtex
@inproceedings{seo2025bimvfi,
@@ -196,45 +232,21 @@ This project wraps the official [BiM-VFI](https://github.com/KAIST-VICLab/BiM-VF
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2025}
 }
 ```
 **EMA-VFI:**
 > Guozhen Zhang, Yuhan Zhu, Haonan Wang, Youxin Chen, Gangshan Wu, and Limin Wang.
 > "Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation."
 > *IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)*, 2023.
 > [[arXiv]](https://arxiv.org/abs/2303.00440) [[GitHub]](https://github.com/MCG-NJU/EMA-VFI)
 ```bibtex
@inproceedings{zhang2023emavfi,
  title={Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation},
  author={Zhang, Guozhen and Zhu, Yuhan and Wang, Haonan and Chen, Youxin and Wu, Gangshan and Wang, Limin},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2023}
 }
 ```
 **SGM-VFI:**
 > Guozhen Zhang, Yuhan Zhu, Evan Zheran Liu, Haonan Wang, Mingzhen Sun, Gangshan Wu, and Limin Wang.
 > "Sparse Global Matching for Video Frame Interpolation with Large Motion."
 > *IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)*, 2024.
 > [[arXiv]](https://arxiv.org/abs/2404.06913) [[GitHub]](https://github.com/MCG-NJU/SGM-VFI)
 ```bibtex
@inproceedings{zhang2024sgmvfi,
  title={Sparse Global Matching for Video Frame Interpolation with Large Motion},
  author={Zhang, Guozhen and Zhu, Yuhan and Liu, Evan Zheran and Wang, Haonan and Sun, Mingzhen and Wu, Gangshan and Wang, Limin},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2024}
 }
 ```
 **GIMM-VFI:**
 > Zujin Guo, Wei Li, and Chen Change Loy.
 > "Generalizable Implicit Motion Modeling for Video Frame Interpolation."
 > *Advances in Neural Information Processing Systems (NeurIPS)*, 2024.
 > [[arXiv]](https://arxiv.org/abs/2407.08680) [[GitHub]](https://github.com/GSeanCDAT/GIMM-VFI)
 ```bibtex
@inproceedings{guo2024gimmvfi,
  title={Generalizable Implicit Motion Modeling for Video Frame Interpolation},
  author={Guo, Zujin and Li, Wei and Loy, Chen Change},
@@ -243,12 +255,12 @@ This project wraps the official [BiM-VFI](https://github.com/KAIST-VICLab/BiM-VF
 }
 ```
 </details>
 ## License
-The BiM-VFI model weights and architecture code are provided by KAIST VIC Lab for **research and education purposes only**. Commercial use requires permission from the principal investigator (Prof. Munchurl Kim, mkimee@kaist.ac.kr). See the [original repository](https://github.com/KAIST-VICLab/BiM-VFI) for details.
+**BIM-VFI:** Research and education only. Commercial use requires permission from Prof. Munchurl Kim (mkimee@kaist.ac.kr). See the [original repository](https://github.com/KAIST-VICLab/BiM-VFI).
-The EMA-VFI model weights and architecture code are released under the [Apache 2.0 License](https://github.com/MCG-NJU/EMA-VFI/blob/main/LICENSE). See the [original repository](https://github.com/MCG-NJU/EMA-VFI) for details.
+**EMA-VFI, SGM-VFI, GIMM-VFI:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0). GIMM-VFI ComfyUI adaptation based on [kijai/ComfyUI-GIMM-VFI](https://github.com/kijai/ComfyUI-GIMM-VFI).
-The SGM-VFI model weights and architecture code are released under the [Apache 2.0 License](https://github.com/MCG-NJU/SGM-VFI/blob/main/LICENSE). See the [original repository](https://github.com/MCG-NJU/SGM-VFI) for details.
+**This wrapper code:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
 The GIMM-VFI model weights and architecture code are released under the [Apache 2.0 License](https://github.com/GSeanCDAT/GIMM-VFI/blob/main/LICENSE). See the [original repository](https://github.com/GSeanCDAT/GIMM-VFI) for details. ComfyUI adaptation based on [kijai/ComfyUI-GIMM-VFI](https://github.com/kijai/ComfyUI-GIMM-VFI).
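
The frame-count formulas in the README's "Output frame count" section can be checked with a small sketch like the one below. The function names are hypothetical, and the power-of-2 rule assumes "nearest power-of-2 above the ratio" means the smallest power of two greater than or equal to `target_fps / source_fps`; the node's actual implementation may differ.

```python
import math


def multiplier_frame_count(n_frames, multiplier):
    """Multiplier mode: (N - 1) * m + 1, i.e. 2N-1, 4N-3, 8N-7 for m = 2, 4, 8."""
    if n_frames < 1:
        raise ValueError("need at least one input frame")
    return (n_frames - 1) * multiplier + 1


def target_fps_frame_count(n_frames, source_fps, target_fps):
    """Target FPS mode: floor((N - 1) / source_fps * target_fps) + 1."""
    if n_frames < 1:
        raise ValueError("need at least one input frame")
    return math.floor((n_frames - 1) / source_fps * target_fps) + 1


def oversample_factor(source_fps, target_fps):
    """Assumed rule: smallest power of two >= target/source (1 when downsampling)."""
    ratio = target_fps / source_fps
    factor = 1
    while factor < ratio:
        factor *= 2
    return factor


print(multiplier_frame_count(10, 4))       # 10 input frames at 4x
print(target_fps_frame_count(25, 24, 60))  # 25 frames of 24 fps retimed to 60 fps
print(oversample_factor(24, 60))           # ratio 2.5 needs a 4x oversample
```

The sketch uses plain floating-point arithmetic, so frame rates whose ratio is not exactly representable may need rounding care in a real implementation.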
@@ -1,59 +1,11 @@
 import subprocess
 import sys
 import logging
 logger = logging.getLogger("Tween")
 def _auto_install_deps():
    """Auto-install missing dependencies on first load."""
    # gdown
    try:
        import gdown  # noqa: F401
    except ImportError:
        logger.info("[Tween] Installing gdown...")
        subprocess.check_call([sys.executable, "-m", "pip", "install", "gdown"])
    # timm (required for EMA-VFI's MotionFormer backbone)
    try:
        import timm  # noqa: F401
    except ImportError:
        logger.info("[Tween] Installing timm...")
        subprocess.check_call([sys.executable, "-m", "pip", "install", "timm"])
    # cupy
    try:
        import cupy  # noqa: F401
    except ImportError:
        try:
            import torch
            major = int(torch.version.cuda.split(".")[0])
            cupy_pkg = f"cupy-cuda{major}x"
            logger.info(f"[Tween] Installing {cupy_pkg} (CUDA {torch.version.cuda})...")
            subprocess.check_call([sys.executable, "-m", "pip", "install", cupy_pkg])
        except Exception as e:
            logger.warning(f"[Tween] Could not auto-install cupy: {e}")
    # GIMM-VFI dependencies
    for pkg in ("omegaconf", "yacs", "easydict", "einops", "huggingface_hub"):
        try:
            __import__(pkg)
        except ImportError:
            logger.info(f"[Tween] Installing {pkg}...")
            subprocess.check_call([sys.executable, "-m", "pip", "install", pkg])
 _auto_install_deps()
 from .nodes import (
    LoadBIMVFIModel, BIMVFIInterpolate, BIMVFISegmentInterpolate, TweenConcatVideos,
    LoadEMAVFIModel, EMAVFIInterpolate, EMAVFISegmentInterpolate,
    LoadSGMVFIModel, SGMVFIInterpolate, SGMVFISegmentInterpolate,
    LoadGIMMVFIModel, GIMMVFIInterpolate, GIMMVFISegmentInterpolate,
    VFIOptimizer,
 )
 WEB_DIRECTORY = "./web"
 NODE_CLASS_MAPPINGS = {
    "LoadBIMVFIModel": LoadBIMVFIModel,
    "BIMVFIInterpolate": BIMVFIInterpolate,
@@ -68,6 +20,7 @@ NODE_CLASS_MAPPINGS = {
    "LoadGIMMVFIModel": LoadGIMMVFIModel,
    "GIMMVFIInterpolate": GIMMVFIInterpolate,
    "GIMMVFISegmentInterpolate": GIMMVFISegmentInterpolate,
    "VFIOptimizer": VFIOptimizer,
 }
 NODE_DISPLAY_NAME_MAPPINGS = {
@@ -84,4 +37,5 @@ NODE_DISPLAY_NAME_MAPPINGS = {
    "LoadGIMMVFIModel": "Load GIMM-VFI Model",
    "GIMMVFIInterpolate": "GIMM-VFI Interpolate",
    "GIMMVFISegmentInterpolate": "GIMM-VFI Segment Interpolate",
    "VFIOptimizer": "VFI Optimizer",
 }
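
With the auto-installer removed from `__init__.py`, the README says the Load node reports the CUDA version and the exact install command when cupy is missing. A minimal sketch of how such a hint might be built is shown below; `cupy_install_hint` is a hypothetical name, and the `cupy-cuda{major}x` naming simply mirrors the removed `_auto_install_deps` code and the README's install table.

```python
def cupy_install_hint(cuda_version):
    """Build a pip command for the prebuilt cupy wheel matching a CUDA version.

    `cuda_version` is the string reported by `torch.version.cuda`, e.g. "12.1".
    It is None on CPU-only PyTorch builds.
    """
    if not cuda_version:
        return "No CUDA runtime detected; cupy needs a CUDA build of PyTorch."
    major = int(cuda_version.split(".")[0])
    return f"pip install cupy-cuda{major}x"


print(cupy_install_hint("12.1"))
print(cupy_install_hint("11.8"))
```

Taking the version as a plain string keeps the helper usable without importing torch; a caller would pass `torch.version.cuda`.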
@@ -0,0 +1,84 @@
 <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 720 320" width="720" height="320">
  <defs>
    <linearGradient id="gQ" x1="0" y1="0" x2="1" y2="0">
      <stop offset="0%" stop-color="#7aa2f7"/><stop offset="100%" stop-color="#7dcfff"/>
    </linearGradient>
    <linearGradient id="gS" x1="0" y1="0" x2="1" y2="0">
      <stop offset="0%" stop-color="#9ece6a"/><stop offset="100%" stop-color="#73daca"/>
    </linearGradient>
    <linearGradient id="gV" x1="0" y1="0" x2="1" y2="0">
      <stop offset="0%" stop-color="#bb9af7"/><stop offset="100%" stop-color="#d2a8ff"/>
    </linearGradient>
  </defs>
  <!-- Background -->
  <rect width="720" height="320" rx="16" fill="#0d1117"/>
  <!-- ═══ BIM-VFI (top-left) ═══ -->
  <rect x="10" y="10" width="340" height="145" rx="10" fill="#161b22" stroke="#30363d" stroke-width="1"/>
  <rect x="11" y="22" width="3" height="121" fill="#3fb950"/>
  <text x="30" y="38" fill="#e6edf3" font-family="-apple-system,BlinkMacSystemFont,'Segoe UI','Noto Sans',Helvetica,Arial,sans-serif" font-size="15" font-weight="600">BIM-VFI</text>
  <text x="30" y="56" fill="#3fb950" font-family="-apple-system,BlinkMacSystemFont,'Segoe UI','Noto Sans',Helvetica,Arial,sans-serif" font-size="11">&#9733; Recommended &#183; Best quality &#183; CVPR 2025</text>
  <line x1="30" y1="64" x2="330" y2="64" stroke="#30363d" stroke-width="0.5"/>
  <text x="30" y="82" fill="#7aa2f7" font-family="-apple-system,BlinkMacSystemFont,'Segoe UI','Noto Sans',Helvetica,Arial,sans-serif" font-size="11">Quality</text>
  <rect x="88" y="72" width="244" height="11" rx="3" fill="#21262d"/>
  <rect x="88" y="72" width="244" height="11" rx="3" fill="url(#gQ)" opacity="0.85"/>
  <text x="30" y="100" fill="#9ece6a" font-family="-apple-system,BlinkMacSystemFont,'Segoe UI','Noto Sans',Helvetica,Arial,sans-serif" font-size="11">Speed</text>
  <rect x="88" y="90" width="244" height="11" rx="3" fill="#21262d"/>
  <rect x="88" y="90" width="146" height="11" rx="3" fill="url(#gS)" opacity="0.85"/>
  <text x="30" y="118" fill="#bb9af7" font-family="-apple-system,BlinkMacSystemFont,'Segoe UI','Noto Sans',Helvetica,Arial,sans-serif" font-size="11">VRAM</text>
  <rect x="88" y="108" width="244" height="11" rx="3" fill="#21262d"/>
  <rect x="88" y="108" width="195" height="11" rx="3" fill="url(#gV)" opacity="0.85"/>
  <text x="262" y="143" fill="#f0883e" font-family="-apple-system,BlinkMacSystemFont,'Segoe UI','Noto Sans',Helvetica,Arial,sans-serif" font-size="10">Research only</text>
  <!-- ═══ EMA-VFI (top-right) ═══ -->
  <rect x="370" y="10" width="340" height="145" rx="10" fill="#161b22" stroke="#30363d" stroke-width="1"/>
  <rect x="371" y="22" width="3" height="121" fill="#58a6ff"/>
  <text x="390" y="38" fill="#e6edf3" font-family="-apple-system,BlinkMacSystemFont,'Segoe UI','Noto Sans',Helvetica,Arial,sans-serif" font-size="15" font-weight="600">EMA-VFI</text>
  <text x="390" y="56" fill="#58a6ff" font-family="-apple-system,BlinkMacSystemFont,'Segoe UI','Noto Sans',Helvetica,Arial,sans-serif" font-size="11">Fastest &#183; No cupy needed &#183; CVPR 2023</text>
  <line x1="390" y1="64" x2="690" y2="64" stroke="#30363d" stroke-width="0.5"/>
  <text x="390" y="82" fill="#7aa2f7" font-family="-apple-system,BlinkMacSystemFont,'Segoe UI','Noto Sans',Helvetica,Arial,sans-serif" font-size="11">Quality</text>
  <rect x="448" y="72" width="244" height="11" rx="3" fill="#21262d"/>
  <rect x="448" y="72" width="146" height="11" rx="3" fill="url(#gQ)" opacity="0.85"/>
  <text x="390" y="100" fill="#9ece6a" font-family="-apple-system,BlinkMacSystemFont,'Segoe UI','Noto Sans',Helvetica,Arial,sans-serif" font-size="11">Speed</text>
  <rect x="448" y="90" width="244" height="11" rx="3" fill="#21262d"/>
  <rect x="448" y="90" width="244" height="11" rx="3" fill="url(#gS)" opacity="0.85"/>
  <text x="390" y="118" fill="#bb9af7" font-family="-apple-system,BlinkMacSystemFont,'Segoe UI','Noto Sans',Helvetica,Arial,sans-serif" font-size="11">VRAM</text>
  <rect x="448" y="108" width="244" height="11" rx="3" fill="#21262d"/>
  <rect x="448" y="108" width="244" height="11" rx="3" fill="url(#gV)" opacity="0.85"/>
  <text x="632" y="143" fill="#8b949e" font-family="-apple-system,BlinkMacSystemFont,'Segoe UI','Noto Sans',Helvetica,Arial,sans-serif" font-size="10">Apache 2.0</text>
  <!-- ═══ SGM-VFI (bottom-left) ═══ -->
  <rect x="10" y="165" width="340" height="145" rx="10" fill="#161b22" stroke="#30363d" stroke-width="1"/>
  <rect x="11" y="177" width="3" height="121" fill="#f0883e"/>
  <text x="30" y="193" fill="#e6edf3" font-family="-apple-system,BlinkMacSystemFont,'Segoe UI','Noto Sans',Helvetica,Arial,sans-serif" font-size="15" font-weight="600">SGM-VFI</text>
  <text x="30" y="211" fill="#f0883e" font-family="-apple-system,BlinkMacSystemFont,'Segoe UI','Noto Sans',Helvetica,Arial,sans-serif" font-size="11">Large motion specialist &#183; CVPR 2024</text>
  <line x1="30" y1="219" x2="330" y2="219" stroke="#30363d" stroke-width="0.5"/>
  <text x="30" y="237" fill="#7aa2f7" font-family="-apple-system,BlinkMacSystemFont,'Segoe UI','Noto Sans',Helvetica,Arial,sans-serif" font-size="11">Quality</text>
  <rect x="88" y="227" width="244" height="11" rx="3" fill="#21262d"/>
  <rect x="88" y="227" width="195" height="11" rx="3" fill="url(#gQ)" opacity="0.85"/>
  <text x="30" y="255" fill="#9ece6a" font-family="-apple-system,BlinkMacSystemFont,'Segoe UI','Noto Sans',Helvetica,Arial,sans-serif" font-size="11">Speed</text>
  <rect x="88" y="245" width="244" height="11" rx="3" fill="#21262d"/>
  <rect x="88" y="245" width="98" height="11" rx="3" fill="url(#gS)" opacity="0.85"/>
  <text x="30" y="273" fill="#bb9af7" font-family="-apple-system,BlinkMacSystemFont,'Segoe UI','Noto Sans',Helvetica,Arial,sans-serif" font-size="11">VRAM</text>
  <rect x="88" y="263" width="244" height="11" rx="3" fill="#21262d"/>
  <rect x="88" y="263" width="98" height="11" rx="3" fill="url(#gV)" opacity="0.85"/>
  <text x="272" y="298" fill="#8b949e" font-family="-apple-system,BlinkMacSystemFont,'Segoe UI','Noto Sans',Helvetica,Arial,sans-serif" font-size="10">Apache 2.0</text>
  <!-- ═══ GIMM-VFI (bottom-right) ═══ -->
  <rect x="370" y="165" width="340" height="145" rx="10" fill="#161b22" stroke="#30363d" stroke-width="1"/>
  <rect x="371" y="177" width="3" height="121" fill="#bc8cff"/>
  <text x="390" y="193" fill="#e6edf3" font-family="-apple-system,BlinkMacSystemFont,'Segoe UI','Noto Sans',Helvetica,Arial,sans-serif" font-size="15" font-weight="600">GIMM-VFI</text>
  <text x="390" y="211" fill="#bc8cff" font-family="-apple-system,BlinkMacSystemFont,'Segoe UI','Noto Sans',Helvetica,Arial,sans-serif" font-size="11">Single-pass 4&#215;/8&#215; &#183; NeurIPS 2024</text>
  <line x1="390" y1="219" x2="690" y2="219" stroke="#30363d" stroke-width="0.5"/>
  <text x="390" y="237" fill="#7aa2f7" font-family="-apple-system,BlinkMacSystemFont,'Segoe UI','Noto Sans',Helvetica,Arial,sans-serif" font-size="11">Quality</text>
  <rect x="448" y="227" width="244" height="11" rx="3" fill="#21262d"/>
  <rect x="448" y="227" width="146" height="11" rx="3" fill="url(#gQ)" opacity="0.85"/>
  <text x="390" y="255" fill="#9ece6a" font-family="-apple-system,BlinkMacSystemFont,'Segoe UI','Noto Sans',Helvetica,Arial,sans-serif" font-size="11">Speed</text>
  <rect x="448" y="245" width="244" height="11" rx="3" fill="#21262d"/>
  <rect x="448" y="245" width="195" height="11" rx="3" fill="url(#gS)" opacity="0.85"/>
  <text x="390" y="273" fill="#bb9af7" font-family="-apple-system,BlinkMacSystemFont,'Segoe UI','Noto Sans',Helvetica,Arial,sans-serif" font-size="11">VRAM</text>
  <rect x="448" y="263" width="244" height="11" rx="3" fill="#21262d"/>
  <rect x="448" y="263" width="146" height="11" rx="3" fill="url(#gV)" opacity="0.85"/>
  <text x="632" y="298" fill="#8b949e" font-family="-apple-system,BlinkMacSystemFont,'Segoe UI','Noto Sans',Helvetica,Arial,sans-serif" font-size="10">Apache 2.0</text>
 </svg>
@@ -4,6 +4,7 @@ import collections
 import os
 import re
 import torch
 import torch.nn.functional as F
 import typing
 cupy = None
@@ -14,12 +15,11 @@ def _ensure_cupy():
        try:
            import cupy as _cupy
            cupy = _cupy
-        except ImportError:
+        except Exception:
-            raise RuntimeError(
+            # Broad catch: an installed-but-broken cupy (e.g. incompatible
-                "cupy is required for BIM-VFI. Install it with:\n"
+            # NumPy) raises non-ImportError exceptions at import time. Treat any
-                "  pip install cupy-cuda12x  (or cupy-cuda11x for CUDA 11)\n"
+            # failure as "cupy unavailable"; the PyTorch fallback will be used.
-                "Or run install.py from the ComfyUI-Tween directory."
+            pass
-            )
 ##########################################################
@@ -246,6 +246,44 @@ def cuda_launch(strKey:str):
 # end
 def _pytorch_costvol_impl(tenOne, tenTwo, intKernelSize):
    """Pure-PyTorch local cost volume via unfold + dot product."""
    B, C, H, W = tenOne.shape
    pad = (intKernelSize - 1) // 2
    # Pad tenTwo so out-of-bounds yields 0 (matches CUDA kernel)
    tenTwo_padded = F.pad(tenTwo, [pad, pad, pad, pad])
    # Unfold into patches: (B, C, H, W, K, K)
    patches = tenTwo_padded.unfold(2, intKernelSize, 1).unfold(3, intKernelSize, 1)
    # Reshape to (B, C, H, W, K*K)
    patches = patches.contiguous().view(B, C, H, W, intKernelSize * intKernelSize)
    # Dot product over C dimension: (B, H, W, K*K)
    tenOut = (tenOne.unsqueeze(-1) * patches).sum(dim=1)
    # Permute to (B, K*K, H, W) to match CUDA output layout
    tenOut = tenOut.permute(0, 3, 1, 2).contiguous()
    return tenOut
 _costvol_fn = None
 def _pytorch_costvol(tenOne, tenTwo, intKernelSize):
    global _costvol_fn
    if _costvol_fn is None:
        try:
            _costvol_fn = torch.compile(_pytorch_costvol_impl)
        except Exception:
            _costvol_fn = _pytorch_costvol_impl
    try:
        return _costvol_fn(tenOne, tenTwo, intKernelSize)
    except Exception:
        _costvol_fn = _pytorch_costvol_impl
        return _costvol_fn(tenOne, tenTwo, intKernelSize)
 ##########################################################
@@ -253,55 +291,59 @@ class costvol_func(torch.autograd.Function):
    @staticmethod
    @torch.amp.custom_fwd(device_type='cuda', cast_inputs=torch.float32)
    def forward(self, tenOne, tenTwo, intKernelSize):
-        tenOut = tenOne.new_empty([tenOne.shape[0], intKernelSize ** 2, tenOne.shape[2], tenOne.shape[3]])
+        _ensure_cupy()
        if tenOne.is_cuda and cupy is not None:
            tenOut = tenOne.new_empty([tenOne.shape[0], intKernelSize ** 2, tenOne.shape[2], tenOne.shape[3]])
-        cuda_launch(cuda_kernel('costvol_out', '''
+            cuda_launch(cuda_kernel('costvol_out', '''
-            extern "C" __global__ void __launch_bounds__(512) costvol_out(
+                extern "C" __global__ void __launch_bounds__(512) costvol_out(
-                const int n,
+                    const int n,
-                const {{type}}* __restrict__ tenOne,
+                    const {{type}}* __restrict__ tenOne,
-                const {{type}}* __restrict__ tenTwo,
+                    const {{type}}* __restrict__ tenTwo,
-                const int intKernelSize,
+                    const int intKernelSize,
-                {{type}}* __restrict__ tenOut
+                    {{type}}* __restrict__ tenOut
-            ) { for (int intIndex = (blockIdx.x * blockDim.x) + threadIdx.x; intIndex < n; intIndex += blockDim.x * gridDim.x) {
+                ) { for (int intIndex = (blockIdx.x * blockDim.x) + threadIdx.x; intIndex < n; intIndex += blockDim.x * gridDim.x) {
-                const int intN = ( intIndex / SIZE_3(tenOut) / SIZE_2(tenOut) ) % SIZE_0(tenOut);
+                    const int intN = ( intIndex / SIZE_3(tenOut) / SIZE_2(tenOut) ) % SIZE_0(tenOut);
-                const int intY = ( intIndex / SIZE_3(tenOut)                  ) % SIZE_2(tenOut);
+                    const int intY = ( intIndex / SIZE_3(tenOut)                  ) % SIZE_2(tenOut);
-                const int intX = ( intIndex                                   ) % SIZE_3(tenOut);
+                    const int intX = ( intIndex                                   ) % SIZE_3(tenOut);
-                {{type}} fltOne[{{intChans}}];
+                    {{type}} fltOne[{{intChans}}];
-                for (int intValue = 0; intValue < SIZE_1(tenOne); intValue += 1) {
+                    for (int intValue = 0; intValue < SIZE_1(tenOne); intValue += 1) {
-                    fltOne[intValue] = VALUE_4(tenOne, intN, intValue, intY, intX);
+                        fltOne[intValue] = VALUE_4(tenOne, intN, intValue, intY, intX);
                }
                int intOffset = OFFSET_4(tenOut, intN, 0, intY, intX);
                for (int intOy = intY - (intKernelSize - 1) / 2; intOy <= intY + (intKernelSize - 1) / 2; intOy += 1) {
                    for (int intOx = intX - (intKernelSize - 1) / 2; intOx <= intX + (intKernelSize - 1) / 2; intOx += 1) {
                        {{type}} fltValue = 0.0f;
                        if ((intOy >= 0) && (intOy < SIZE_2(tenOut)) && (intOx >= 0) && (intOx < SIZE_3(tenOut))) {
                            for (int intValue = 0; intValue < SIZE_1(tenOne); intValue += 1) {
                                fltValue += (fltOne[intValue] * VALUE_4(tenTwo, intN, intValue, intOy, intOx));
                            }
                        }
                        tenOut[intOffset] = fltValue;
                        intOffset += SIZE_2(tenOut) * SIZE_3(tenOut);
                    }
-                }
+
-            } }
+                    int intOffset = OFFSET_4(tenOut, intN, 0, intY, intX);
-        ''', {
+
-            'intChans': tenOne.shape[1],
+                    for (int intOy = intY - (intKernelSize - 1) / 2; intOy <= intY + (intKernelSize - 1) / 2; intOy += 1) {
-            'tenOne': tenOne,
+                        for (int intOx = intX - (intKernelSize - 1) / 2; intOx <= intX + (intKernelSize - 1) / 2; intOx += 1) {
-            'tenTwo': tenTwo,
+                            {{type}} fltValue = 0.0f;
-            'intKernelSize': intKernelSize,
+
-            'tenOut': tenOut
+                            if ((intOy >= 0) && (intOy < SIZE_2(tenOut)) && (intOx >= 0) && (intOx < SIZE_3(tenOut))) {
-        }))(
+                                for (int intValue = 0; intValue < SIZE_1(tenOne); intValue += 1) {
-            grid=tuple([int(((tenOut.shape[0] * tenOut.shape[2] * tenOut.shape[3]) + 512 - 1) / 512), 1, 1]),
+                                    fltValue += (fltOne[intValue] * VALUE_4(tenTwo, intN, intValue, intOy, intOx));
-            block=tuple([512, 1, 1]),
+                                }
-            args=[cuda_int32(tenOut.shape[0] * tenOut.shape[2] * tenOut.shape[3]), tenOne.data_ptr(), tenTwo.data_ptr(), intKernelSize, tenOut.data_ptr()],
+                            }
-            stream=collections.namedtuple('Stream', 'ptr')(torch.cuda.current_stream().cuda_stream)
+
-        )
+                            tenOut[intOffset] = fltValue;
                            intOffset += SIZE_2(tenOut) * SIZE_3(tenOut);
                        }
                    }
                } }
            ''', {
                'intChans': tenOne.shape[1],
                'tenOne': tenOne,
                'tenTwo': tenTwo,
                'intKernelSize': intKernelSize,
                'tenOut': tenOut
            }))(
                grid=tuple([int(((tenOut.shape[0] * tenOut.shape[2] * tenOut.shape[3]) + 512 - 1) / 512), 1, 1]),
                block=tuple([512, 1, 1]),
                args=[cuda_int32(tenOut.shape[0] * tenOut.shape[2] * tenOut.shape[3]), tenOne.data_ptr(), tenTwo.data_ptr(), intKernelSize, tenOut.data_ptr()],
                stream=collections.namedtuple('Stream', 'ptr')(torch.cuda.current_stream().cuda_stream)
            )
        else:
            tenOut = _pytorch_costvol(tenOne, tenTwo, intKernelSize)
        self.save_for_backward(tenOne, tenTwo)
        self.intKernelSize = intKernelSize
@@ -0,0 +1,297 @@
 # Pure-PyTorch Fallbacks for cupy Kernels
 > **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
 **Goal:** Make BIM-VFI, SGM-VFI, and GIMM-VFI work without cupy by adding pure-PyTorch fallback implementations of softsplat and costvol.
 **Architecture:** Each kernel file (`sgm_vfi_arch/softsplat.py`, `gimm_vfi_arch/.../softsplat.py`, `bim_vfi_arch/costvol.py`) gets a `_pytorch_*` fallback function. The `softsplat_func.forward()` and `costvol_func.forward()` methods dispatch to cupy when available, otherwise use the fallback. The `_check_cupy()` gate in `nodes.py` is removed so models can load on any backend.
 **Tech Stack:** PyTorch (`scatter_add_`, `Tensor.unfold`, `F.pad`)
 ---
 ### Task 1: Add pure-PyTorch softsplat fallback to SGM-VFI
 **Files:**
 - Modify: `sgm_vfi_arch/softsplat.py`
 **Step 1: Add cupy availability flag and fallback function**
 At the top of `sgm_vfi_arch/softsplat.py`, change the hard `import cupy` to a try/except, and add the fallback function after the `cuda_launch` function (before the `softsplat()` function).
 Replace:
 ```python
 import cupy
 ```
 With:
 ```python
 try:
    import cupy
 except Exception:  # any import-time failure, not only a missing package
    cupy = None
 ```
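The guard matters because a package can be installed and still fail while its own top-level code runs, in which case the exception is not necessarily an `ImportError`. A minimal sketch of that situation, using only the standard library: `optional_import` and the throwaway `broken_accel` module are hypothetical names for illustration and are not part of this repository.

```python
import importlib
import os
import sys
import tempfile


def optional_import(name):
    """Return the imported module, or None if importing it fails for any reason."""
    try:
        return importlib.import_module(name)
    except Exception:
        # Covers ImportError (not installed) as well as errors raised while
        # the module's own top-level code runs (installed but broken).
        return None


# Simulate an installed-but-broken package: importing it raises TypeError.
tmp_dir = tempfile.mkdtemp()
with open(os.path.join(tmp_dir, "broken_accel.py"), "w") as fh:
    fh.write("raise TypeError('simulated import-time failure')\n")
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

missing = optional_import("no_such_accel_module")  # not installed at all
broken = optional_import("broken_accel")           # installed, fails on import
present = optional_import("json")                  # imports normally
```

Both failure modes collapse to `None`, so the dispatch code only has to test `cupy is not None`.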
 Add this fallback function (after `cuda_launch`, before `softsplat`):
 ```python
 def _pytorch_softsplat(tenIn, tenFlow):
    B, C, H, W = tenIn.shape
    tenOut = tenIn.new_zeros(B, C, H, W)
    # Build base grid: (x, y) for each pixel
    grid_y, grid_x = torch.meshgrid(
        torch.arange(H, device=tenIn.device, dtype=tenIn.dtype),
        torch.arange(W, device=tenIn.device, dtype=tenIn.dtype),
        indexing='ij',
    )
    # Target positions
    flt_x = grid_x.unsqueeze(0) + tenFlow[:, 0, :, :]  # (B, H, W)
    flt_y = grid_y.unsqueeze(0) + tenFlow[:, 1, :, :]
    # Filter non-finite
    valid = torch.isfinite(flt_x) & torch.isfinite(flt_y)
    flt_x = torch.where(valid, flt_x, torch.zeros_like(flt_x))
    flt_y = torch.where(valid, flt_y, torch.zeros_like(flt_y))
    # Four neighbors (NW, NE, SW, SE)
    nw_x = flt_x.floor().long()
    nw_y = flt_y.floor().long()
    # Bilinear weights
    frac_x = flt_x - nw_x.to(flt_x.dtype)  # keep the input dtype (e.g. half) instead of forcing float32
    frac_y = flt_y - nw_y.to(flt_y.dtype)
    w_nw = (1.0 - frac_x) * (1.0 - frac_y)
    w_ne = frac_x * (1.0 - frac_y)
    w_sw = (1.0 - frac_x) * frac_y
    w_se = frac_x * frac_y
    # Zero out invalid pixels
    w_nw = w_nw * valid
    w_ne = w_ne * valid
    w_sw = w_sw * valid
    w_se = w_se * valid
    # For each of the 4 neighbors, scatter into output
    for dx, dy, w in [(0, 0, w_nw), (1, 0, w_ne), (0, 1, w_sw), (1, 1, w_se)]:
        tx = nw_x + dx
        ty = nw_y + dy
        in_bounds = (tx >= 0) & (tx < W) & (ty >= 0) & (ty < H)
        w_masked = w * in_bounds
        # Flatten to 1D index for scatter_add
        idx = (ty.clamp(0, H - 1) * W + tx.clamp(0, W - 1))  # (B, H, W)
        idx = idx.unsqueeze(1).expand_as(tenIn)  # (B, C, H, W)
        weighted = tenIn * w_masked.unsqueeze(1)  # (B, C, H, W)
        tenOut.view(B, C, -1).scatter_add_(2, idx.reshape(B, C, -1), weighted.reshape(B, C, -1))
    return tenOut
 ```
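The bilinear weighting step in the fallback can be checked by hand for a single target position. The sketch below is plain Python rather than tensor code, and `bilinear_splat_weights` is a hypothetical helper introduced only for this illustration. It uses the same neighbour order (NW, NE, SW, SE) as the loop in the fallback.

```python
import math


def bilinear_splat_weights(x, y):
    """Map a float target position to its four integer neighbours and weights."""
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0
    return {
        (x0, y0): (1.0 - fx) * (1.0 - fy),  # north-west
        (x0 + 1, y0): fx * (1.0 - fy),      # north-east
        (x0, y0 + 1): (1.0 - fx) * fy,      # south-west
        (x0 + 1, y0 + 1): fx * fy,          # south-east
    }


# A source pixel whose flow lands it at (x=1.25, y=2.5).
weights = bilinear_splat_weights(1.25, 2.5)
total = sum(weights.values())
```

The four weights always sum to 1, so a source pixel's contribution is conserved unless some of its neighbours fall outside the image and are masked out.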
 **Step 2: Update softsplat_func.forward to use fallback**
 In `softsplat_func.forward()`, replace the `elif tenIn.is_cuda != True: assert(False)` block so it dispatches to the fallback when cupy is unavailable or when not on CUDA:
 ```python
 # Current:
        if tenIn.is_cuda == True:
            cuda_launch(cuda_kernel(...))(...) 
        elif tenIn.is_cuda != True:
            assert(False)
 # New:
        if tenIn.is_cuda and cupy is not None:
            cuda_launch(cuda_kernel(...))(...) 
        else:
            tenOut = _pytorch_softsplat(tenIn, tenFlow)
 ```
 Also guard the `@cupy.memoize` decorator on `cuda_launch`:
 ```python
 # Current:
@cupy.memoize(for_each_device=True)
 def cuda_launch(strKey:str):
 # New:
 def cuda_launch(strKey:str):
 ```
 The snippet above only drops the decorator. `cuda_launch` constructs a `cupy.RawKernel`, so it is only ever called on the cupy path; the actual problem is the `@cupy.memoize` decorator, which is evaluated at import time and fails when `cupy` is `None`. To keep compiled kernels cached, replace it with a plain dict:
 ```python
 # Replace @cupy.memoize(for_each_device=True) with a simple cache dict
 _cuda_launch_cache = {}
 def cuda_launch(strKey:str):
    if strKey not in _cuda_launch_cache:
        if 'CUDA_HOME' not in os.environ:
            os.environ['CUDA_HOME'] = cupy.cuda.get_cuda_path()
        _cuda_launch_cache[strKey] = cupy.RawKernel(
            objCudacache[strKey]['strKernel'],
            objCudacache[strKey]['strFunction'],
            options=tuple(['-I ' + os.environ['CUDA_HOME'],
                           '-I ' + os.environ['CUDA_HOME'] + '/include'])
        )
    return _cuda_launch_cache[strKey]
 ```
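Reduced to its essentials, the dict-cache pattern looks like the sketch below. It has no cupy dependency: `build_kernel` is a hypothetical stand-in for the expensive `cupy.RawKernel(...)` construction, and `launch` stands in for `cuda_launch`.

```python
_cache = {}
build_calls = []


def build_kernel(key):
    """Hypothetical stand-in for an expensive compile step."""
    build_calls.append(key)
    return ("compiled", key)


def launch(key):
    # Nothing here runs at import time, so defining `launch` is safe even
    # when the accelerator library is unavailable.
    if key not in _cache:
        _cache[key] = build_kernel(key)
    return _cache[key]


first = launch("softsplat_out")
second = launch("softsplat_out")  # served from the cache, no second build
```

One difference to keep in mind: a plain dict does not keep separate entries per device, which `for_each_device=True` did. Whether that matters depends on how many GPUs a single process uses.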
 **Step 3: Commit**
 ```bash
 git add sgm_vfi_arch/softsplat.py
 git commit -m "feat: add pure-PyTorch softsplat fallback for SGM-VFI"
 ```
 ---
 ### Task 2: Add pure-PyTorch softsplat fallback to GIMM-VFI
 **Files:**
 - Modify: `gimm_vfi_arch/generalizable_INR/modules/softsplat.py`
 **Step 1: Add cupy availability flag and fallback function**
 Same pattern as Task 1. Replace `import cupy` with try/except. Add the same `_pytorch_softsplat()` function. Replace `@cupy.memoize(for_each_device=True)` on `cuda_launch` with a dict cache.
 The GIMM softsplat.py already has `@torch.compiler.disable()` on `cuda_launch` — keep that decorator.
 **Step 2: Update softsplat_func.forward dispatch**
 Same pattern: if `tenIn.is_cuda and cupy is not None` → cupy path, else → `_pytorch_softsplat`.
 **Step 3: Commit**
 ```bash
 git add gimm_vfi_arch/generalizable_INR/modules/softsplat.py
 git commit -m "feat: add pure-PyTorch softsplat fallback for GIMM-VFI"
 ```
 ---
 ### Task 3: Add pure-PyTorch costvol fallback to BIM-VFI
 **Files:**
 - Modify: `bim_vfi_arch/costvol.py`
 **Step 1: Add the fallback function**
 After the existing `cuda_launch` function, add:
 ```python
 def _pytorch_costvol(tenOne, tenTwo, intKernelSize):
    B, C, H, W = tenOne.shape
    pad = (intKernelSize - 1) // 2
    # Pad tenTwo with zeros so out-of-bounds accesses yield 0 (matches CUDA kernel)
    tenTwo_padded = F.pad(tenTwo, [pad, pad, pad, pad])
    # Unfold into (B, C, K*K, H, W) patches
    patches = tenTwo_padded.unfold(2, intKernelSize, 1).unfold(3, intKernelSize, 1)
    # patches shape: (B, C, H, W, K, K)
    patches = patches.contiguous().view(B, C, H, W, intKernelSize * intKernelSize)
    # -> (B, C, H, W, K^2)
    # Dot product: sum over C
    # tenOne: (B, C, H, W) -> (B, C, H, W, 1)
    tenOut = (tenOne.unsqueeze(-1) * patches).sum(dim=1)
    # tenOut: (B, H, W, K^2)
    # Permute to (B, K^2, H, W) to match CUDA output layout
    tenOut = tenOut.permute(0, 3, 1, 2).contiguous()
    return tenOut
 ```
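To validate the output channel order and the zero-padding behaviour of the fallback, a naive reference with explicit loops is useful. The sketch below works on plain nested lists for a single sample rather than on tensors. `naive_costvol` is a hypothetical helper for illustration, with the vertical offset in the outer loop and the horizontal offset in the inner loop.

```python
def naive_costvol(one, two, k):
    """Local cost volume for one sample; inputs are [C][H][W] nested lists."""
    C, H, W = len(one), len(one[0]), len(one[0][0])
    r = (k - 1) // 2
    out = []  # becomes [k*k][H][W]
    for oy in range(-r, r + 1):      # vertical offset: outer loop
        for ox in range(-r, r + 1):  # horizontal offset: inner loop
            plane = [[0.0] * W for _ in range(H)]
            for y in range(H):
                for x in range(W):
                    yy, xx = y + oy, x + ox
                    if 0 <= yy < H and 0 <= xx < W:  # out of bounds stays 0
                        plane[y][x] = sum(
                            one[c][y][x] * two[c][yy][xx] for c in range(C)
                        )
            out.append(plane)
    return out


one = [[[1.0, 2.0], [3.0, 4.0]]]  # C=1, H=2, W=2
two = [[[1.0, 0.0], [0.0, 1.0]]]
vol = naive_costvol(one, two, 3)
```

Channel `4` is the centre offset (a per-pixel dot product of the two inputs), channel `0` looks up and to the left, and channel `8` looks down and to the right. Comparing a tensor implementation against such a reference on small inputs is a cheap way to catch a transposed offset order.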
 Add `import torch.nn.functional as F` at the top if not already present.
 **Step 2: Update costvol_func.forward dispatch**
 The current forward unconditionally calls `cuda_launch(cuda_kernel(...))`. Change to:
 ```python
@staticmethod
@torch.amp.custom_fwd(device_type='cuda', cast_inputs=torch.float32)
 def forward(self, tenOne, tenTwo, intKernelSize):
    if tenOne.is_cuda and cupy is not None:
        # existing cupy code (unchanged)
        tenOut = tenOne.new_empty([tenOne.shape[0], intKernelSize ** 2, tenOne.shape[2], tenOne.shape[3]])
        cuda_launch(cuda_kernel(...))(...) 
    else:
        tenOut = _pytorch_costvol(tenOne, tenTwo, intKernelSize)
    self.save_for_backward(tenOne, tenTwo)
    self.intKernelSize = intKernelSize
    return tenOut
 ```
 **Step 3: Commit**
 ```bash
 git add bim_vfi_arch/costvol.py
 git commit -m "feat: add pure-PyTorch costvol fallback for BIM-VFI"
 ```
 ---
 ### Task 4: Remove _check_cupy gate from nodes.py
 **Files:**
 - Modify: `nodes.py`
 **Step 1: Remove the _check_cupy function and all its call sites**
 Delete the `_check_cupy()` function definition (lines 22-41). Remove the three calls:
 - Line 209: `_check_cupy("BIM-VFI")` (in BIM-VFI load)
 - Line 1377: `_check_cupy("SGM-VFI")` (in SGM-VFI load)
 - Line 1804: `_check_cupy("GIMM-VFI")` (in GIMM-VFI load)
 **Step 2: Commit**
 ```bash
 git add nodes.py
 git commit -m "feat: remove cupy requirement gate, models now fall back to pure PyTorch"
 ```
 ---
 ### Task 5: Make install.py not force cupy installation
 **Files:**
 - Modify: `install.py`
 **Step 1: Change cupy from required to optional**
 Make cupy a soft dependency — try to install it but don't fail if it can't be installed (ROCm users, no CUDA toolkit, etc.). Change `install()`:
 ```python
 def install():
    # Install core requirements first
    requirements_path = os.path.join(os.path.dirname(__file__), "requirements.txt")
    subprocess.check_call([
        sys.executable, "-m", "pip", "install", "-r", requirements_path
    ])
    # Try to install cupy for NVIDIA users (optional, improves performance)
    cupy_pkg = get_cupy_package()
    if cupy_pkg:
        try:
            subprocess.check_call([
                sys.executable, "-m", "pip", "install", cupy_pkg
            ])
            print(f"[Tween] cupy installed successfully ({cupy_pkg})")
        except subprocess.CalledProcessError:
            print(f"[Tween] WARNING: Could not install {cupy_pkg}. "
                  f"BIM-VFI, SGM-VFI, and GIMM-VFI will use slower PyTorch fallback.")
    else:
        print("[Tween] cupy not available (no NVIDIA CUDA). "
              "BIM-VFI, SGM-VFI, and GIMM-VFI will use PyTorch fallback.")
 ```
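The soft-dependency behaviour can be isolated into a small helper that is testable without running pip. This is a sketch under assumptions: `soft_install` and the two fake runners are hypothetical names, and the injectable `runner` parameter is an illustration device, not something `install.py` is known to have.

```python
import subprocess
import sys


def soft_install(package, runner=subprocess.check_call):
    """Try to pip-install `package`; return True on success, False on failure."""
    try:
        runner([sys.executable, "-m", "pip", "install", package])
        return True
    except subprocess.CalledProcessError:
        # A failed optional install is reported to the caller, not raised.
        return False


def failing_runner(cmd):
    # Simulates pip exiting with a non-zero status.
    raise subprocess.CalledProcessError(returncode=1, cmd=cmd)


def succeeding_runner(cmd):
    # Simulates pip exiting successfully.
    return 0


ok = soft_install("example-package", runner=succeeding_runner)
failed = soft_install("example-package", runner=failing_runner)
```

The caller can then print the warning about the slower PyTorch fallback when the helper returns `False`, while a failure to install the core requirements still aborts as before.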
 Also stop writing cupy into `requirements.txt` — remove the `update_requirements` call and function.
 **Step 2: Commit**
 ```bash
 git add install.py
 git commit -m "feat: make cupy optional in install.py"
 ```
@@ -206,100 +206,6 @@
        "2"
      ]
    },
    {
      "id": 12,
      "type": "easy forLoopStart",
      "pos": [
        -8160,
        576
      ],
      "size": [
        270,
        138
      ],
      "flags": {},
      "order": 6,
      "mode": 0,
      "inputs": [
        {
          "name": "initial_value1",
          "shape": 7,
          "type": "*",
          "link": 68
        },
        {
          "name": "total",
          "type": "INT",
          "widget": {
            "name": "total"
          },
          "link": 33
        },
        {
          "name": "initial_value2",
          "type": "*",
          "link": 44
        },
        {
          "name": "initial_value3",
          "type": "*",
          "link": null
        }
      ],
      "outputs": [
        {
          "name": "flow",
          "shape": 5,
          "type": "FLOW_CONTROL",
          "links": [
            15
          ]
        },
        {
          "name": "index",
          "type": "INT",
          "links": [
            25,
            26
          ]
        },
        {
          "name": "value1",
          "type": "*",
          "links": [
            18
          ]
        },
        {
          "name": "value2",
          "type": "*",
          "links": [
            21,
            64
          ]
        },
        {
          "name": "value3",
          "type": "*",
          "links": null
        }
      ],
      "properties": {
        "cnr_id": "comfyui-easy-use",
        "ver": "7c470c67d6df44498e52c902173c1ac77cd5bdfd",
        "Node name for S&R": "easy forLoopStart",
        "ue_properties": {
          "widget_ue_connectable": {},
          "input_ue_unconnectable": {},
          "version": "7.6.2"
        }
      },
      "widgets_values": [
        6
      ],
      "color": "#223",
      "bgcolor": "#335"
    },
    {
      "id": 13,
      "type": "easy forLoopEnd",
@@ -371,85 +277,6 @@
      "color": "#223",
      "bgcolor": "#335"
    },
    {
      "id": 11,
      "type": "BIMVFISegmentInterpolate",
      "pos": [
        -7584,
        576
      ],
      "size": [
        321.58209228515625,
        246
      ],
      "flags": {},
      "order": 9,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 21
        },
        {
          "name": "model",
          "type": "BIM_VFI_MODEL",
          "link": 18
        },
        {
          "name": "segment_index",
          "type": "INT",
          "widget": {
            "name": "segment_index"
          },
          "link": 25
        },
        {
          "name": "segment_size",
          "type": "INT",
          "widget": {
            "name": "segment_size"
          },
          "link": 35
        }
      ],
      "outputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "links": [
            66
          ]
        },
        {
          "name": "model",
          "type": "BIM_VFI_MODEL",
          "links": [
            67
          ]
        }
      ],
      "properties": {
        "aux_id": "Comfyui-BIM-VFI.git",
        "ver": "7cf7162143eaa5b0939e0e122f80bc956baf65ea",
        "Node name for S&R": "BIMVFISegmentInterpolate",
        "ue_properties": {
          "widget_ue_connectable": {},
          "input_ue_unconnectable": {},
          "version": "7.6.2"
        }
      },
      "widgets_values": [
        2,
        40,
        true,
        true,
        1,
        0,
        0,
        500
      ]
    },
    {
      "id": 3,
      "type": "LoadBIMVFIModel",
@@ -561,7 +388,6 @@
        "video/",
        "tween_sgm",
        "tween_video_sgm.mp4",
        true,
        true
      ]
    },
@@ -574,7 +400,7 @@
      ],
      "size": [
        544,
-        352
+        334
      ],
      "flags": {},
      "order": 10,
@@ -647,11 +473,227 @@
        }
      }
    },
    {
      "id": 16,
      "type": "PrimitiveInt",
      "pos": [
        -9184,
        544
      ],
      "size": [
        270,
        82
      ],
      "flags": {},
      "order": 2,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "INT",
          "type": "INT",
          "links": [
            31,
            35
          ]
        }
      ],
      "title": "Frames number each loops",
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.13.0",
        "Node name for S&R": "PrimitiveInt",
        "ue_properties": {
          "widget_ue_connectable": {},
          "input_ue_unconnectable": {},
          "version": "7.6.2"
        }
      },
      "widgets_values": [
        100,
        "fixed"
      ]
    },
    {
      "id": 12,
      "type": "easy forLoopStart",
      "pos": [
        -8160,
        576
      ],
      "size": [
        270,
        138
      ],
      "flags": {},
      "order": 6,
      "mode": 0,
      "inputs": [
        {
          "name": "initial_value1",
          "shape": 7,
          "type": "*",
          "link": 68
        },
        {
          "name": "total",
          "type": "INT",
          "widget": {
            "name": "total"
          },
          "link": 33
        },
        {
          "name": "initial_value2",
          "type": "*",
          "link": 44
        },
        {
          "name": "initial_value3",
          "type": "*",
          "link": null
        }
      ],
      "outputs": [
        {
          "name": "flow",
          "shape": 5,
          "type": "FLOW_CONTROL",
          "links": [
            15
          ]
        },
        {
          "name": "index",
          "type": "INT",
          "links": [
            25,
            26
          ]
        },
        {
          "name": "value1",
          "type": "*",
          "links": [
            18
          ]
        },
        {
          "name": "value2",
          "type": "*",
          "links": [
            21,
            64
          ]
        },
        {
          "name": "value3",
          "type": "*",
          "links": null
        }
      ],
      "properties": {
        "cnr_id": "comfyui-easy-use",
        "ver": "7c470c67d6df44498e52c902173c1ac77cd5bdfd",
        "Node name for S&R": "easy forLoopStart",
        "ue_properties": {
          "widget_ue_connectable": {},
          "input_ue_unconnectable": {},
          "version": "7.6.2"
        }
      },
      "widgets_values": [
        6
      ],
      "color": "#223",
      "bgcolor": "#335"
    },
    {
      "id": 11,
      "type": "BIMVFISegmentInterpolate",
      "pos": [
        -7584,
        576
      ],
      "size": [
        321.58209228515625,
        294
      ],
      "flags": {},
      "order": 9,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 21
        },
        {
          "name": "model",
          "type": "BIM_VFI_MODEL",
          "link": 18
        },
        {
          "name": "segment_index",
          "type": "INT",
          "widget": {
            "name": "segment_index"
          },
          "link": 25
        },
        {
          "name": "segment_size",
          "type": "INT",
          "widget": {
            "name": "segment_size"
          },
          "link": 35
        }
      ],
      "outputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "links": [
            66
          ]
        },
        {
          "name": "model",
          "type": "BIM_VFI_MODEL",
          "links": [
            67
          ]
        }
      ],
      "properties": {
        "aux_id": "Comfyui-BIM-VFI.git",
        "ver": "7cf7162143eaa5b0939e0e122f80bc956baf65ea",
        "Node name for S&R": "BIMVFISegmentInterpolate",
        "ue_properties": {
          "widget_ue_connectable": {},
          "input_ue_unconnectable": {},
          "version": "7.6.2"
        }
      },
      "widgets_values": [
        2,
        40,
        true,
        true,
        1,
        500,
        16,
        0,
        0,
        500
      ]
    },
    {
      "id": 28,
      "type": "VHS_LoadVideoPath",
      "pos": [
-        -9152,
+        -9184,
        704
      ],
      "size": [
@@ -659,7 +701,7 @@
        286
      ],
      "flags": {},
-      "order": 2,
+      "order": 3,
      "mode": 0,
      "inputs": [
        {
@@ -738,47 +780,6 @@
          }
        }
      }
    },
    {
      "id": 16,
      "type": "PrimitiveInt",
      "pos": [
        -9152,
        576
      ],
      "size": [
        270,
        82
      ],
      "flags": {},
      "order": 3,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "INT",
          "type": "INT",
          "links": [
            31,
            35
          ]
        }
      ],
      "title": "Frames number each loops",
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.13.0",
        "Node name for S&R": "PrimitiveInt",
        "ue_properties": {
          "widget_ue_connectable": {},
          "input_ue_unconnectable": {},
          "version": "7.6.2"
        }
      },
      "widgets_values": [
        100,
        "fixed"
      ]
    }
  ],
  "links": [
@@ -933,10 +934,10 @@
    "workflowRendererVersion": "LG",
    "ue_links": [],
    "ds": {
-      "scale": 1.0834705943388552,
+      "scale": 0.8954302432552531,
      "offset": [
-        10009.878269742538,
+        10389.297857289295,
-        -100.68482917709798
+        79.21414284327875
      ]
    },
    "links_added_by_ue": [],
@@ -9,7 +9,13 @@
 # --------------------------------------------------------
 import collections
-import cupy
+try:
+    import cupy
+except Exception:
+    # Broad catch: an installed-but-broken cupy (e.g. incompatible NumPy)
+    # raises non-ImportError exceptions at import time. Treat any failure as
+    # "cupy unavailable" and fall back to the pure-PyTorch implementation.
+    cupy = None
 import os
 import re
 import torch
@@ -260,31 +266,94 @@ def cuda_kernel(strFunction: str, strKernel: str, objVariables: typing.Dict):
 # end
-@cupy.memoize(for_each_device=True)
+_cuda_launch_cache = {}
@torch.compiler.disable()
 def cuda_launch(strKey: str):
-    try:
+    if strKey not in _cuda_launch_cache:
        os.environ.setdefault("CUDA_HOME", cupy.cuda.get_cuda_path())
    except Exception:
        if "CUDA_HOME" not in os.environ:
-            raise RuntimeError("'CUDA_HOME' not set, unable to find cuda-toolkit installation.")
+            try:
-    
+                cuda_path = cupy.cuda.get_cuda_path()
-    strKernel = objCudacache[strKey]["strKernel"]
+            except Exception:
-    strFunction = objCudacache[strKey]["strFunction"]
+                cuda_path = None
-    
+            if cuda_path is None:
-    return cupy.RawModule(
+                cuda_path = "/usr/local/cuda"
-        code=strKernel,
+            os.environ["CUDA_HOME"] = cuda_path
-        options=(
+        strKernel = objCudacache[strKey]["strKernel"]
-            "-I " + os.environ["CUDA_HOME"],
+        strFunction = objCudacache[strKey]["strFunction"]
-            "-I " + os.environ["CUDA_HOME"] + "/include",
+        _cuda_launch_cache[strKey] = cupy.RawModule(
-        ),
+            code=strKernel,
-    ).get_function(strFunction)
+            options=(
                "-I " + os.environ["CUDA_HOME"],
                "-I " + os.environ["CUDA_HOME"] + "/include",
            ),
        ).get_function(strFunction)
    return _cuda_launch_cache[strKey]
 ##########################################################
 def _pytorch_softsplat_impl(tenIn, tenFlow):
    """Pure-PyTorch forward warp via bilinear splatting (scatter_add)."""
    B, C, H, W = tenIn.shape
    tenOut = tenIn.new_zeros(B, C, H, W)
    grid_y, grid_x = torch.meshgrid(
        torch.arange(H, device=tenIn.device, dtype=tenIn.dtype),
        torch.arange(W, device=tenIn.device, dtype=tenIn.dtype),
        indexing='ij',
    )
    flt_x = grid_x.unsqueeze(0) + tenFlow[:, 0, :, :]
    flt_y = grid_y.unsqueeze(0) + tenFlow[:, 1, :, :]
    valid = torch.isfinite(flt_x) & torch.isfinite(flt_y)
    flt_x = torch.where(valid, flt_x, torch.zeros_like(flt_x))
    flt_y = torch.where(valid, flt_y, torch.zeros_like(flt_y))
    nw_x = flt_x.floor().long()
    nw_y = flt_y.floor().long()
    frac_x = flt_x - nw_x.to(flt_x.dtype)
    frac_y = flt_y - nw_y.to(flt_y.dtype)
    w_nw = (1.0 - frac_x) * (1.0 - frac_y) * valid
    w_ne = frac_x * (1.0 - frac_y) * valid
    w_sw = (1.0 - frac_x) * frac_y * valid
    w_se = frac_x * frac_y * valid
    out_flat = tenOut.view(B, C, -1)
    for dx, dy, w in [(0, 0, w_nw), (1, 0, w_ne), (0, 1, w_sw), (1, 1, w_se)]:
        tx = nw_x + dx
        ty = nw_y + dy
        in_bounds = (tx >= 0) & (tx < W) & (ty >= 0) & (ty < H)
        w_masked = w * in_bounds
        idx = (ty.clamp(0, H - 1) * W + tx.clamp(0, W - 1))
        idx = idx.unsqueeze(1).expand_as(tenIn)
        weighted = tenIn * w_masked.unsqueeze(1)
        out_flat.scatter_add_(2, idx.reshape(B, C, -1), weighted.reshape(B, C, -1))
    return tenOut
 _softsplat_fn = None
 def _pytorch_softsplat(tenIn, tenFlow):
    global _softsplat_fn
    if _softsplat_fn is None:
        try:
            _softsplat_fn = torch.compile(_pytorch_softsplat_impl)
        except Exception:
            _softsplat_fn = _pytorch_softsplat_impl
    try:
        return _softsplat_fn(tenIn, tenFlow)
    except Exception:
        _softsplat_fn = _pytorch_softsplat_impl
        return _softsplat_fn(tenIn, tenFlow)
@torch.compiler.disable()
 def softsplat(tenIn, tenFlow, tenMetric, strMode, return_norm=False):
    assert strMode.split("-")[0] in ["sum", "avg", "linear", "softmax"]
@@ -366,7 +435,7 @@ class softsplat_func(torch.autograd.Function):
            [tenIn.shape[0], tenIn.shape[1], tenIn.shape[2], tenIn.shape[3]]
        )
-        if tenIn.is_cuda == True:
+        if tenIn.is_cuda and cupy is not None:
            cuda_launch(
                cuda_kernel(
                    "softsplat_out",
@@ -439,8 +508,8 @@ class softsplat_func(torch.autograd.Function):
                ),
            )
-        elif tenIn.is_cuda != True:
+        else:
-            assert False
+            tenOut = _pytorch_softsplat(tenIn, tenFlow)
        # end
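The bilinear splatting weights used by `_pytorch_softsplat_impl` can be illustrated for a single source pixel without PyTorch. The following is a minimal pure-Python sketch, not code from the repository: the helper name `splat_weights` and its dictionary return type are assumptions made for illustration. It mirrors the same steps (floor to the north-west corner, four fractional weights, out-of-bounds targets dropped, non-finite positions ignored).

```python
import math


def splat_weights(x: float, y: float, width: int, height: int) -> dict:
    """Map each in-bounds target pixel (tx, ty) to the weight it receives.

    (x, y) is the warped position of one source pixel, i.e. its grid
    coordinate plus the flow vector. Hypothetical helper for illustration.
    """
    # Non-finite positions contribute nothing (the `valid` mask in the diff).
    if not (math.isfinite(x) and math.isfinite(y)):
        return {}

    nw_x, nw_y = math.floor(x), math.floor(y)  # north-west corner
    frac_x, frac_y = x - nw_x, y - nw_y        # offset inside that cell

    corners = [
        (0, 0, (1.0 - frac_x) * (1.0 - frac_y)),  # north-west
        (1, 0, frac_x * (1.0 - frac_y)),          # north-east
        (0, 1, (1.0 - frac_x) * frac_y),          # south-west
        (1, 1, frac_x * frac_y),                  # south-east
    ]

    weights = {}
    for dx, dy, w in corners:
        tx, ty = nw_x + dx, nw_y + dy
        # Targets outside the image are skipped (the `in_bounds` mask).
        if 0 <= tx < width and 0 <= ty < height:
            weights[(tx, ty)] = w
    return weights


inside = splat_weights(1.25, 2.5, width=4, height=4)
edge = splat_weights(3.5, 0.0, width=4, height=4)
```

For a position fully inside the image the four weights sum to 1, so the splat conserves the source value. Near the border part of the weight falls outside the image and is dropped, which is the role of the `in_bounds` mask in the tensor version.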
@@ -8,44 +8,39 @@ def get_cupy_package():
    try:
        import torch
        if not torch.cuda.is_available():
            print("[Tween] WARNING: CUDA not available. cupy requires CUDA.")
            return None
        cuda_version = torch.version.cuda
        if cuda_version is None:
            print("[Tween] WARNING: PyTorch has no CUDA version info.")
            return None
        major = int(cuda_version.split(".")[0])
        cupy_pkg = f"cupy-cuda{major}x"
        print(f"[Tween] Detected CUDA {cuda_version}, will use {cupy_pkg}")
        return cupy_pkg
    except Exception as e:
        print(f"[Tween] WARNING: Could not detect CUDA version: {e}")
        return None
 def update_requirements(cupy_pkg):
    """Write the correct cupy package into requirements.txt."""
    requirements_path = os.path.join(os.path.dirname(__file__), "requirements.txt")
    lines = []
    if os.path.exists(requirements_path):
        with open(requirements_path, "r") as f:
            lines = [l.rstrip() for l in f if not l.strip().startswith("cupy")]
    if cupy_pkg and cupy_pkg not in lines:
        lines.append(cupy_pkg)
    with open(requirements_path, "w") as f:
        f.write("\n".join(lines) + "\n")
 def install():
-    cupy_pkg = get_cupy_package()
-    if cupy_pkg:
-        update_requirements(cupy_pkg)
+    # Install core requirements first
    requirements_path = os.path.join(os.path.dirname(__file__), "requirements.txt")
    subprocess.check_call([
        sys.executable, "-m", "pip", "install", "-r", requirements_path
    ])
    # Try to install cupy for NVIDIA users (optional, improves performance)
    cupy_pkg = get_cupy_package()
    if cupy_pkg:
        try:
            subprocess.check_call([
                sys.executable, "-m", "pip", "install", cupy_pkg
            ])
            print(f"[Tween] cupy installed ({cupy_pkg}) — fast CUDA kernels enabled")
        except subprocess.CalledProcessError:
            print(f"[Tween] WARNING: Could not install {cupy_pkg}. "
                  f"BIM-VFI, SGM-VFI, and GIMM-VFI will use slower PyTorch fallback.")
    else:
        print("[Tween] cupy skipped (no NVIDIA CUDA). "
              "BIM-VFI, SGM-VFI, and GIMM-VFI will use PyTorch fallback.")
 if __name__ == "__main__":
    install()
@@ -0,0 +1,22 @@
 [project]
 name = "comfyui-tween"
 description = "Video frame interpolation nodes for ComfyUI using BIM-VFI, EMA-VFI, SGM-VFI, and GIMM-VFI. Designed for long videos with thousands of frames."
 version = "1.1.0"
 license = "Apache-2.0"
 requires-python = ">=3.10"
 dependencies = [
    "gdown",
    "timm",
    "omegaconf",
    "yacs",
    "easydict",
    "einops",
    "huggingface_hub",
 ]
 [project.urls]
 Repository = "https://github.com/Ethanfel/ComfyUI-Tween"
 [tool.comfy]
 PublisherId = "ethanfel"
 DisplayName = "Tween - Video Frame Interpolation"
@@ -1,4 +1,5 @@
 gdown
 timm
 omegaconf
 yacs
 easydict
@@ -1,7 +1,13 @@
 #!/usr/bin/env python
 import collections
-import cupy
+try:
    import cupy
 except Exception:
    # Broad catch: an installed-but-broken cupy (e.g. incompatible NumPy)
    # raises non-ImportError exceptions at import time. Treat any failure as
    # "cupy unavailable" and fall back to the pure-PyTorch implementation.
    cupy = None
 import os
 import re
 import torch
@@ -216,20 +222,91 @@ def cuda_kernel(strFunction:str, strKernel:str, objVariables:typing.Dict):
 # end
-@cupy.memoize(for_each_device=True)
-def cuda_launch(strKey:str):
-    if 'CUDA_HOME' not in os.environ:
-        os.environ['CUDA_HOME'] = cupy.cuda.get_cuda_path()
-    # end
-    return cupy.RawKernel(objCudacache[strKey]['strKernel'], objCudacache[strKey]['strFunction'],
-                          options=tuple(['-I ' + os.environ['CUDA_HOME'], '-I ' + os.environ['CUDA_HOME'] + '/include']))
+_cuda_launch_cache = {}
+def cuda_launch(strKey:str):
+    if strKey not in _cuda_launch_cache:
+        if 'CUDA_HOME' not in os.environ:
+            try:
+                cuda_path = cupy.cuda.get_cuda_path()
+            except Exception:
+                cuda_path = None
+            if cuda_path is None:
+                cuda_path = '/usr/local/cuda'
+            os.environ['CUDA_HOME'] = cuda_path
+        _cuda_launch_cache[strKey] = cupy.RawKernel(
+            objCudacache[strKey]['strKernel'],
+            objCudacache[strKey]['strFunction'],
+            options=tuple(['-I ' + os.environ['CUDA_HOME'],
+                           '-I ' + os.environ['CUDA_HOME'] + '/include'])
+        )
+    return _cuda_launch_cache[strKey]
 # end
 ##########################################################
 def _pytorch_softsplat_impl(tenIn, tenFlow):
    """Pure-PyTorch forward warp via bilinear splatting (scatter_add)."""
    B, C, H, W = tenIn.shape
    tenOut = tenIn.new_zeros(B, C, H, W)
    grid_y, grid_x = torch.meshgrid(
        torch.arange(H, device=tenIn.device, dtype=tenIn.dtype),
        torch.arange(W, device=tenIn.device, dtype=tenIn.dtype),
        indexing='ij',
    )
    flt_x = grid_x.unsqueeze(0) + tenFlow[:, 0, :, :]
    flt_y = grid_y.unsqueeze(0) + tenFlow[:, 1, :, :]
    valid = torch.isfinite(flt_x) & torch.isfinite(flt_y)
    flt_x = torch.where(valid, flt_x, torch.zeros_like(flt_x))
    flt_y = torch.where(valid, flt_y, torch.zeros_like(flt_y))
    nw_x = flt_x.floor().long()
    nw_y = flt_y.floor().long()
    frac_x = flt_x - nw_x.to(flt_x.dtype)
    frac_y = flt_y - nw_y.to(flt_y.dtype)
    w_nw = (1.0 - frac_x) * (1.0 - frac_y) * valid
    w_ne = frac_x * (1.0 - frac_y) * valid
    w_sw = (1.0 - frac_x) * frac_y * valid
    w_se = frac_x * frac_y * valid
    out_flat = tenOut.view(B, C, -1)
    for dx, dy, w in [(0, 0, w_nw), (1, 0, w_ne), (0, 1, w_sw), (1, 1, w_se)]:
        tx = nw_x + dx
        ty = nw_y + dy
        in_bounds = (tx >= 0) & (tx < W) & (ty >= 0) & (ty < H)
        w_masked = w * in_bounds
        idx = (ty.clamp(0, H - 1) * W + tx.clamp(0, W - 1))
        idx = idx.unsqueeze(1).expand_as(tenIn)
        weighted = tenIn * w_masked.unsqueeze(1)
        out_flat.scatter_add_(2, idx.reshape(B, C, -1), weighted.reshape(B, C, -1))
    return tenOut
 _softsplat_fn = None
 def _pytorch_softsplat(tenIn, tenFlow):
    global _softsplat_fn
    if _softsplat_fn is None:
        try:
            _softsplat_fn = torch.compile(_pytorch_softsplat_impl)
        except Exception:
            _softsplat_fn = _pytorch_softsplat_impl
    try:
        return _softsplat_fn(tenIn, tenFlow)
    except Exception:
        _softsplat_fn = _pytorch_softsplat_impl
        return _softsplat_fn(tenIn, tenFlow)
 # end
 def softsplat(tenIn:torch.Tensor, tenFlow:torch.Tensor, tenMetric:torch.Tensor, strMode:str):
    assert(strMode.split('-')[0] in ['sum', 'avg', 'linear', 'soft'])
@@ -281,7 +358,7 @@ class softsplat_func(torch.autograd.Function):
    def forward(self, tenIn, tenFlow):
        tenOut = tenIn.new_zeros([tenIn.shape[0], tenIn.shape[1], tenIn.shape[2], tenIn.shape[3]])
-        if tenIn.is_cuda == True:
+        if tenIn.is_cuda and cupy is not None:
            cuda_launch(cuda_kernel('softsplat_out', '''
                extern "C" __global__ void __launch_bounds__(512) softsplat_out(
                    const int n,
@@ -345,8 +422,8 @@ class softsplat_func(torch.autograd.Function):
                stream=collections.namedtuple('Stream', 'ptr')(torch.cuda.current_stream().cuda_stream)
            )
-        elif tenIn.is_cuda != True:
+        else:
-            assert(False)
+            tenOut = _pytorch_softsplat(tenIn, tenFlow)
        # end
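The reason for the broad `except Exception` import guard in these modules can be reproduced without cupy. The sketch below is illustrative only: the module name `broken_accel_demo` and the helpers `optional_import` and `narrow_import` are hypothetical. It writes a throwaway module that raises `TypeError` while being imported, which stands in for an installed-but-broken package, and compares a narrow guard with a broad one.

```python
import importlib
import pathlib
import sys
import tempfile


def optional_import(name: str):
    """Broad guard: any import-time failure means 'not available'."""
    try:
        return importlib.import_module(name)
    except Exception:
        return None


def narrow_import(name: str):
    """Narrow guard: only a missing module is treated as 'not available'."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# Simulate an installed-but-broken package: the module exists on sys.path
# but raises TypeError while it is being imported.
_demo_dir = tempfile.mkdtemp()
pathlib.Path(_demo_dir, "broken_accel_demo.py").write_text(
    'raise TypeError("simulated import-time failure")\n'
)
sys.path.insert(0, _demo_dir)
importlib.invalidate_caches()

broken = optional_import("broken_accel_demo")       # failure swallowed
missing = optional_import("no_such_module_for_demo")  # ModuleNotFoundError swallowed
present = optional_import("json")                   # a normal import still works

try:
    narrow_import("broken_accel_demo")
    narrow_guard_leaked = False
except TypeError:
    narrow_guard_leaked = True  # the TypeError escapes the narrow guard
```

With the broad guard, callers only need the `cupy is not None` check that the `forward()` dispatch already uses. The trade-off is that a genuine bug inside the optional package is silenced as well, so logging the caught exception may be worth considering.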
@@ -1,72 +0,0 @@
 import { app } from "../../scripts/app.js";
 import { api } from "../../scripts/api.js";
 function fitHeight(node) {
    node.setSize([node.size[0], node.computeSize([node.size[0], node.size[1]])[1]]);
    node?.graph?.setDirtyCanvas(true);
 }
 app.registerExtension({
    name: "Tween.VideoPreview",
    async beforeRegisterNodeDef(nodeType, nodeData) {
        if (nodeData?.name !== "TweenConcatVideos") return;
        const onNodeCreated = nodeType.prototype.onNodeCreated;
        nodeType.prototype.onNodeCreated = function () {
            onNodeCreated?.apply(this, arguments);
            const container = document.createElement("div");
            const previewWidget = this.addDOMWidget("videopreview", "preview", container, {
                serialize: false,
                hideOnZoom: false,
                getValue() { return container.value; },
                setValue(v) { container.value = v; },
            });
            previewWidget.computeSize = function (width) {
                if (this.aspectRatio && !this.videoEl.hidden) {
                    const height = (previewNode.size[0] - 20) / this.aspectRatio + 10;
                    return [width, height > 0 ? height : -4];
                }
                return [width, -4];
            };
            const previewNode = this;
            previewWidget.videoEl = document.createElement("video");
            previewWidget.videoEl.controls = true;
            previewWidget.videoEl.loop = true;
            previewWidget.videoEl.muted = true;
            previewWidget.videoEl.style.width = "100%";
            previewWidget.videoEl.hidden = true;
            previewWidget.videoEl.addEventListener("loadedmetadata", () => {
                previewWidget.aspectRatio = previewWidget.videoEl.videoWidth / previewWidget.videoEl.videoHeight;
                fitHeight(previewNode);
            });
            previewWidget.videoEl.addEventListener("error", () => {
                previewWidget.videoEl.hidden = true;
                fitHeight(previewNode);
            });
            container.appendChild(previewWidget.videoEl);
        };
        const onExecuted = nodeType.prototype.onExecuted;
        nodeType.prototype.onExecuted = function (message) {
            onExecuted?.apply(this, arguments);
            if (!message?.gifs?.length) return;
            const params = message.gifs[0];
            const previewWidget = this.widgets?.find((w) => w.name === "videopreview");
            if (!previewWidget) return;
            const query = new URLSearchParams(params);
            query.set("timestamp", Date.now());
            previewWidget.videoEl.src = api.apiURL("/view?" + query);
            previewWidget.videoEl.hidden = false;
            previewWidget.videoEl.autoplay = true;
        };
    },
 });
Author	SHA1	Message	Date
Ethanfel	2d96d5aa5d	fix: catch all exceptions when importing cupy, not just ImportError An installed-but-broken cupy (e.g. incompatible with NumPy 2.5, which removed the 'bool8' alias) raises a TypeError during its own import, not an ImportError. The narrow `except ImportError` guard let that propagate and crashed the entire node import chain. Broaden the guard to `except Exception` in all three CUDA-kernel modules so any import-time failure disables cupy and falls back to the pure-PyTorch implementations. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-27 19:51:28 +02:00
Ethanfel	0c62c6eef4	docs: add cupy-fallback implementation plan Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-11 10:27:52 +02:00
Ethanfel	83e4b5dd98	perf: add torch.compile to PyTorch fallback kernels Wraps _pytorch_softsplat and _pytorch_costvol with torch.compile for ~6x speedup on ROCm/non-cupy setups. Falls back to eager execution gracefully if compilation fails. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-11 10:13:30 +02:00
Ethanfel	2e75e2d076	fix: handle None from cupy.cuda.get_cuda_path() in cuda_launch cupy.cuda.get_cuda_path() can return None when CUDA_HOME is not set and cupy can't auto-detect it. Fall back to /usr/local/cuda. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-11 02:20:20 +02:00
Ethanfel	c08fe58fe7	feat: make cupy optional in install.py cupy is now a best-effort install for NVIDIA users. Non-CUDA setups (ROCm, CPU) skip cupy and use PyTorch fallback kernels instead. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-11 02:12:03 +02:00
Ethanfel	9e84890877	feat: remove cupy requirement gate from model loading Models now fall back to pure-PyTorch implementations when cupy is unavailable. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-11 02:11:30 +02:00
Ethanfel	2e98e453a4	Add pure-PyTorch fallback for BIM-VFI cost volume kernel When cupy is unavailable, the costvol_func.forward() now falls back to a pure-PyTorch implementation using unfold + dot product instead of raising a RuntimeError. The CUDA/cupy kernel path is preserved unchanged for when cupy is available. This allows BIM-VFI to run on systems without cupy (including CPU-only setups), matching the pattern used for the softsplat fallbacks in SGM-VFI and GIMM-VFI. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-11 02:09:56 +02:00
Ethanfel	daf0304243	Add pure-PyTorch fallback for GIMM-VFI softsplat forward warp Make cupy import optional (try/except), replace @cupy.memoize with a dict cache, add _pytorch_softsplat() using scatter_add for bilinear splatting, and update forward() dispatch to fall back to PyTorch when cupy is unavailable or tensor is on CPU. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-11 02:07:08 +02:00
Ethanfel	5ce7b0edcb	fix: use dtype-preserving cast in SGM-VFI softsplat fallback Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-11 02:05:24 +02:00
Ethanfel	8d8407ec9d	Add pure-PyTorch fallback for SGM-VFI softsplat forward warp Make cupy import optional so the module loads without cupy installed. Replace @cupy.memoize decorator with a simple dict cache to avoid crash at import time. Add _pytorch_softsplat() using scatter_add_ as a fallback when cupy is unavailable or tensors are on CPU. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-11 01:59:23 +02:00
Ethanfel	91947c0b8c	Use actual input frame count for all_on_gpu and chunk_size estimates Replace hardcoded 199-frame assumption with 2*N-1 from the actual images input, giving accurate VRAM/RAM estimates for any batch size. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 18:56:57 +01:00
Ethanfel	c4b69321bb	Reorder VFI Optimizer outputs: images first, settings second Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 18:54:00 +01:00
Ethanfel	f1da0f7876	Add images passthrough output to VFI Optimizer Avoids needing a dual link from the image source — the optimizer passes images through so they can be connected directly to the Interpolate node. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 18:53:28 +01:00
Ethanfel	27c5bcf362	Fix total_mem → total_memory attribute on CudaDeviceProperties Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 18:33:45 +01:00
Ethanfel	d2e7db49c7	Bump version to 1.1.0 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 22:16:11 +01:00
Ethanfel	9f66233b53	Add VFI Optimizer node for auto-tuning hardware settings Benchmarks the user's GPU with the actual model and resolution via a single calibration frame pair, then outputs optimal batch_size, chunk_size, keep_device, all_on_gpu, and clear_cache_after_n_frames as a connectable VFI_SETTINGS type. All 8 Interpolate/SegmentInterpolate nodes accept the new optional settings input — existing workflows without the optimizer work unchanged. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 22:15:48 +01:00
Ethanfel	7257c1aa4d	Fix SVG license labels not rendering in GitHub README Remove text-anchor="end" (likely stripped by GitHub's SVG sanitizer, pushing text off-screen). Use left-aligned positioning instead, increase font size and color brightness for visibility. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 13:44:47 +01:00
Ethanfel	ebece55ed7	Add license labels to model comparison SVG BIM-VFI shows "Research only" in amber, others show "Apache 2.0". Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 13:39:42 +01:00
Ethanfel	a60fb2a25e	Redesign README: add SVG model comparison, reorganize by priority Move installation to top, add shields.io badges, create visual model comparison chart, use collapsible sections for node reference, condense acknowledgments into a table with citations in a collapsible block. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 13:37:50 +01:00
Ethanfel	c178f756da	Expand cupy install guide in README Add step-by-step instructions, CUDA version table, troubleshooting section, and note that EMA-VFI works without cupy. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 13:25:36 +01:00
Ethanfel	fb921ae620	Remove cupy auto-install from __init__.py No more pip calls at import time. Users get a clear error with install instructions from the Load node if cupy is missing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 13:24:06 +01:00
Ethanfel	4723dc329d	Add cupy check to Load nodes with install instructions BIM-VFI, SGM-VFI, and GIMM-VFI Load nodes now check for cupy at load time and raise a clear error with the user's CUDA version and the exact pip install command. Updated README with step-by-step cupy install instructions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 13:15:44 +01:00
Ethanfel	8fe382e5ec	Remove auto-install of all deps except cupy Dependencies are now handled by pyproject.toml / requirements.txt via ComfyUI Manager or pip. Only cupy is auto-installed at load time since it requires matching the PyTorch CUDA version; failures produce a warning instead of crashing. Also added timm to requirements.txt. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 13:13:01 +01:00
Ethanfel	8311fd0261	Add ComfyUI registry publishing workflow and pyproject.toml Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 23:28:21 +01:00
Ethanfel	396dafeefc	Fix warp cache buildup when all_on_gpu is enabled The all_on_gpu guard was preventing warp cache clearing and torch.cuda.empty_cache() from ever running, causing unbounded VRAM growth during long interpolation runs. Cache clearing now runs on the clear_cache_after_n_frames interval regardless of the all_on_gpu setting. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-18 21:27:47 +01:00
Ethanfel	13a89c5831	Add console logging to all VFI interpolation nodes Log mode/params, pass progress, chunk count, and output frame count for BIM-VFI, EMA-VFI, SGM-VFI, and GIMM-VFI interpolation nodes. Segment nodes also log their input frame range and target fps output range. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-14 21:14:11 +01:00
Ethanfel	2f1cc17f5c	Add oversampled image output to all VFI Interpolate nodes Second IMAGE output exposes the full power-of-2 oversampled frames before target FPS selection. Identical to the first output when target_fps=0. Document the new output in README. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-14 17:39:58 +01:00
Ethanfel	b2d7d3b634	Update README: document target FPS mode, fix repo URL, update concat description Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-14 17:06:55 +01:00
Ethanfel	adc4451716	wf update	2026-02-13 22:55:30 +01:00
Ethanfel	6dd579dcc7	Add target FPS mode to all VFI models and remove concat preview Add source_fps/target_fps inputs to all 4 Interpolate and Segment nodes (BIM, EMA, SGM, GIMM). When target_fps > 0, auto-computes optimal power-of-2 oversample, runs existing recursive t=0.5 interpolation, then selects frames at target timestamps. Handles downsampling (no model calls), same-fps passthrough, and high ratios (e.g. 3→30fps). Segment boundary logic uses global index computation for gap-free stitching. When target_fps=0, existing multiplier behavior is preserved. Remove video preview from TweenConcatVideos: drop preview input, delete web/js/tween_preview.js, remove WEB_DIRECTORY from __init__.py. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-13 22:51:04 +01:00
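
The target FPS mode described in commit 6dd579dcc7 (power-of-2 oversampling followed by frame selection at target timestamps) can be sketched as follows. This is a rough illustration under stated assumptions, not the repository's implementation: the function names are hypothetical, and nearest-frame rounding is assumed as the selection rule, which the commit message does not specify.

```python
import math


def oversample_passes(source_fps: float, target_fps: float) -> int:
    """Smallest k with source_fps * 2**k >= target_fps (0 when not upsampling)."""
    k = 0
    while source_fps * (1 << k) < target_fps:
        k += 1
    return k


def select_indices(num_input_frames: int, source_fps: float, target_fps: float):
    """Return (k, indices into the oversampled sequence), one index per output frame.

    Assumption: each output timestamp j / target_fps picks the nearest
    oversampled frame.
    """
    k = oversample_passes(source_fps, target_fps)
    factor = 1 << k
    # k recursive t=0.5 passes turn N frames into (N - 1) * 2**k + 1 frames.
    oversampled_count = (num_input_frames - 1) * factor + 1
    oversampled_fps = source_fps * factor

    duration = (num_input_frames - 1) / source_fps
    # Small epsilon guards against float error when duration * fps is integral.
    num_out = math.floor(duration * target_fps + 1e-9) + 1

    indices = [
        min(oversampled_count - 1, round(j * oversampled_fps / target_fps))
        for j in range(num_out)
    ]
    return k, indices


k_up, idx_up = select_indices(4, source_fps=3, target_fps=30)        # high ratio
k_same, idx_same = select_indices(5, source_fps=24, target_fps=24)   # passthrough
k_down, idx_down = select_indices(5, source_fps=30, target_fps=15)   # downsampling
```

In this sketch the 3 to 30 fps case needs four passes (16x, 48 fps oversampled), while the same-fps and downsampling cases need none, consistent with the commit's note that downsampling requires no model calls.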