ComfyUI BIM-VFI + EMA-VFI + SGM-VFI + GIMM-VFI + FlashVSR
ComfyUI custom nodes for video frame interpolation using BiM-VFI (CVPR 2025), EMA-VFI (CVPR 2023), SGM-VFI (CVPR 2024), and GIMM-VFI (NeurIPS 2024), plus video super-resolution using FlashVSR (arXiv 2025). Designed for long videos with thousands of frames — processes them without running out of VRAM.
Which model should I use?
| | BIM-VFI | EMA-VFI | SGM-VFI | GIMM-VFI |
|---|---|---|---|---|
| Best for | General-purpose, non-uniform motion | Fast inference, light VRAM | Large motion, occlusion-heavy scenes | High multipliers (4x/8x) in a single pass |
| Quality | Highest overall | Good | Best on large motion | Good |
| Speed | Moderate | Fastest | Slowest | Fast for 4x/8x (single pass) |
| VRAM | ~2 GB/pair | ~1.5 GB/pair | ~3 GB/pair | ~2.5 GB/pair |
| Params | ~17M | ~14–65M | ~15M + GMFlow | ~80M (RAFT) / ~123M (FlowFormer) |
| Arbitrary timestep | Yes | Yes (with `_t` checkpoint) | No (fixed 0.5) | Yes (native single-pass) |
| 4x/8x mode | Recursive 2x passes | Recursive 2x passes | Recursive 2x passes | Single forward pass (or recursive) |
| Paper | CVPR 2025 | CVPR 2023 | CVPR 2024 | NeurIPS 2024 |
| License | Research only | Apache 2.0 | Apache 2.0 | Apache 2.0 |
TL;DR: Start with BIM-VFI for best quality. Use EMA-VFI if you need speed or lower VRAM. Use SGM-VFI if your video has large camera motion or fast-moving objects that the others struggle with. Use GIMM-VFI when you want 4x or 8x interpolation without recursive passes — it generates all intermediate frames in a single forward pass per pair.
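The multiplier column above follows a simple rule: the recursive models halve the gap repeatedly, so a 4x or 8x run is log2(multiplier) passes, while GIMM-VFI in single-pass mode always needs just one. A minimal sketch (hypothetical helper names, not code from this repo):

```python
def passes_for_multiplier(multiplier: int, single_pass: bool = False) -> int:
    """Passes needed per frame pair: recursive models halve the gap
    repeatedly (2x -> 1, 4x -> 2, 8x -> 3); single-pass GIMM-VFI
    generates every intermediate timestep in one forward pass."""
    assert multiplier in (2, 4, 8)
    if single_pass:
        return 1
    return multiplier.bit_length() - 1  # log2 of the multiplier
```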
Video Super-Resolution
FlashVSR is a different category — spatial upscaling rather than temporal interpolation. It can be combined with any of the VFI models above.
| FlashVSR | |
|---|---|
| Task | 4x video super-resolution |
| Architecture | Wan 2.1-1.3B DiT + VAE (diffusion-based) |
| Modes | Full (best quality), Tiny (fast), Tiny-Long (streaming, lowest VRAM) |
| VRAM | ~8–12 GB (tiled, tiny mode) / ~16–24 GB (full mode) |
| Params | ~1.3B (DiT) + ~200M (VAE) |
| Min input | 21 frames |
| Paper | arXiv 2510.12747 |
| License | Apache 2.0 |
Nodes
BIM-VFI
Load BIM-VFI Model
Loads the BiM-VFI checkpoint. Auto-downloads from Google Drive on first use to ComfyUI/models/bim-vfi/.
| Input | Description |
|---|---|
| model_path | Checkpoint file from models/bim-vfi/ |
| auto_pyr_level | Auto-select pyramid level by resolution (<540p=3, 540p=5, 1080p=6, 4K=7) |
| pyr_level | Manual pyramid level (3-7), only used when auto is off |
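The auto_pyr_level thresholds in the table can be sketched as a lookup on the frame's short side. This is an illustrative reconstruction of the documented thresholds, not the node's actual code, and the exact boundary handling is an assumption:

```python
def auto_pyr_level(height: int, width: int) -> int:
    """Pick a BiM-VFI pyramid level from resolution, following the
    table above: <540p -> 3, 540p -> 5, 1080p -> 6, 4K -> 7.
    Hypothetical helper; boundary handling is assumed."""
    short_side = min(height, width)
    if short_side < 540:
        return 3
    if short_side < 1080:
        return 5
    if short_side < 2160:
        return 6
    return 7
```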
BIM-VFI Interpolate
Interpolates frames from an image batch.
| Input | Description |
|---|---|
| images | Input image batch |
| model | Model from the loader node |
| multiplier | 2x, 4x, or 8x frame rate (recursive 2x passes) |
| batch_size | Frame pairs processed simultaneously (higher = faster, more VRAM) |
| chunk_size | Process in segments of N input frames (0 = disabled). Bounds VRAM for very long videos. Result is identical to processing all at once |
| keep_device | Keep model on GPU between pairs (faster, ~200MB constant VRAM) |
| all_on_gpu | Keep all intermediate frames on GPU (fast, needs large VRAM) |
| clear_cache_after_n_frames | Clear CUDA cache every N pairs to prevent VRAM buildup |
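Why chunk_size produces identical output: interpolation only ever looks at one frame pair at a time, so segments can share a single boundary frame and the per-pair results line up exactly. A minimal sketch of that chunking scheme (illustrative, not the node's code):

```python
def chunk_ranges(num_frames: int, chunk_size: int):
    """Split N input frames into segments that overlap by one frame,
    so each interpolated pair is computed exactly once and the result
    matches processing everything in a single pass."""
    if chunk_size <= 0 or num_frames <= chunk_size:
        return [(0, num_frames)]
    assert chunk_size >= 2, "a chunk must contain at least one frame pair"
    ranges, start = [], 0
    while start < num_frames - 1:
        end = min(start + chunk_size, num_frames)
        ranges.append((start, end))
        start = end - 1  # reuse the last frame as the next chunk's first
    return ranges
```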
BIM-VFI Segment Interpolate
Same as Interpolate but processes a single segment of the input. Chain multiple instances with Save nodes between them to bound peak RAM. The model pass-through output forces sequential execution.
Tween Concat Videos
Concatenates segment video files into a single video using ffmpeg. Connect from any Segment Interpolate's model output to ensure it runs after all segments are saved. Works with all three models.
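The concatenation step is equivalent to ffmpeg's concat demuxer with stream copy (no re-encode). A sketch of what this amounts to, assuming ffmpeg is on PATH; the node's actual flags may differ:

```python
import os
import subprocess
import tempfile

def build_concat_command(list_path: str, output_path: str) -> list:
    """ffmpeg concat-demuxer invocation: joins segments by stream copy."""
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", list_path, "-c", "copy", output_path]

def concat_videos(segment_paths, output_path: str) -> None:
    """Write the concat list file, run ffmpeg, then clean up."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        for p in segment_paths:
            f.write(f"file '{os.path.abspath(p)}'\n")
        list_path = f.name
    try:
        subprocess.run(build_concat_command(list_path, output_path), check=True)
    finally:
        os.unlink(list_path)
```

Stream copy only works when all segments share codec and resolution, which is guaranteed here since every segment comes from the same interpolation settings.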
EMA-VFI
Load EMA-VFI Model
Loads an EMA-VFI checkpoint. Auto-downloads from Google Drive on first use to ComfyUI/models/ema-vfi/. Variant (large/small) and timestep support are auto-detected from the filename.
| Input | Description |
|---|---|
| model_path | Checkpoint file from models/ema-vfi/ |
| tta | Test-time augmentation: flip input and average with unflipped result (~2x slower, slightly better quality) |
Available checkpoints:
| Checkpoint | Variant | Params | Arbitrary timestep |
|---|---|---|---|
| `ours_t.pkl` | Large | ~65M | Yes |
| `ours.pkl` | Large | ~65M | No (fixed 0.5) |
| `ours_small_t.pkl` | Small | ~14M | Yes |
| `ours_small.pkl` | Small | ~14M | No (fixed 0.5) |
EMA-VFI Interpolate
Interpolates frames from an image batch. Same controls as BIM-VFI Interpolate.
EMA-VFI Segment Interpolate
Same as EMA-VFI Interpolate but processes a single segment. Same pattern as BIM-VFI Segment Interpolate.
SGM-VFI
Load SGM-VFI Model
Loads an SGM-VFI checkpoint. Auto-downloads from Google Drive on first use to ComfyUI/models/sgm-vfi/. Variant (base/small) is auto-detected from the filename (default is small).
| Input | Description |
|---|---|
| model_path | Checkpoint file from models/sgm-vfi/ |
| tta | Test-time augmentation: flip input and average with unflipped result (~2x slower, slightly better quality) |
| num_key_points | Sparsity of global matching (0.0 = global everywhere, 0.5 = default balance, higher = faster) |
Available checkpoints:
| Checkpoint | Variant | Params |
|---|---|---|
| `ours-1-2-points.pkl` | Small | ~15M + GMFlow |
SGM-VFI Interpolate
Interpolates frames from an image batch. Same controls as BIM-VFI Interpolate.
SGM-VFI Segment Interpolate
Same as SGM-VFI Interpolate but processes a single segment. Same pattern as BIM-VFI Segment Interpolate.
GIMM-VFI
Load GIMM-VFI Model
Loads a GIMM-VFI checkpoint. Auto-downloads from HuggingFace on first use to ComfyUI/models/gimm-vfi/. The matching flow estimator (RAFT or FlowFormer) is auto-detected and downloaded alongside the main model.
| Input | Description |
|---|---|
| model_path | Checkpoint file from models/gimm-vfi/ |
| ds_factor | Downscale factor for internal processing (1.0 = full res, 0.5 = half). Lower values use less VRAM and run faster, at some cost in quality. Try 0.5 for 4K inputs |
Available checkpoints:
| Checkpoint | Variant | Params | Flow estimator (auto-downloaded) |
|---|---|---|---|
| `gimmvfi_r_arb_lpips_fp32.safetensors` | RAFT | ~80M | `raft-things_fp32.safetensors` |
| `gimmvfi_f_arb_lpips_fp32.safetensors` | FlowFormer | ~123M | `flowformer_sintel_fp32.safetensors` |
GIMM-VFI Interpolate
Interpolates frames from an image batch. Same controls as BIM-VFI Interpolate, plus:
| Input | Description |
|---|---|
| single_pass | When enabled (default), generates all intermediate frames per pair in one forward pass using GIMM-VFI's arbitrary-timestep capability. No recursive 2x passes needed for 4x or 8x. Disable to use the standard recursive approach (same as BIM/EMA/SGM) |
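Single-pass mode works because GIMM-VFI accepts an arbitrary timestep t in (0, 1), so a 4x run simply evaluates t = 0.25, 0.5, 0.75 between each pair instead of recursing. A sketch of that timestep schedule (illustrative helper, not the node's code):

```python
def single_pass_timesteps(multiplier: int):
    """Intermediate timesteps evaluated per frame pair in single-pass
    mode: (multiplier - 1) evenly spaced values strictly inside (0, 1)."""
    assert multiplier >= 2
    return [i / multiplier for i in range(1, multiplier)]
```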
GIMM-VFI Segment Interpolate
Same as GIMM-VFI Interpolate but processes a single segment. Same pattern as BIM-VFI Segment Interpolate.
Output frame count (VFI models): 2x = 2N-1, 4x = 4N-3, 8x = 8N-7
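The three formulas above are one rule: each of the N-1 gaps between consecutive input frames gains (multiplier - 1) new frames. A one-line sketch (hypothetical helper name):

```python
def output_frame_count(n_input: int, multiplier: int) -> int:
    """N inputs have N-1 gaps; each gap gains (m-1) frames, giving
    N + (N-1)*(m-1) = m*N - (m-1), matching 2N-1, 4N-3, 8N-7."""
    return multiplier * n_input - (multiplier - 1)
```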
FlashVSR
FlashVSR does 4x video super-resolution (spatial upscaling), not frame interpolation. It uses a diffusion-based approach built on Wan 2.1-1.3B for temporally coherent upscaling.
Load FlashVSR Model
Downloads checkpoints from HuggingFace (~7.5 GB) on first use to ComfyUI/models/flashvsr/.
| Input | Description |
|---|---|
| mode | Pipeline mode: tiny (fast TCDecoder decode), tiny-long (streaming TCDecoder, lowest VRAM for long videos), full (standard VAE decode, best quality) |
| precision | bf16 (faster on modern GPUs) or fp16 (for older GPUs) |
Checkpoints (auto-downloaded from 1038lab/FlashVSR):
| Checkpoint | Size | Description |
|---|---|---|
| `FlashVSR1_1.safetensors` | ~5 GB | Main DiT model (v1.1) |
| `Wan2.1_VAE.safetensors` | ~2 GB | Video VAE |
| `LQ_proj_in.safetensors` | ~50 MB | Low-quality frame projection |
| `TCDecoder.safetensors` | ~200 MB | Tiny conditional decoder (for tiny/tiny-long modes) |
| `Prompt.safetensors` | ~1 MB | Precomputed text embeddings |
FlashVSR Upscale
Upscales an image batch with 4x spatial super-resolution.
| Input | Description |
|---|---|
| images | Input video frames (minimum 21 frames) |
| model | Model from the loader node |
| scale | Upscaling factor: 2x or 4x (4x is native resolution) |
| frame_chunk_size | Process in chunks of N frames to bound VRAM (0 = all at once). Recommended: 33 or 65. Each chunk must be >= 21 frames |
| tiled | Enable tiled VAE decode (reduces VRAM significantly) |
| tile_size_h / tile_size_w | VAE tile dimensions in latent space (default 60/104) |
| topk_ratio | Sparse attention ratio. Higher = faster, may lose fine detail (default 2.0) |
| kv_ratio | KV cache ratio. Higher = better quality, more VRAM (default 2.0) |
| local_range | Local attention window: 9 = sharper details, 11 = more temporal stability |
| color_fix | Apply wavelet color correction to prevent color shifts |
| unload_dit | Offload DiT to CPU before VAE decode (saves VRAM, slower) |
| seed | Random seed for the diffusion process |
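Because every chunk must contain at least 21 frames, a short trailing remainder cannot stand alone. One reasonable way to satisfy the constraint is to fold a too-short final chunk into the previous one; this is an illustrative sketch of the frame_chunk_size constraint, not necessarily how the node resolves it:

```python
def frame_chunks(num_frames: int, chunk_size: int, min_chunk: int = 21):
    """Split frames into chunks of chunk_size, merging a final chunk
    shorter than min_chunk into its predecessor so every chunk meets
    FlashVSR's 21-frame minimum."""
    if chunk_size <= 0:
        return [(0, num_frames)]
    ranges = [(s, min(s + chunk_size, num_frames))
              for s in range(0, num_frames, chunk_size)]
    if len(ranges) > 1 and ranges[-1][1] - ranges[-1][0] < min_chunk:
        last = ranges.pop()
        prev = ranges.pop()
        ranges.append((prev[0], last[1]))  # fold the short tail in
    return ranges
```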
FlashVSR Segment Upscale
Same as FlashVSR Upscale but processes a single segment of the input. Chain multiple instances with Save nodes between them to bound peak RAM. The model pass-through output forces sequential execution.
| Input | Description |
|---|---|
| segment_index | Which segment to process (0-based) |
| segment_size | Number of input frames per segment (minimum 21) |
| overlap_frames | Overlapping frames between adjacent segments for temporal context and crossfade blending |
| blend_frames | Number of frames within the overlap to crossfade (must be <= overlap_frames) |
Plus all the same upscale parameters as FlashVSR Upscale.
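The crossfade over the blend_frames region is a standard linear blend: each blended frame mixes the previous segment's version with weight (1 - w) and the next segment's with weight w, with w ramping from 0 toward 1. A sketch of that weight schedule (illustrative, not the node's code):

```python
def crossfade_weights(blend_frames: int):
    """Weights for the next segment's frames across the blended overlap;
    the previous segment's frames get (1 - w). Endpoints 0 and 1 are
    excluded so neither segment is fully discarded at the seam."""
    return [(i + 1) / (blend_frames + 1) for i in range(blend_frames)]
```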
Installation
Clone into your ComfyUI custom_nodes/ directory:
cd ComfyUI/custom_nodes
git clone https://github.com/your-user/ComfyUI-Tween.git
Dependencies (gdown, cupy, timm, omegaconf, easydict, yacs, einops, huggingface_hub, safetensors) are auto-installed on first load. The correct cupy variant is detected from your PyTorch CUDA version.
Warning: `cupy` is a large package (~800 MB) and compilation/installation can take several minutes. The first ComfyUI startup after installing this node may appear to hang while `cupy` installs in the background. Check the console log for progress. If auto-install fails (e.g. missing build tools in Docker), install manually with:

pip install cupy-cuda12x  # replace 12 with your CUDA major version
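The CUDA-version detection described above amounts to mapping PyTorch's reported CUDA version onto cupy's per-major-version wheel names. A sketch, assuming the standard `cupy-cuda11x` / `cupy-cuda12x` wheel naming (in practice you would pass in `torch.version.cuda`):

```python
def cupy_package_for(torch_cuda_version):
    """Map a PyTorch CUDA version string (e.g. '12.1') to the matching
    cupy wheel name. Hypothetical helper illustrating the detection."""
    if torch_cuda_version is None:
        raise RuntimeError("PyTorch was built without CUDA support")
    major = torch_cuda_version.split(".")[0]
    return f"cupy-cuda{major}x"
```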
To install manually:
cd ComfyUI-Tween
python install.py
Requirements
- PyTorch with CUDA
- `cupy` (matching your CUDA version, for BIM-VFI, SGM-VFI, and GIMM-VFI)
- `timm` (for EMA-VFI and SGM-VFI)
- `gdown` (for BIM-VFI/EMA-VFI/SGM-VFI model auto-download)
- `omegaconf`, `easydict`, `yacs`, `einops` (for GIMM-VFI)
- `huggingface_hub` (for GIMM-VFI and FlashVSR model auto-download)
- `safetensors` (for FlashVSR checkpoint loading)
VRAM Guide
| VRAM | Recommended settings |
|---|---|
| 8 GB | batch_size=1, chunk_size=500 |
| 24 GB | batch_size=2-4, chunk_size=1000 |
| 48 GB+ | batch_size=4-16, all_on_gpu=true |
| 96 GB+ | batch_size=8-16, all_on_gpu=true, chunk_size=0 |
Acknowledgments
This project wraps the official BiM-VFI implementation by the KAIST VIC Lab, the official EMA-VFI implementation by MCG-NJU, the official SGM-VFI implementation by MCG-NJU, the GIMM-VFI implementation by S-Lab (NTU), and FlashVSR by OpenImagingLab. GIMM-VFI architecture files in gimm_vfi_arch/ are adapted from kijai/ComfyUI-GIMM-VFI with safetensors checkpoints from Kijai/GIMM-VFI_safetensors. FlashVSR architecture files in flashvsr_arch/ are adapted from 1038lab/ComfyUI-FlashVSR (a diffsynth subset) with safetensors checkpoints from 1038lab/FlashVSR. Architecture files in bim_vfi_arch/, ema_vfi_arch/, sgm_vfi_arch/, gimm_vfi_arch/, and flashvsr_arch/ are vendored from their respective repositories with minimal modifications (relative imports, device-awareness fixes, dtype safety patches, inference-only paths).
BiM-VFI:
Wonyong Seo, Jihyong Oh, and Munchurl Kim. "BiM-VFI: Bidirectional Motion Field-Guided Frame Interpolation for Video with Non-uniform Motions." IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025. [arXiv] [Project Page] [GitHub]
@inproceedings{seo2025bimvfi,
title={BiM-VFI: Bidirectional Motion Field-Guided Frame Interpolation for Video with Non-uniform Motions},
author={Seo, Wonyong and Oh, Jihyong and Kim, Munchurl},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2025}
}
EMA-VFI:
Guozhen Zhang, Yuhan Zhu, Haonan Wang, Youxin Chen, Gangshan Wu, and Limin Wang. "Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation." IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023. [arXiv] [GitHub]
@inproceedings{zhang2023emavfi,
title={Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation},
author={Zhang, Guozhen and Zhu, Yuhan and Wang, Haonan and Chen, Youxin and Wu, Gangshan and Wang, Limin},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2023}
}
SGM-VFI:
Guozhen Zhang, Yuhan Zhu, Evan Zheran Liu, Haonan Wang, Mingzhen Sun, Gangshan Wu, and Limin Wang. "Sparse Global Matching for Video Frame Interpolation with Large Motion." IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024. [arXiv] [GitHub]
@inproceedings{zhang2024sgmvfi,
title={Sparse Global Matching for Video Frame Interpolation with Large Motion},
author={Zhang, Guozhen and Zhu, Yuhan and Liu, Evan Zheran and Wang, Haonan and Sun, Mingzhen and Wu, Gangshan and Wang, Limin},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2024}
}
GIMM-VFI:
Zujin Guo, Wei Li, and Chen Change Loy. "Generalizable Implicit Motion Modeling for Video Frame Interpolation." Advances in Neural Information Processing Systems (NeurIPS), 2024. [arXiv] [GitHub]
@inproceedings{guo2024gimmvfi,
title={Generalizable Implicit Motion Modeling for Video Frame Interpolation},
author={Guo, Zujin and Li, Wei and Loy, Chen Change},
booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
year={2024}
}
FlashVSR:
Junhao Zhuang, Ting-Che Lin, Xin Zhong, Zhihong Pan, Chun Yuan, and Ailing Zeng. "FlashVSR: Efficient Real-World Video Super-Resolution via Distilled Diffusion Transformer." arXiv preprint arXiv:2510.12747, 2025. [arXiv] [GitHub]
@article{zhuang2025flashvsr,
title={FlashVSR: Efficient Real-World Video Super-Resolution via Distilled Diffusion Transformer},
author={Zhuang, Junhao and Lin, Ting-Che and Zhong, Xin and Pan, Zhihong and Yuan, Chun and Zeng, Ailing},
journal={arXiv preprint arXiv:2510.12747},
year={2025}
}
License
The BiM-VFI model weights and architecture code are provided by KAIST VIC Lab for research and education purposes only. Commercial use requires permission from the principal investigator (Prof. Munchurl Kim, mkimee@kaist.ac.kr). See the original repository for details.
The EMA-VFI model weights and architecture code are released under the Apache 2.0 License. See the original repository for details.
The SGM-VFI model weights and architecture code are released under the Apache 2.0 License. See the original repository for details.
The GIMM-VFI model weights and architecture code are released under the Apache 2.0 License. See the original repository for details. ComfyUI adaptation based on kijai/ComfyUI-GIMM-VFI.
The FlashVSR model weights and architecture code are released under the Apache 2.0 License. See the original repository for details. Architecture files adapted from 1038lab/ComfyUI-FlashVSR.