Commit Graph

15 Commits

Author SHA1 Message Date
3b87652184 Fix FlashVSR attention mask and output quality
- Use generate_draft_block_mask_refined for sparse attention mask (matches
  naxci1's generate_draft_block_mask_sage with proper half-block key scoring)
- Remove spurious repeat_interleave(2, dim=-1) from generate_draft_block_mask
  that doubled the key dimension incorrectly
- Add torch.clamp(0, 1) to _to_frames output (matches naxci1's tensor2video)
- Add .to(self.device) on LQ video slices in streaming loop for all pipelines

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 18:41:43 +01:00
76dff7e573 Fix FlashVSR quality: two-stage temporal padding, kv_ratio=3, float64 precision
Root cause of remaining ghosting: our single-stage temporal padding
(N+4 → floor to 8k+1) TRUNCATED frames when N+4 wasn't already 8k+1.
For 50 frames: 50+4=54 → floor to 49, LOSING the last input frame.
The pipeline then processed misaligned LQ→output frame mapping.

Fix matches naxci1/ComfyUI-FlashVSR_Stable two-stage approach:
1. Pad to next_8n5(N) (next integer >= N of form 8k+5, minimum 21)
2. Add 4 → result is always 8(k+1)+1, a valid 8k+1 — NEVER truncates

Also:
- kv_ratio default 2.0→3.0 (matches naxci1, max quality KV cache)
- local_range default 9→11 (more stable temporal consistency)
- sinusoidal_embedding_1d, precompute_freqs_cis, rope_apply: float32→float64
  (matches naxci1 reference precision for embeddings and RoPE)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 18:06:46 +01:00
fa250897a2 Fix FlashVSR ghosting: streaming TCDecoder decode + Causal LQ projection
Root cause: three critical differences from naxci1 reference implementation:

1. Batch decode after loop → streaming per-chunk TCDecoder decode with LQ
   conditioning inside the loop. The TCDecoder uses causal convolutions with
   temporal memory that must be built incrementally per-chunk. Batch decode
   breaks this design and loses LQ frame conditioning, causing ghosting.

2. Buffer_LQ4x_Proj → Causal_LQ4x_Proj for FlashVSR v1.1. The causal
   variant reads the OLD cache before writing the new one (truly causal),
   while Buffer writes cache BEFORE the conv call. Using the wrong variant
   misaligns temporal LQ conditioning features.

3. Temporal padding formula: changed from round-up to largest_8n1_leq(N+4)
   matching the naxci1 reference approach.

Changes:
- flashvsr_full.py: streaming TCDecoder decode per-chunk with LQ conditioning
  and per-chunk color correction (was: batch VAE decode after loop)
- flashvsr_tiny.py: streaming TCDecoder decode per-chunk (was: batch decode)
- inference.py: use Causal_LQ4x_Proj, build TCDecoder for ALL modes (including
  full), fix temporal padding to largest_8n1_leq(N+4), clear TCDecoder in
  clear_caches()
- utils.py: add Causal_LQ4x_Proj class
- nodes.py: update progress bar estimation for new padding formula

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 17:42:20 +01:00
94d9818675 Fix FlashVSR quality: match naxci1 reference preprocessing
- Remove front dummy frames (not used by reference implementation)
- Use centered reflect padding instead of right/bottom replicate
- Crop output from center matching padding offsets
- Simplify temporal padding to 8k+1 alignment
- Update progress bar estimation to match new formula

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 17:10:12 +01:00
ea84ffef7c Fix FlashVSR ghosting: restore 2 front dummy frames matching reference
The pipeline's LQ conditioning indexing expects 2 front dummy frames
(copies of first frame) as warmup. Our previous refactoring removed
these, shifting all LQ conditioning by 2 frames and causing severe
ghosting artifacts.

Now matches the 1038lab reference preprocessing exactly:
1. _prepare_video: 2 tail copies + alignment + 2 front dummies + back padding
2. _restore_video_sequence: strip first 2 warmup frames + trim to original count
3. Crop pipeline output to padded_n before restoration

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 16:49:46 +01:00
11e2acb9e0 Fix FlashVSR frame padding to match pipeline requirements
The pipeline requires num_frames % 4 == 1. Our old _pad_video_5d used a
wrong formula that produced non-conforming counts (e.g. 33 input → 35
padded → pipeline rounds to 37, wasting VRAM).

New padding uses num_frames % 8 == 1 (also satisfies % 4 == 1), which
ensures the streaming loop output exactly matches num_frames with zero
waste. Optimal input counts: 25, 33, 41, 49, 57, 65, 73, 81, 89, 97, 105.

Also removes incorrect 2-frame warmup stripping from _restore_video_sequence
— the pipeline output doesn't have warmup artifacts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 16:20:02 +01:00
0fecfcee37 Add FlashVSR support: diffusion-based 4x video super-resolution (Wan 2.1-1.3B)
Vendor minimal diffsynth subset for FlashVSR inference (full/tiny pipelines,
v1 and v1.1 checkpoints auto-downloaded from HuggingFace). Includes segment-based
processing with temporal overlap and crossfade blending for bounded RAM on long videos.

Nodes: Load FlashVSR Model, FlashVSR Upscale, FlashVSR Segment Upscale.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 15:12:33 +01:00
d642255e70 Add GIMM-VFI support (NeurIPS 2024) with single-pass arbitrary-timestep interpolation
Integrates GIMM-VFI alongside existing BIM/EMA/SGM models. Key feature: generates
all intermediate frames in one forward pass (no recursive 2x passes needed for 4x/8x).

- Vendor gimm_vfi_arch/ from kijai/ComfyUI-GIMM-VFI with device fixes
- Two variants: RAFT-based (~80MB) and FlowFormer-based (~123MB)
- Auto-download checkpoints from HuggingFace (Kijai/GIMM-VFI_safetensors)
- Three new nodes: Load GIMM-VFI Model, GIMM-VFI Interpolate, GIMM-VFI Segment Interpolate
- single_pass toggle: True=arbitrary timestep (default), False=recursive like other models
- ds_factor parameter for high-res input downscaling

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 13:11:45 +01:00
e37cc3dd2e Rename project to ComfyUI-Tween
Update logger names, install prefixes, README clone instructions, and
error messages to reflect the new repo name. Model-specific node names
and categories (BIM-VFI, EMA-VFI, SGM-VFI) are unchanged.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-12 23:08:54 +01:00
42ebdd8b96 Add SGM-VFI (CVPR 2024) frame interpolation support
SGM-VFI combines local flow estimation with sparse global matching
(GMFlow) to handle large motion and occlusion-heavy scenes. Adds 3 new
nodes: Load SGM-VFI Model, SGM-VFI Interpolate, SGM-VFI Segment
Interpolate. Architecture files vendored from MCG-NJU/SGM-VFI with
device-awareness fixes (no hardcoded .cuda()), relative imports, and
debug code removed. README updated with model comparison table.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-12 23:02:48 +01:00
1de086569c Add EMA-VFI (CVPR 2023) frame interpolation support
Integrate EMA-VFI alongside existing BIM-VFI with three new ComfyUI nodes:
Load EMA-VFI Model, EMA-VFI Interpolate, and EMA-VFI Segment Interpolate.

Architecture files vendored from MCG-NJU/EMA-VFI with device-awareness
fixes (removed hardcoded .cuda() calls), warp cache management, and
relative imports. InputPadder extended to support EMA-VFI's replicate
center-symmetric padding. Auto-installs timm dependency on first load.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-12 22:30:06 +01:00
993a3a72b1 Add batch processing support for faster frame interpolation
Processes multiple frame pairs simultaneously instead of one-by-one.
New batch_size input (1-64) lets users trade VRAM for speed.
Refactored pyr_level logic into shared _get_pyr_level() helper.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-12 18:54:40 +01:00
69a4aebfe7 Add auto_pyr_level toggle to select pyramid level by resolution
When enabled (default), automatically picks the optimal pyr_level
based on input height: <540p=3, 540p=5, 1080p=6, 4K=7.
When disabled, uses the manual pyr_level value.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-12 18:51:29 +01:00
4e6f9eb896 Respect user's pyr_level setting at all resolutions
Previously the user's pyr_level was overridden for >=540p content.
Now the setting is always used, with the tooltip recommending values
per resolution instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-12 18:50:29 +01:00
db64fc195a Initial commit: ComfyUI BIM-VFI node for video frame interpolation
Wraps BiM-VFI (CVPR 2025) as a ComfyUI custom node for long video
frame interpolation with memory-safe sequential processing.

- LoadBIMVFIModel: checkpoint loader with auto-download from Google Drive
- BIMVFIInterpolate: 2x/4x/8x recursive interpolation with per-pair
  GPU processing, configurable VRAM management (all_on_gpu for high-VRAM
  setups), progress bar, and backwarp cache clearing
- Vendored inference-only architecture from KAIST-VICLab/BiM-VFI
- Auto-detection of CUDA version for cupy installation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-12 18:26:49 +01:00