5 Commits

Author SHA1 Message Date
Ethanfel 83e4b5dd98 perf: add torch.compile to PyTorch fallback kernels
Wraps _pytorch_softsplat and _pytorch_costvol with torch.compile
for ~6x speedup on ROCm/non-cupy setups. Falls back to eager
execution gracefully if compilation fails.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-11 10:13:30 +02:00
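
The compile-then-fall-back pattern described in 83e4b5dd98 could look roughly like the sketch below. This is not the code from the commit: the helper name compile_with_eager_fallback is hypothetical, and torch is imported optionally only so the sketch also runs where PyTorch is not installed.

```python
import functools

try:
    import torch  # optional here so the sketch also runs without PyTorch
except ImportError:
    torch = None


def compile_with_eager_fallback(fn):
    """Return a wrapper that prefers a compiled fn and otherwise runs fn eagerly.

    Hypothetical helper for illustration; the commit's real wrapper may differ.
    """
    state = {
        "compiled": None,
        "use_eager": torch is None or not hasattr(torch, "compile"),
    }

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        if not state["use_eager"]:
            try:
                if state["compiled"] is None:
                    state["compiled"] = torch.compile(fn)
                # torch.compile is lazy: tracing and code generation happen on
                # the first call, so failures must be caught here as well.
                return state["compiled"](*args, **kwargs)
            except Exception:
                # Stop trying to compile and use the eager function from now on.
                state["use_eager"] = True
        return fn(*args, **kwargs)

    return wrapper
```

A function such as _pytorch_softsplat would then be wrapped once at import time, for example `_pytorch_softsplat = compile_with_eager_fallback(_pytorch_softsplat)`. Note that the broad `except` also swallows ordinary runtime errors on the compiled path; the eager rerun will raise them again.
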
Ethanfel 2e75e2d076 fix: handle None from cupy.cuda.get_cuda_path() in cuda_launch
cupy.cuda.get_cuda_path() can return None when CUDA_HOME is not set
and cupy can't auto-detect it. Fall back to /usr/local/cuda.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-11 02:20:20 +02:00
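
A minimal sketch of the None guard described in 2e75e2d076, assuming the detected path is later joined with a subdirectory such as "include". The helper names resolve_cuda_path and include_flag are hypothetical; the detector is passed in as a callable so the sketch runs without cupy installed.

```python
import os

DEFAULT_CUDA_PATH = "/usr/local/cuda"  # fallback named in the commit message


def resolve_cuda_path(get_cuda_path, default=DEFAULT_CUDA_PATH):
    """Return the detected CUDA root, or `default` when detection yields None."""
    detected = get_cuda_path()
    return default if detected is None else detected


def include_flag(get_cuda_path):
    # os.path.join(None, "include") would raise TypeError, hence the guard above.
    return "-I " + os.path.join(resolve_cuda_path(get_cuda_path), "include")
```

With cupy available, the call would be `resolve_cuda_path(cupy.cuda.get_cuda_path)`.
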
Ethanfel 5ce7b0edcb fix: use dtype-preserving cast in SGM-VFI softsplat fallback
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-11 02:05:24 +02:00
Ethanfel 8d8407ec9d Add pure-PyTorch fallback for SGM-VFI softsplat forward warp
Make cupy import optional so the module loads without cupy installed.
Replace @cupy.memoize decorator with a simple dict cache to avoid
crash at import time. Add _pytorch_softsplat() using scatter_add_
as a fallback when cupy is unavailable or tensors are on CPU.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-11 01:59:23 +02:00
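
The optional import and dict cache described in 8d8407ec9d might be sketched as follows. The names _kernel_cache, cached_kernel, and use_cupy_path are hypothetical, the `build` callable stands in for whatever compiles the kernel, and the cache here is keyed only on name and source, which is a simplification.

```python
try:
    import cupy  # optional: only needed for the CUDA kernel path
except ImportError:
    cupy = None

# Plain dict standing in for the @cupy.memoize decorator, which cannot be
# evaluated at import time when cupy is missing.
_kernel_cache = {}


def cached_kernel(name, source, build):
    """Build a kernel once per (name, source) pair and reuse it afterwards."""
    key = (name, source)
    if key not in _kernel_cache:
        _kernel_cache[key] = build(name, source)
    return _kernel_cache[key]


def use_cupy_path(on_gpu):
    # Take the cupy route only when cupy imported and the tensors are on a GPU;
    # otherwise the caller would dispatch to the pure-PyTorch fallback.
    return cupy is not None and on_gpu
```

The forward warp itself would then branch on `use_cupy_path(...)` and call `_pytorch_softsplat()` when it returns False.
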
Ethanfel 42ebdd8b96 Add SGM-VFI (CVPR 2024) frame interpolation support
SGM-VFI combines local flow estimation with sparse global matching
(GMFlow) to handle large motion and occlusion-heavy scenes. Adds 3 new
nodes: Load SGM-VFI Model, SGM-VFI Interpolate, SGM-VFI Segment
Interpolate. Architecture files vendored from MCG-NJU/SGM-VFI with
device-awareness fixes (no hardcoded .cuda()), relative imports, and
debug code removed. README updated with model comparison table.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-12 23:02:48 +01:00