Bundle sparse_sage Triton kernel for block-sparse attention

Without sparse attention, the model falls back to full (dense) attention, which
attends to distant, irrelevant information and causes ghosting artifacts. The
FlashVSR paper explicitly calls for block-sparse attention.

Vendored from the SageAttention team (Apache-2.0 licensed); pure Triton, no CUDA C++.
Import chain: local sparse_sage → external sageattn.core → SDPA fallback.
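
A minimal sketch of that import chain (module paths follow the commit text;
the wrapper function and the positional-only call are illustrative
assumptions, not the vendored code verbatim):

    import torch.nn.functional as F

    try:
        # Vendored Triton kernel bundled with this commit.
        from sparse_sage import sparse_sageattn as _sparse_attn
    except ImportError:
        try:
            # Externally installed SageAttention package (path per commit text).
            from sageattn.core import sparse_sageattn as _sparse_attn
        except ImportError:
            _sparse_attn = None  # no sparse kernel available

    def attention(q, k, v):
        """Block-sparse attention when a kernel is available, dense SDPA otherwise."""
        if _sparse_attn is not None:
            return _sparse_attn(q, k, v)
        return F.scaled_dot_product_attention(q, k, v)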

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 19:22:40 +01:00
parent e7e7c1cb5a
commit dd61ae8d1f
5 changed files with 361 additions and 3 deletions


@@ -0,0 +1,3 @@
+from .core import sparse_sageattn
+__all__ = ["sparse_sageattn"]