Commit Graph

5 Commits

SHA1 Message Date
fa250897a2 Fix FlashVSR ghosting: streaming TCDecoder decode + Causal LQ projection
Root cause: three critical differences from the naxci1 reference implementation:

1. Batch decode after loop → streaming per-chunk TCDecoder decode with LQ
   conditioning inside the loop. The TCDecoder uses causal convolutions
   with temporal memory that must be built incrementally per chunk; batch
   decode breaks this design and loses LQ frame conditioning, causing
   ghosting (see the first sketch after this list).

2. Buffer_LQ4x_Proj → Causal_LQ4x_Proj for FlashVSR v1.1. The causal
   variant reads the OLD cache before writing the new one (truly causal),
   while the Buffer variant writes the cache BEFORE the conv call. Using
   the wrong variant misaligns the temporal LQ conditioning features (see
   the second sketch below).

3. Temporal padding formula: changed from round-up to largest_8n1_leq(N+4),
   matching the naxci1 reference approach (see the third sketch below).
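
A minimal sketch of the streaming pattern from (1). The decoder and
helper names here are stand-ins, not the real pipeline API:

    import torch

    # Stand-ins so the sketch runs; the real objects come from the pipeline.
    latent_chunks = [torch.randn(1, 16, 1, 8, 8) for _ in range(3)]
    lq_chunks = [torch.randn(1, 3, 4, 32, 32) for _ in range(3)]
    tc_decoder = lambda z, lq: lq           # pretend decode -> [B, C, T, H, W]
    color_correct = lambda rgb, lq: rgb     # pretend per-chunk correction

    frames = []
    for latent_chunk, lq_chunk in zip(latent_chunks, lq_chunks):
        # Decode INSIDE the loop so the TCDecoder's causal-conv caches
        # accumulate temporal state chunk by chunk, with LQ conditioning.
        rgb = tc_decoder(latent_chunk, lq_chunk)
        frames.append(color_correct(rgb, lq_chunk))
    video = torch.cat(frames, dim=2)        # concatenate along time

    # The removed anti-pattern: collect every latent, decode once after
    # the loop. That bypasses the incrementally built caches and drops
    # per-chunk LQ conditioning, which is what produced the ghosting.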
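
The cache-ordering difference from (2), reduced to a toy contrast (both
classes are illustrative, not the code shipped in utils.py):

    import torch
    import torch.nn as nn

    class BufferStyle(nn.Module):
        """Writes the cache BEFORE the conv uses it, so the current chunk
        leaks into its own 'past' and the conditioning misaligns."""
        def __init__(self, dim=8):
            super().__init__()
            self.conv = nn.Conv3d(dim, dim, kernel_size=(2, 1, 1))
            self.cache = None

        def forward(self, x):             # x: [B, C, T, H, W]
            self.cache = x[:, :, -1:]     # update first (wrong order)
            return self.conv(torch.cat([self.cache, x], dim=2))

    class CausalStyle(nn.Module):
        """Reads the OLD cache before writing the new one, so each chunk
        sees only frames that actually precede it (truly causal)."""
        def __init__(self, dim=8):
            super().__init__()
            self.conv = nn.Conv3d(dim, dim, kernel_size=(2, 1, 1))
            self.cache = None

        def forward(self, x):
            # Replicate the first frame on the very first chunk (a common
            # causal-conv init; an assumption here, not the real code).
            past = self.cache if self.cache is not None else x[:, :, :1]
            out = self.conv(torch.cat([past, x], dim=2))
            self.cache = x[:, :, -1:]     # write only AFTER the conv
            return out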
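
And the padding rule from (3). largest_8n1_leq follows the commit's
naming; the wrapper is illustrative:

    def largest_8n1_leq(x: int) -> int:
        """Largest value of the form 8*n + 1 that is <= x."""
        return ((x - 1) // 8) * 8 + 1

    def target_length(num_frames: int) -> int:
        # Nearest valid 8n+1 length; already-valid counts map to
        # themselves, e.g. largest_8n1_leq(17 + 4) == 17.
        return largest_8n1_leq(num_frames + 4)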

Changes:
- flashvsr_full.py: streaming TCDecoder decode per-chunk with LQ conditioning
  and per-chunk color correction (was: batch VAE decode after loop)
- flashvsr_tiny.py: streaming TCDecoder decode per-chunk (was: batch decode)
- inference.py: use Causal_LQ4x_Proj, build TCDecoder for ALL modes (including
  full), fix temporal padding to largest_8n1_leq(N+4), clear TCDecoder in
  clear_caches()
- utils.py: add Causal_LQ4x_Proj class
- nodes.py: update progress bar estimation for new padding formula

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 17:42:20 +01:00
5071c4de4f Fix sageattn fallback: tensors already rearranged when exception fires
When sageattn fails, q/k/v are already in [b,n,s,d] format from the
rearrange before the call. Use SDPA directly on them instead of calling
_sdpa_fallback, which expects [b,s,(n*d)] and crashes with a shape error.
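
A sketch of the fixed path (the wrapper and its control flow are
illustrative; sageattn is SageAttention's real entry point):

    import torch.nn.functional as F
    from einops import rearrange
    from sageattention import sageattn

    def attn(q, k, v, num_heads):
        # q/k/v arrive as [b, s, n*d]; sageattn's default layout is
        # [b, n, s, d], hence the rearrange before the call.
        q, k, v = (rearrange(t, "b s (n d) -> b n s d", n=num_heads)
                   for t in (q, k, v))
        try:
            out = sageattn(q, k, v)
        except Exception:
            # q/k/v are ALREADY [b, n, s, d] here, so run SDPA on them
            # directly; _sdpa_fallback expects the pre-rearrange
            # [b, s, (n*d)] layout and would die on a shape mismatch.
            out = F.scaled_dot_product_attention(q, k, v)
        return rearrange(out, "b n s d -> b s (n d)")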

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 16:08:01 +01:00
dd69a2fd2b Fix sageattn crash on Blackwell GPUs (sm_120)
SageAttention's CUDA kernels don't support Blackwell yet. Catch runtime
failures from sageattn/sparse_sageattn, disable them, and fall back to
PyTorch SDPA. The try/except cost is paid only once per session.
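
The disable-once pattern, sketched (flag and wrapper names are
hypothetical):

    import torch.nn.functional as F

    _sageattn_ok = True   # flipped off permanently on first failure

    def run_attention(q, k, v):
        global _sageattn_ok
        if _sageattn_ok:
            try:
                from sageattention import sageattn
                return sageattn(q, k, v)
            except Exception:
                # e.g. no compiled kernels for sm_120 (Blackwell);
                # never retry for the rest of the session
                _sageattn_ok = False
        return F.scaled_dot_product_attention(q, k, v)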

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 16:03:15 +01:00
f40504cbcf Fix crash when flash_attn is installed but broken
Verify that attention backend functions are actually callable before
marking them available, and fall back to PyTorch SDPA instead of calling
None.
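
Sketch of the check (the helper name is hypothetical; flash_attn_func is
flash_attn's real entry point):

    def detect_flash_attn():
        try:
            from flash_attn import flash_attn_func
        except Exception:       # broken installs can fail in many ways
            return None
        # A damaged install may still import but expose a None symbol;
        # only mark the backend available if it is actually callable.
        return flash_attn_func if callable(flash_attn_func) else None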

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 15:51:30 +01:00
0fecfcee37 Add FlashVSR support: diffusion-based 4x video super-resolution (Wan 2.1-1.3B)
Vendor a minimal diffsynth subset for FlashVSR inference (full/tiny
pipelines; v1 and v1.1 checkpoints auto-downloaded from HuggingFace).
Includes segment-based processing with temporal overlap and crossfade
blending for bounded RAM on long videos.
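
A sketch of the crossfade step (shapes and the linear ramp are
assumptions, not the exact blend used):

    import torch

    def crossfade(prev_tail: torch.Tensor, next_head: torch.Tensor):
        """Blend the overlapping frames of two segments; both tensors
        are [T, H, W, C] with T = overlap length."""
        t = prev_tail.shape[0]
        w = torch.linspace(0.0, 1.0, t).view(t, 1, 1, 1)
        return prev_tail * (1.0 - w) + next_head * w

Only one segment plus its overlap has to be resident at a time, which is
how segment-based processing keeps RAM bounded on long videos.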

Nodes: Load FlashVSR Model, FlashVSR Upscale, FlashVSR Segment Upscale.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 15:12:33 +01:00