ComfyUI-Tween

Ethanfel/ComfyUI-Tween

Fork 0

Commit Graph

Author	SHA1	Message	Date
Ethanfel	3b87652184	Fix FlashVSR attention mask and output quality - Use generate_draft_block_mask_refined for sparse attention mask (matches naxci1's generate_draft_block_mask_sage with proper half-block key scoring) - Remove spurious repeat_interleave(2, dim=-1) from generate_draft_block_mask that doubled the key dimension incorrectly - Add torch.clamp(0, 1) to _to_frames output (matches naxci1's tensor2video) - Add .to(self.device) on LQ video slices in streaming loop for all pipelines Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-13 18:41:43 +01:00
Ethanfel	fa250897a2	Fix FlashVSR ghosting: streaming TCDecoder decode + Causal LQ projection Root cause: three critical differences from naxci1 reference implementation: 1. Batch decode after loop → streaming per-chunk TCDecoder decode with LQ conditioning inside the loop. The TCDecoder uses causal convolutions with temporal memory that must be built incrementally per-chunk. Batch decode breaks this design and loses LQ frame conditioning, causing ghosting. 2. Buffer_LQ4x_Proj → Causal_LQ4x_Proj for FlashVSR v1.1. The causal variant reads the OLD cache before writing the new one (truly causal), while Buffer writes cache BEFORE the conv call. Using the wrong variant misaligns temporal LQ conditioning features. 3. Temporal padding formula: changed from round-up to largest_8n1_leq(N+4) matching the naxci1 reference approach. Changes: - flashvsr_full.py: streaming TCDecoder decode per-chunk with LQ conditioning and per-chunk color correction (was: batch VAE decode after loop) - flashvsr_tiny.py: streaming TCDecoder decode per-chunk (was: batch decode) - inference.py: use Causal_LQ4x_Proj, build TCDecoder for ALL modes (including full), fix temporal padding to largest_8n1_leq(N+4), clear TCDecoder in clear_caches() - utils.py: add Causal_LQ4x_Proj class - nodes.py: update progress bar estimation for new padding formula Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-13 17:42:20 +01:00
Ethanfel	0fecfcee37	Add FlashVSR support: diffusion-based 4x video super-resolution (Wan 2.1-1.3B) Vendor minimal diffsynth subset for FlashVSR inference (full/tiny pipelines, v1 and v1.1 checkpoints auto-downloaded from HuggingFace). Includes segment-based processing with temporal overlap and crossfade blending for bounded RAM on long videos. Nodes: Load FlashVSR Model, FlashVSR Upscale, FlashVSR Segment Upscale. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-13 15:12:33 +01:00

Author

SHA1

Message

Date

Ethanfel

3b87652184

Fix FlashVSR attention mask and output quality

- Use generate_draft_block_mask_refined for sparse attention mask (matches
  naxci1's generate_draft_block_mask_sage with proper half-block key scoring)
- Remove spurious repeat_interleave(2, dim=-1) from generate_draft_block_mask
  that doubled the key dimension incorrectly
- Add torch.clamp(0, 1) to _to_frames output (matches naxci1's tensor2video)
- Add .to(self.device) on LQ video slices in streaming loop for all pipelines

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-13 18:41:43 +01:00

Ethanfel

fa250897a2

Fix FlashVSR ghosting: streaming TCDecoder decode + Causal LQ projection

Root cause: three critical differences from naxci1 reference implementation:

1. Batch decode after loop → streaming per-chunk TCDecoder decode with LQ
   conditioning inside the loop. The TCDecoder uses causal convolutions with
   temporal memory that must be built incrementally per-chunk. Batch decode
   breaks this design and loses LQ frame conditioning, causing ghosting.

2. Buffer_LQ4x_Proj → Causal_LQ4x_Proj for FlashVSR v1.1. The causal
   variant reads the OLD cache before writing the new one (truly causal),
   while Buffer writes cache BEFORE the conv call. Using the wrong variant
   misaligns temporal LQ conditioning features.

3. Temporal padding formula: changed from round-up to largest_8n1_leq(N+4)
   matching the naxci1 reference approach.

Changes:
- flashvsr_full.py: streaming TCDecoder decode per-chunk with LQ conditioning
  and per-chunk color correction (was: batch VAE decode after loop)
- flashvsr_tiny.py: streaming TCDecoder decode per-chunk (was: batch decode)
- inference.py: use Causal_LQ4x_Proj, build TCDecoder for ALL modes (including
  full), fix temporal padding to largest_8n1_leq(N+4), clear TCDecoder in
  clear_caches()
- utils.py: add Causal_LQ4x_Proj class
- nodes.py: update progress bar estimation for new padding formula

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-13 17:42:20 +01:00

Ethanfel

0fecfcee37

Add FlashVSR support: diffusion-based 4x video super-resolution (Wan 2.1-1.3B)

Vendor minimal diffsynth subset for FlashVSR inference (full/tiny pipelines,
v1 and v1.1 checkpoints auto-downloaded from HuggingFace). Includes segment-based
processing with temporal overlap and crossfade blending for bounded RAM on long videos.

Nodes: Load FlashVSR Model, FlashVSR Upscale, FlashVSR Segment Upscale.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-13 15:12:33 +01:00

3 Commits