Root cause: three critical differences from the naxci1 reference implementation:
1. Batch decode after the loop → streaming per-chunk TCDecoder decode with LQ
conditioning inside the loop. The TCDecoder uses causal convolutions whose
temporal memory must be built incrementally, chunk by chunk. Batch decoding
bypasses that memory and drops LQ frame conditioning, which causes ghosting.
2. Buffer_LQ4x_Proj → Causal_LQ4x_Proj for FlashVSR v1.1. The causal
variant reads the old cache before writing the new one (truly causal),
while the Buffer variant writes the cache before the conv call. Using the
wrong variant misaligns the temporal LQ conditioning features.
3. Temporal padding formula: changed from round-up to largest_8n1_leq(N+4),
matching the naxci1 reference approach.
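The ordering difference in item 2 can be illustrated with a toy sketch. This
is not the actual Causal_LQ4x_Proj or Buffer_LQ4x_Proj code: the classes
ToyCausalProj and ToyBufferProj, the helper causal_sum, and the 2-frame
context are hypothetical, and plain Python lists stand in for tensors. It
only shows that reading the old cache before overwriting it reproduces a
full-sequence causal pass, while writing first changes which frames the conv
sees.

```python
# Toy illustration only; every name here is a hypothetical stand-in.

def causal_sum(window, context):
    """Stand-in for a causal conv: each output sums a frame and its `context` predecessors."""
    return [sum(window[i - context:i + 1]) for i in range(context, len(window))]


class ToyCausalProj:
    """Reads the cache left by the previous chunk, then overwrites it."""

    def __init__(self, context=2):
        self.context = context
        self.cache = [0] * context  # zero history before the first chunk

    def __call__(self, chunk):
        window = self.cache + chunk           # read the OLD cache first
        self.cache = chunk[-self.context:]    # then store the new one
        return causal_sum(window, self.context)


class ToyBufferProj(ToyCausalProj):
    """Overwrites the cache before the conv call, so the conv sees the new cache."""

    def __call__(self, chunk):
        self.cache = chunk[-self.context:]    # write the cache first
        window = self.cache + chunk           # conv input uses the NEW cache
        return causal_sum(window, self.context)


def stream(proj, chunks):
    """Feed chunks one at a time and concatenate the outputs."""
    out = []
    for chunk in chunks:
        out.extend(proj(chunk))
    return out


chunks = [[1, 2], [3, 4]]  # each chunk must hold at least `context` frames in this toy
full_pass = causal_sum([0, 0, 1, 2, 3, 4], 2)   # single pass over the whole sequence
causal_out = stream(ToyCausalProj(), chunks)
buffer_out = stream(ToyBufferProj(), chunks)
print(causal_out == full_pass, buffer_out == full_pass)
```

In this toy, only the read-then-write order matches the single full pass. It
says nothing about which variant a given checkpoint expects; that pairing
comes from the reference implementation.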

Changes:
- flashvsr_full.py: streaming TCDecoder decode per-chunk with LQ conditioning
and per-chunk color correction (was: batch VAE decode after loop)
- flashvsr_tiny.py: streaming TCDecoder decode per-chunk (was: batch decode)
- inference.py: use Causal_LQ4x_Proj, build TCDecoder for ALL modes (including
full), fix temporal padding to largest_8n1_leq(N+4), clear TCDecoder in
clear_caches()
- utils.py: add Causal_LQ4x_Proj class
- nodes.py: update progress bar estimation for new padding formula
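
For reference, the padding target named above can be sketched as follows.
This assumes, from the name alone, that largest_8n1_leq(x) returns the
largest integer of the form 8n+1 that does not exceed x. The function body
and the helper padded_frame_count are illustrative and may differ from the
code in inference.py.

```python
def largest_8n1_leq(x: int) -> int:
    """Largest integer of the form 8n + 1 (n >= 0) that is <= x, or 0 if x < 1.

    Assumed semantics, inferred from the function name.
    """
    return 0 if x < 1 else ((x - 1) // 8) * 8 + 1


def padded_frame_count(num_frames: int) -> int:
    """Hypothetical helper: frame count after applying largest_8n1_leq(N+4)."""
    return largest_8n1_leq(num_frames + 4)


# Print the target frame count for a few input lengths.
for n in (5, 13, 21):
    print(n, padded_frame_count(n))
```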

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>