SDPA with 3D xformers-BMK tensors cannot use Flash Attention and falls
back to efficient_attention/math kernels that miscompute on Ada Lovelace
GPUs (e.g. RTX 6000 Pro), producing brownish line artifacts. Unsqueeze
to 4D (1, B*H, N, D) so Flash Attention is eligible. Also add a naive
"math" backend (chunked bmm) as a guaranteed-correct diagnostic baseline.
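A minimal sketch of both pieces, assuming the xformers-style (B*H, N, D) layout; function names are illustrative, not the node's actual API:

```python
import torch
import torch.nn.functional as F

def sdpa_from_bmk(q, k, v):
    """Lift 3D (B*H, N, D) tensors to 4D so F.scaled_dot_product_attention
    sees a (batch, heads, seq, dim) layout and can pick the Flash kernel."""
    out = F.scaled_dot_product_attention(
        q.unsqueeze(0), k.unsqueeze(0), v.unsqueeze(0)  # -> (1, B*H, N, D)
    )
    return out.squeeze(0)

def math_attention(q, k, v, chunk=1024):
    """Naive chunked-bmm softmax attention: slow, but a correct baseline."""
    scale = q.shape[-1] ** -0.5
    outs = []
    for i in range(0, q.shape[1], chunk):
        qc = q[:, i:i + chunk]                               # (B*H, c, D)
        attn = torch.softmax(qc @ k.transpose(-2, -1) * scale, dim=-1)
        outs.append(attn @ v)
    return torch.cat(outs, dim=1)
```

Because the math backend is plain bmm + softmax, any divergence from the fused kernels isolates the bug to the fast path.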
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Download OpenCLIP ViT-H-14 to models/text_encoders/ and SVD temporal
VAE to models/vae/svd-temporal-vae/ instead of hidden library caches,
so they're visible, reusable, and shared with other nodes.
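The directory layout can be sketched as follows (a sketch only; the node's actual path-resolution code may differ):

```python
from pathlib import Path

def model_dirs(root="models"):
    """Illustrative layout from this change: explicit, user-visible model
    directories instead of the default library cache locations."""
    dirs = {
        "text_encoders": Path(root) / "text_encoders",
        "temporal_vae": Path(root) / "vae" / "svd-temporal-vae",
    }
    for d in dirs.values():
        d.mkdir(parents=True, exist_ok=True)  # create eagerly so downloads land here
    return dirs
```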
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Newer open_clip creates nn.MultiheadAttention with batch_first=True,
but STAR's embedder unconditionally permutes to [seq, batch, embed].
This causes a RuntimeError in the text encoder (attn_mask shape
mismatch). The patch detects batch_first at runtime and only permutes
when needed.
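The runtime check can be sketched like this (illustrative; the real patch edits STAR's embedder in place):

```python
import torch
import torch.nn as nn

def encode_tokens(attn: nn.MultiheadAttention, x: torch.Tensor) -> torch.Tensor:
    """Only permute to [seq, batch, embed] when the attention module was
    built with batch_first=False; newer open_clip uses batch_first=True."""
    batch_first = getattr(attn, "batch_first", False)
    if not batch_first:
        x = x.permute(1, 0, 2)   # [batch, seq, embed] -> [seq, batch, embed]
    out, _ = attn(x, x, x, need_weights=False)
    if not batch_first:
        out = out.permute(1, 0, 2)
    return out
```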
Patches in patches/ are auto-applied to the STAR submodule on startup
and skip gracefully if already applied.
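One common way to implement the skip-if-applied check (a sketch; the actual mechanism in patches/ may differ) is to treat a patch that reverses cleanly as already applied:

```python
import subprocess

def apply_patch(repo_dir: str, patch_file: str) -> str:
    """Idempotent patching sketch: apply if the patch fits, skip if it
    already landed (it reverses cleanly), otherwise report failure."""
    def git(*args):
        return subprocess.run(["git", "-C", repo_dir, *args],
                              capture_output=True).returncode
    if git("apply", "--check", patch_file) == 0:
        git("apply", patch_file)
        return "applied"
    if git("apply", "--reverse", "--check", patch_file) == 0:
        return "skipped (already applied)"
    return "failed"
```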
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the auto-detect xformers shim with a runtime dispatcher that
always intercepts xformers.ops.memory_efficient_attention. A new
dropdown on STARModelLoader (and --attention CLI arg) lets users
explicitly select: sdpa (default), xformers, sageattn, or specific
SageAttention kernels (fp16 triton/cuda, fp8 cuda). Only backends
that successfully import appear as options.
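The discovery-plus-dispatch pattern can be sketched as follows (backend names are illustrative; the specific SageAttention kernel variants are omitted for brevity):

```python
import torch.nn.functional as F

def available_backends() -> dict:
    """Only backends whose import succeeds are offered as options."""
    backends = {"sdpa": lambda q, k, v: F.scaled_dot_product_attention(q, k, v)}
    try:
        import xformers.ops
        backends["xformers"] = xformers.ops.memory_efficient_attention
    except ImportError:
        pass
    try:
        from sageattention import sageattn
        backends["sageattn"] = sageattn
    except ImportError:
        pass
    return backends

def dispatch(name: str, q, k, v):
    backends = available_backends()
    if name not in backends:
        raise ValueError(f"attention backend {name!r} unavailable; "
                         f"choose from {sorted(backends)}")
    return backends[name](q, k, v)
```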
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Avoids a hard xformers dependency by shimming
xformers.ops.memory_efficient_attention with
torch.nn.functional.scaled_dot_product_attention when
xformers is not installed.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Detects when the STAR submodule directory is empty (cloned without
--recursive) and runs git submodule update --init automatically.
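The detection can be sketched as follows (paths illustrative):

```python
import subprocess
from pathlib import Path

def ensure_submodule(repo_root: str, submodule: str = "STAR") -> bool:
    """An empty submodule dir means the repo was cloned without --recursive;
    fetch it on first use. Returns True if an init was performed."""
    sub = Path(repo_root) / submodule
    if sub.is_dir() and any(sub.iterdir()):
        return False  # already populated
    subprocess.run(
        ["git", "submodule", "update", "--init", "--recursive"],
        cwd=repo_root, check=True)
    return True
```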
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Process videos in overlapping segments (25% overlap with linear crossfade
blending) so peak memory is bounded by one segment rather than the full
video. New segment_size parameter on the Super-Resolution node (default 0
= all at once; 16-32 recommended for long videos). Also update the README
clone URL to point at the GitHub mirror.
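The segmentation-and-crossfade scheme can be sketched like this, assuming `process` maps a (T, C, H, W) clip to an upscaled clip frame-for-frame (a sketch, not the node's actual implementation):

```python
import torch

def blend_segments(process, video, segment_size=16, overlap_frac=0.25):
    """Run `process` on overlapping clips and linearly crossfade the
    overlaps, so peak memory scales with one segment, not the video."""
    T = video.shape[0]
    if segment_size <= 0 or T <= segment_size:
        return process(video)                            # 0 = all at once
    overlap = max(1, int(segment_size * overlap_frac))   # 25% by default
    step = segment_size - overlap
    starts = list(range(0, T - segment_size + 1, step))
    if starts[-1] + segment_size < T:                    # cover the tail
        starts.append(T - segment_size)
    ramp = torch.linspace(0, 1, overlap + 2)[1:-1]       # strictly in (0, 1)
    out = weight = None
    for s in starts:
        seg = process(video[s:s + segment_size])
        w = torch.ones(segment_size)
        if s > 0:
            w[:overlap] = ramp                           # fade in
        if s + segment_size < T:
            w[-overlap:] = ramp.flip(0)                  # fade out
        if out is None:
            out = torch.zeros((T, *seg.shape[1:]))
            weight = torch.zeros(T)
        out[s:s + segment_size] += seg * w.view(-1, *([1] * (seg.dim() - 1)))
        weight[s:s + segment_size] += w
    return out / weight.view(-1, *([1] * (out.dim() - 1)))
```

Dividing by the accumulated weight normalizes frames covered by two segments, so an identity `process` reproduces the input exactly.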
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Two-node package wrapping the STAR (ICCV 2025) diffusion-based video
upscaling pipeline:
- STAR Model Loader: loads UNet+ControlNet, OpenCLIP text encoder, and
temporal VAE with auto-download from HuggingFace
- STAR Video Super-Resolution: runs the full diffusion pipeline with
configurable upscale factor, guidance, solver mode, chunking, and
color correction
Includes three VRAM offload modes (disabled/model/aggressive) to
support GPUs from 12GB to 40GB+.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>