Newer open_clip creates nn.MultiheadAttention with batch_first=True,
but STAR's embedder unconditionally permutes to [seq, batch, embed].
This causes a RuntimeError in the text encoder (attn_mask shape
mismatch). The patch detects batch_first at runtime and only permutes
when needed.
Patches in patches/ are auto-applied to the STAR submodule on startup
and skip gracefully if already applied.
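The conditional permute can be sketched as follows (illustrative helper names, not the patch's actual code):

```python
def needs_permute(attn):
    # Newer open_clip builds nn.MultiheadAttention with batch_first=True;
    # older builds expect [seq, batch, embed] and lack the flag.
    return not getattr(attn, "batch_first", False)

def run_text_attention(attn, x):
    # x: [batch, seq, embed] as produced by the embedder
    if needs_permute(attn):
        x = x.permute(1, 0, 2)          # -> [seq, batch, embed]
    out = attn(x, x, x, need_weights=False)[0]
    if needs_permute(attn):
        out = out.permute(1, 0, 2)      # back to [batch, seq, embed]
    return out
```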
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the auto-detect xformers shim with a runtime dispatcher that
always intercepts xformers.ops.memory_efficient_attention. A new
dropdown on STARModelLoader (and --attention CLI arg) lets users
explicitly select: sdpa (default), xformers, sageattn, or specific
SageAttention kernels (fp16 triton/cuda, fp8 cuda). Only backends
that successfully import appear as options.
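The availability probe follows the usual try-import pattern; the sketch below shows the idea for three of the backends (the per-kernel SageAttention variants are probed the same way, and the function name here is an assumption):

```python
def available_attention_backends():
    """Return only backends whose imports succeed; these become the
    dropdown options. Illustrative sketch, not the dispatcher itself."""
    found = {}
    try:
        import torch.nn.functional as F
        found["sdpa"] = F.scaled_dot_product_attention
    except ImportError:
        pass
    try:
        import xformers.ops
        found["xformers"] = xformers.ops.memory_efficient_attention
    except ImportError:
        pass
    try:
        from sageattention import sageattn
        found["sageattn"] = sageattn
    except ImportError:
        pass
    return found
```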
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Avoids requiring xformers installation by shimming
xformers.ops.memory_efficient_attention with
torch.nn.functional.scaled_dot_product_attention when
xformers is not available.
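A minimal version of the shim idea, assuming the standard layout difference between the two APIs (xformers takes [batch, seq, heads, dim], SDPA takes [batch, heads, seq, dim]); the real shim may handle more of the xformers surface:

```python
import importlib.util
import sys
import types

def install_xformers_shim():
    """Register a stub xformers module backed by PyTorch SDPA when the
    real package is absent. Sketch only."""
    if importlib.util.find_spec("xformers") is not None:
        return False  # real xformers present; leave it alone

    def memory_efficient_attention(q, k, v, attn_bias=None, p=0.0):
        import torch.nn.functional as F
        # xformers layout [B, S, H, D] -> SDPA layout [B, H, S, D]
        q, k, v = (t.transpose(1, 2) for t in (q, k, v))
        out = F.scaled_dot_product_attention(
            q, k, v, attn_mask=attn_bias, dropout_p=p)
        return out.transpose(1, 2)

    ops = types.ModuleType("xformers.ops")
    ops.memory_efficient_attention = memory_efficient_attention
    root = types.ModuleType("xformers")
    root.ops = ops
    sys.modules["xformers"] = root
    sys.modules["xformers.ops"] = ops
    return True
```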
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Detects when the STAR submodule directory is empty (cloned without
--recursive) and runs git submodule update --init automatically.
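The check exploits the fact that a non-recursive clone leaves the submodule directory present but empty. A sketch, with illustrative function names:

```python
import subprocess
from pathlib import Path

def submodule_missing(star_dir):
    """True when the directory is absent or empty, as after a clone
    without --recursive."""
    p = Path(star_dir)
    return not p.is_dir() or not any(p.iterdir())

def ensure_star_submodule(repo_root, star_dir):
    # star_dir is the STAR submodule path under repo_root
    if submodule_missing(star_dir):
        subprocess.run(["git", "submodule", "update", "--init"],
                       cwd=repo_root, check=True)
```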
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Standalone inference script that works outside ComfyUI — just activate
the same Python venv. Streams output frames to ffmpeg so peak RAM stays
bounded regardless of video length. Supports video files, image
sequences, and single images. Audio is automatically preserved from
input videos.
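The streaming idea: open an ffmpeg process that reads raw RGB frames on stdin and write each upscaled frame as it is produced, so only one frame is buffered at a time. The encoder flags below are a plausible sketch, not necessarily the script's exact arguments (the real script also muxes audio from the input):

```python
def build_ffmpeg_cmd(out_path, width, height, fps):
    """ffmpeg command that consumes raw RGB24 frames from stdin."""
    return ["ffmpeg", "-y",
            "-f", "rawvideo", "-pix_fmt", "rgb24",
            "-s", f"{width}x{height}", "-r", str(fps),
            "-i", "-",                       # frames arrive on stdin
            "-c:v", "libx264", "-pix_fmt", "yuv420p",
            str(out_path)]
```

Used with `subprocess.Popen(cmd, stdin=subprocess.PIPE)`, writing `frame.tobytes()` per frame and closing stdin when done.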
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Process videos in overlapping segments (25% overlap with linear crossfade
blending) so peak memory is bounded by one segment rather than the full
video. New segment_size parameter on the Super-Resolution node (default 0
= all at once, recommended 16-32 for long videos). Also update README
clone URL to GitHub mirror.
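The linear crossfade over the overlap region can be sketched like this, with scalar frame values standing in for frame tensors (helper names are illustrative):

```python
def crossfade_weights(overlap):
    """Linear ramp, exclusive of 0 and 1, giving the next segment's
    weight at each overlapping frame."""
    return [(i + 1) / (overlap + 1) for i in range(overlap)]

def blend_overlap(prev_tail, next_head):
    # prev_tail: trailing frames of the previous segment;
    # next_head: leading frames of the next segment (same length)
    w = crossfade_weights(len(prev_tail))
    return [p * (1 - a) + n * a
            for p, n, a in zip(prev_tail, next_head, w)]
```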
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Two-node package wrapping the STAR (ICCV 2025) diffusion-based video
upscaling pipeline:
- STAR Model Loader: loads UNet+ControlNet, OpenCLIP text encoder, and
temporal VAE with auto-download from HuggingFace
- STAR Video Super-Resolution: runs the full diffusion pipeline with
configurable upscale factor, guidance, solver mode, chunking, and
color correction
Includes three VRAM offload modes (disabled/model/aggressive) to
support GPUs from 12GB to 40GB+.
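One plausible reading of the three modes as a lookup, purely to illustrate the trade-off; the actual node's offload behavior may differ:

```python
def offload_plan(mode):
    """Map an offload mode to (move_idle_models_to_cpu, free_vram_cache).
    Illustrative interpretation only."""
    plans = {
        "disabled":   (False, False),  # everything stays resident on the GPU
        "model":      (True,  False),  # idle stages move to CPU between steps
        "aggressive": (True,  True),   # additionally free cached VRAM
    }
    return plans[mode]
```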
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>