Replace the auto-detect xformers shim with a runtime dispatcher that
always intercepts xformers.ops.memory_efficient_attention. A new
dropdown on STARModelLoader (and --attention CLI arg) lets users
explicitly select: sdpa (default), xformers, sageattn, or specific
SageAttention kernels (fp16 triton/cuda, fp8 cuda). Only backends
that successfully import appear as options.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Avoids requiring xformers installation by shimming
xformers.ops.memory_efficient_attention with
torch.nn.functional.scaled_dot_product_attention when
xformers is not available.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Standalone inference script that works outside ComfyUI — just activate
the same Python venv. Streams output frames to ffmpeg so peak RAM stays
bounded regardless of video length. Supports video files, image
sequences, and single images. Audio is automatically preserved from
input videos.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>