Newer open_clip creates nn.MultiheadAttention with batch_first=True, but STAR's embedder unconditionally permutes to [seq, batch, embed]. This causes a RuntimeError in the text encoder (attn_mask shape mismatch).

The patch detects batch_first at runtime and only permutes when needed.

Patches in patches/ are auto-applied to the STAR submodule on startup and skip gracefully if already applied.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
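For illustration, the runtime check described above could look roughly like the sketch below. It is not the patch itself. The helper names uses_batch_first and run_text_transformer are hypothetical, and the sketch assumes the text transformer exposes its layers as resblocks, each with an attn attribute that carries batch_first. When the attribute is missing, the layout is treated as seq-first.

```python
def uses_batch_first(transformer) -> bool:
    """Return True if the first attention layer expects [batch, seq, embed]."""
    # Assumption: transformer.resblocks[i].attn is an nn.MultiheadAttention-like
    # object. If no block or no batch_first attribute is found, fall back to
    # the seq-first layout.
    blocks = getattr(transformer, "resblocks", None) or []
    for block in blocks:
        attn = getattr(block, "attn", None)
        if attn is not None:
            return bool(getattr(attn, "batch_first", False))
    return False


def run_text_transformer(transformer, x, attn_mask=None):
    """Run the transformer on x of shape [batch, seq, embed].

    Permutes to [seq, batch, embed] only when the attention layers are
    seq-first, and always returns [batch, seq, embed].
    """
    if uses_batch_first(transformer):
        # Attention already consumes [batch, seq, embed]; no permute needed.
        return transformer(x, attn_mask=attn_mask)
    x = x.permute(1, 0, 2)  # [batch, seq, embed] -> [seq, batch, embed]
    x = transformer(x, attn_mask=attn_mask)
    return x.permute(1, 0, 2)  # back to [batch, seq, embed]
```

Checking the attribute on the module, rather than comparing open_clip version strings, keeps the same code path working on both older and newer installs.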