SDPA: unsqueeze 3D xformers-BMK tensors to 4D so Flash Attention is eligible

With 3D xformers-BMK tensors, SDPA cannot dispatch to Flash Attention and
falls back to the efficient_attention/math kernels, which miscompute on Ada
Lovelace GPUs (e.g. RTX 6000 Pro), producing brownish line artifacts.
Unsqueeze to 4D (1, B*H, N, D) so the Flash Attention kernel becomes
eligible. Also add a naive "math" backend (chunked bmm) as a
guaranteed-correct diagnostic baseline.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
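A minimal sketch of the two pieces described above, assuming PyTorch's `F.scaled_dot_product_attention`; the function names `sdpa_3d_via_4d` and `math_attention` are illustrative, not the actual ones in this change:

```python
import torch
import torch.nn.functional as F


def sdpa_3d_via_4d(q, k, v):
    """Run SDPA on xformers-BMK (B*H, N, D) tensors via a 4D view.

    3D inputs route SDPA to the efficient/math kernels; unsqueezing to
    (1, B*H, N, D) makes the Flash Attention kernel eligible for dispatch.
    """
    out = F.scaled_dot_product_attention(
        q.unsqueeze(0), k.unsqueeze(0), v.unsqueeze(0)
    )
    return out.squeeze(0)


def math_attention(q, k, v, chunk=1024):
    """Naive "math" baseline: explicit softmax(QK^T / sqrt(d)) @ V.

    Computed with plain bmm in query chunks to bound peak memory; slow but
    guaranteed-correct, useful to diagnose kernel miscomputation.
    """
    scale = q.shape[-1] ** -0.5
    out = torch.empty_like(q)
    for i in range(0, q.shape[1], chunk):
        attn = torch.bmm(q[:, i : i + chunk] * scale, k.transpose(1, 2))
        out[:, i : i + chunk] = torch.bmm(attn.softmax(dim=-1), v)
    return out
```

Both paths should agree with plain 4D SDPA to within floating-point tolerance, which is how the math backend serves as a baseline when a fused kernel is suspect.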