ComfyUI-Prompt-Calibrator/nodes
Ethanfel f7ea559690 Speed: auto flash-attention/SDPA + document perf levers
transformers .generate() is the slow path; reasoning-token volume and swap_eval
(2 passes) multiply its cost. attn_implementation is now requested automatically in
the order flash_attention_2 -> sdpa -> default (a free speedup; flash-attn stays
optional). README gains a Performance section: turn swap_eval off (biggest free win),
install flash-attn, use a smaller model or fewer axes, avoid nf4 when speed matters,
and use vLLM/SGLang as the real production-speed path.
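
The flash_attention_2 -> sdpa -> default fallback described above could look roughly like the sketch below. It is not the code from this commit: `load_with_attn_fallback`, `ATTN_PREFERENCE`, and the set of exceptions caught are hypothetical, and the loader is passed in as a callable so the sketch runs without transformers installed.

```python
from typing import Any, Callable, Optional, Sequence, Tuple

# Preference order taken from the commit message.
# None means "pass no attn_implementation and let the library use its default".
ATTN_PREFERENCE: Tuple[Optional[str], ...] = ("flash_attention_2", "sdpa", None)


def load_with_attn_fallback(
    loader: Callable[..., Any],
    preference: Sequence[Optional[str]] = ATTN_PREFERENCE,
    **kwargs: Any,
) -> Tuple[Any, Optional[str]]:
    """Call loader(**kwargs) with each attention backend until one succeeds.

    Returns (model, backend_used). Hypothetical helper, not the repository's API.
    """
    last_error: Optional[Exception] = None
    for impl in preference:
        attempt = dict(kwargs)
        if impl is not None:
            attempt["attn_implementation"] = impl
        try:
            return loader(**attempt), impl
        except (ImportError, ValueError, RuntimeError) as exc:
            # Assumption: a missing or unsupported backend surfaces as one of these.
            last_error = exc
    raise RuntimeError("no attention implementation could be loaded") from last_error
```

With transformers, `loader` could be `functools.partial(AutoModelForCausalLM.from_pretrained, model_id)`, since `from_pretrained` accepts an `attn_implementation` keyword. Returning the backend that was actually used makes it easy to log which path was taken.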

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-27 11:18:11 +02:00