fix: wrap CLIP encoding in inference_mode during pre-generation

ComfyUI loads the CLIP weights as inference tensors. The worker thread
runs without inference_mode, so PyTorch rejects those tensors in
multi_head_attention_forward (version counter tracking).

Wrap the encode_text_clip call in torch.inference_mode(), since text
encoding doesn't need gradients.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@@ -529,6 +529,7 @@ def _pregenerate_lora_mels(model, data_dir, lora_adapter_path, device, dtype,
     prompt = prompt_map.get(npz_path.name, data.get("prompt", default_prompt))
     if isinstance(prompt, np.ndarray):
         prompt = str(prompt)
-    text_clip = feature_utils.encode_text_clip([prompt]).to(device, dtype)
+    with torch.inference_mode():
+        text_clip = feature_utils.encode_text_clip([prompt]).to(device, dtype)
 
     # Load clean audio
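
The thread-local behaviour behind this fix can be illustrated with a minimal sketch. It does not reproduce the exact multi_head_attention_forward failure. It shows a simpler rejection that PyTorch applies to inference tensors outside inference mode (an in-place update), and that re-entering torch.inference_mode() inside the worker makes the tensor usable again. The `weight` tensor and the `worker` function are hypothetical stand-ins, not code from this repository.

```python
import threading

import torch

# Hypothetical stand-in for weights that were loaded under inference mode.
with torch.inference_mode():
    weight = torch.ones(2, 2)

state = {}


def worker():
    # Inference mode is thread-local: a new Python thread starts with it
    # disabled, even if the spawning thread had it enabled.
    state["enabled_on_entry"] = torch.is_inference_mode_enabled()

    # Outside inference mode, PyTorch refuses in-place updates to
    # inference tensors and raises RuntimeError.
    try:
        weight.add_(1.0)
        state["inplace_error"] = None
    except RuntimeError as exc:
        state["inplace_error"] = str(exc)

    # Re-entering inference mode in this thread, as the commit does
    # around encode_text_clip, makes the computation legal again.
    with torch.inference_mode():
        state["output"] = weight @ torch.ones(2, 1)


# The spawning thread is in inference mode, but the worker does not inherit it.
with torch.inference_mode():
    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()

print(state["enabled_on_entry"])
print(state["inplace_error"])
```

Entering the context manager inside the worker, rather than in the code that spawns it, is what matters here: the mode set by the spawning thread never reaches the thread that actually runs the encoder.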