c7c7123068
Concatenates 2-3 reference audio clips (with per-voice duration weights) to create a blended speaker embedding. Merges transcripts for ref_text. Handles mismatched sample rates and mono conversion automatically.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
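
For reviewers, the blending step described in this commit can be sketched roughly as follows. This is a minimal stdlib-only illustration, not the code in the commit: `RefClip`, `to_mono`, `resample_linear`, `blend_references`, and the `target_sr` / `total_seconds` parameters are hypothetical names, and it assumes "duration weights" means each voice receives a share of a fixed duration budget proportional to its weight. Linear interpolation stands in for whatever resampler the real code uses.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple, Union

Frame = Union[float, Sequence[float]]  # a mono sample or one multi-channel frame


@dataclass
class RefClip:
    """Hypothetical container for one reference voice."""
    samples: List[Frame]
    sample_rate: int
    transcript: str
    weight: float = 1.0


def to_mono(samples: Sequence[Frame]) -> List[float]:
    """Average the channels of each frame; mono samples pass through."""
    mono = []
    for frame in samples:
        if isinstance(frame, (int, float)):
            mono.append(float(frame))
        else:
            mono.append(sum(frame) / len(frame))
    return mono


def resample_linear(samples: List[float], src_sr: int, dst_sr: int) -> List[float]:
    """Resample by linear interpolation (a stand-in for a proper resampler)."""
    if src_sr == dst_sr or not samples:
        return list(samples)
    n_out = max(1, round(len(samples) * dst_sr / src_sr))
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * src_sr / dst_sr          # position in the source signal
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


def blend_references(
    clips: Sequence[RefClip],
    target_sr: int = 24000,
    total_seconds: float = 12.0,
) -> Tuple[List[float], str]:
    """Concatenate 2-3 clips into one reference signal and one ref_text."""
    if not 2 <= len(clips) <= 3:
        raise ValueError("expected 2 or 3 reference clips")
    weight_sum = sum(c.weight for c in clips)
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive value")

    audio: List[float] = []
    texts: List[str] = []
    for clip in clips:
        mono = resample_linear(to_mono(clip.samples), clip.sample_rate, target_sr)
        # Assumption: each voice gets a slice of the budget proportional to its weight.
        budget = int(total_seconds * target_sr * clip.weight / weight_sum)
        audio.extend(mono[:budget])
        texts.append(clip.transcript.strip())
    return audio, " ".join(t for t in texts if t)
```

One caveat of this sketch: trimming a clip to its budget leaves its transcript untouched, so the merged text can run longer than the audio it describes. A real implementation would need to trim at transcript boundaries or choose clips short enough to fit.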