ComfyUI-Omnivoice

A ComfyUI custom node for OmniVoice, a massively multilingual zero-shot text-to-speech (TTS) model that supports more than 600 languages.

Features

Voice Cloning — clone any voice from a short reference audio clip
Voice Design — describe a voice with text (e.g. "female, low pitch, british accent")
Auto Voice — let the model pick a voice automatically
Audiobook-ready — handles arbitrarily long text with near-constant VRAM via built-in chunking
Multilingual — 600+ languages
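Built-in chunking is what keeps VRAM roughly flat on long inputs. The actual splitting rules live inside the `omnivoice` package and are not documented here. The sketch below only illustrates the general idea under a simple assumed strategy: pack whole sentences into chunks of at most `max_chars` characters. `split_into_chunks` and `max_chars` are hypothetical names and are not part of OmniVoice.

```python
import re


def split_into_chunks(text: str, max_chars: int = 300) -> list[str]:
    """Pack whole sentences into chunks no longer than max_chars.

    Hypothetical illustration only; OmniVoice's real chunker may differ.
    A single sentence longer than max_chars becomes its own chunk.
    """
    # Split after '.', '!' or '?' followed by whitespace, keeping the punctuation.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate  # sentence still fits in the open chunk
        else:
            if current:
                chunks.append(current)  # close the full chunk
            current = sentence  # start a new chunk with this sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    page = "First sentence. Second one is here! Is this the third? Yes."
    for chunk in split_into_chunks(page, max_chars=40):
        print(repr(chunk))
```

Only Western sentence punctuation is recognized here. A chunker meant for 600+ languages would also have to handle marks such as `。` or `।`, which is one reason to rely on the package's own implementation rather than this sketch.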

Installation

Clone the repository into your ComfyUI `custom_nodes` directory:

```
cd ComfyUI/custom_nodes
git clone https://github.com/ethanfel/ComfyUI-Omnivoice.git
```

Install the dependency into the same Python environment that runs ComfyUI:
```
pip install omnivoice
```
Restart ComfyUI. The nodes will appear under the OmniVoice category.

Note: OmniVoice requires PyTorch 2.8+ and a CUDA-capable GPU (or Apple Silicon).
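To confirm that the environment meets this requirement, you can check the PyTorch version and the available accelerators from the Python interpreter that runs ComfyUI. This is a minimal sketch: `meets_min_version` is a hypothetical helper, and the parsing assumes PyTorch's usual `major.minor.patch[+local]` version format (for example `2.8.0+cu128`). Versions are compared as integer tuples, so `2.10` correctly counts as newer than `2.8`.

```python
import re


def meets_min_version(version: str, minimum: tuple[int, int] = (2, 8)) -> bool:
    """Return True if a version string such as '2.8.0+cu128' is >= minimum."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        return False  # unparseable version string
    return (int(match.group(1)), int(match.group(2))) >= minimum


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment.")
    else:
        status = "OK" if meets_min_version(torch.__version__) else "too old (need 2.8+)"
        print("PyTorch", torch.__version__, status)
        print("CUDA available:", torch.cuda.is_available())
        print("Apple MPS available:", torch.backends.mps.is_available())
```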

Nodes

OmniVoice Model Loader

Loads the OmniVoice model. With `Auto-download (HuggingFace)` selected, the checkpoint is downloaded from Hugging Face on first run and cached locally.

| Input | Type | Description |
| --- | --- | --- |
| `model_source` | dropdown | `Auto-download (HuggingFace)` or `Local path` |
| `local_path` | string | Path to a local checkpoint (optional) |
| `device` | dropdown | `cuda:0`, `cuda:1`, or `cpu` |
| `dtype` | dropdown | `float16`, `bfloat16`, or `float32` |

Output: OMNIVOICE_MODEL
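The node's own loading code is not shown in this README. As an illustration only, the hypothetical helpers below show one way the `device` and `dtype` dropdown values could be validated before loading. Two behaviours are assumptions, not documented OmniVoice behaviour: falling back to `cpu` when the requested CUDA index does not exist (for example `cuda:1` on a single-GPU machine), and forcing `float32` on CPU.

```python
def resolve_device(requested: str, cuda_device_count: int) -> str:
    """Map a dropdown value such as 'cuda:1' to a usable device string.

    Hypothetical helper: falls back to 'cpu' when the requested CUDA index
    does not exist on this machine.
    """
    if requested == "cpu":
        return "cpu"
    if requested.startswith("cuda:"):
        index = int(requested.split(":", 1)[1])
        if 0 <= index < cuda_device_count:
            return requested
        return "cpu"
    raise ValueError(f"Unsupported device option: {requested!r}")


def resolve_dtype_name(requested: str, device: str) -> str:
    """Return the torch dtype name to use, e.g. with getattr(torch, name).

    Assumption: half-precision types are kept only on CUDA; CPU uses float32.
    """
    if requested not in ("float16", "bfloat16", "float32"):
        raise ValueError(f"Unsupported dtype option: {requested!r}")
    return requested if device.startswith("cuda") else "float32"
```

With PyTorch installed, the results would be used as `torch.device(resolve_device("cuda:1", torch.cuda.device_count()))` and `getattr(torch, resolve_dtype_name("bfloat16", device))`.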

OmniVoice Generate

Generates speech from text using a loaded model.

| Input | Type | Description |
| --- | --- | --- |
| `model` | OMNIVOICE_MODEL | From OmniVoice Model Loader |
| `text` | string | Text to synthesize (full pages supported) |
| `mode` | dropdown | `voice_cloning`, `voice_design`, or `auto_voice` |
| `ref_audio` | AUDIO | Reference audio for voice cloning (optional) |
| `ref_text` | string | Transcription of the reference audio; auto-detected if left blank (optional) |
| `instruct` | string | Voice description for voice design mode (optional) |
| `speed` | float | Speed multiplier (default 1.0) |
| `num_step` | int | Diffusion steps (default 32; use 16 for faster generation) |

Output: AUDIO at 24 kHz, which connects directly to ComfyUI's Save Audio node.
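Which optional inputs matter depends on `mode`: the table pairs `ref_audio` with voice cloning and `instruct` with voice design. The sketch below encodes that reading as a hypothetical pre-flight check you might run before queueing a long job. It is not the node's actual validation logic, and the README does not say whether the node raises an error or falls back when an input is missing. `check_generate_inputs` is an illustrative name.

```python
from typing import Optional


def check_generate_inputs(
    mode: str,
    ref_audio: Optional[object] = None,
    instruct: str = "",
    speed: float = 1.0,
    num_step: int = 32,
) -> list[str]:
    """Return a list of problems with a Generate configuration (empty if OK)."""
    problems: list[str] = []
    if mode == "voice_cloning":
        if ref_audio is None:
            problems.append("voice_cloning needs ref_audio")
    elif mode == "voice_design":
        if not instruct.strip():
            problems.append("voice_design needs an instruct description")
    elif mode != "auto_voice":
        problems.append(f"unknown mode {mode!r}")
    # Basic sanity checks on the numeric inputs.
    if speed <= 0:
        problems.append("speed must be positive")
    if num_step < 1:
        problems.append("num_step must be at least 1")
    return problems
```

For example, `check_generate_inputs("voice_design", instruct="female, low pitch, british accent")` returns an empty list, while `check_generate_inputs("voice_cloning")` reports the missing reference clip.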

Example Workflow (Audiobook)

```
[OmniVoice Model Loader] ─────────────────┐
                                          ▼
[Load Audio (narrator clip)] ──► [OmniVoice Generate] ──► [Save Audio]
                                          ▲
                              text = "Page 1 content..."
                              mode = voice_cloning
```

For each additional page, add another Generate and Save Audio pair connected to the same Model Loader, so the model is loaded only once.
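Instead of duplicating nodes by hand, you can queue one job per page through ComfyUI's HTTP API (`POST /prompt` on the server, `http://127.0.0.1:8188` by default). This is a sketch under stated assumptions. The workflow above is exported in ComfyUI's API format to a JSON file. The node IDs `"3"` (OmniVoice Generate) and `"4"` (Save Audio) are hypothetical and must be replaced with the IDs in your own export. The Generate input is named `text` as in the table, and Save Audio's output name is set through its `filename_prefix` input.

```python
import copy
import json
import urllib.request

# Hypothetical node IDs: check your own API-format export and adjust.
GENERATE_NODE_ID = "3"  # OmniVoice Generate
SAVE_NODE_ID = "4"  # Save Audio
COMFY_URL = "http://127.0.0.1:8188/prompt"  # ComfyUI's default local address


def build_page_prompt(workflow: dict, page_text: str, page_number: int) -> dict:
    """Return a /prompt payload that synthesizes a single page."""
    prompt = copy.deepcopy(workflow)  # leave the template untouched
    prompt[GENERATE_NODE_ID]["inputs"]["text"] = page_text
    prompt[SAVE_NODE_ID]["inputs"]["filename_prefix"] = f"audiobook/page_{page_number:03d}"
    return {"prompt": prompt}


def queue_prompt(payload: dict) -> dict:
    """POST a payload to a running ComfyUI server and return its JSON reply."""
    request = urllib.request.Request(
        COMFY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

To use it, load the exported template with `json.load`, then call `queue_prompt(build_page_prompt(template, text, n))` once per page. Only the `text` and `filename_prefix` inputs change between jobs. ComfyUI normally reuses cached outputs of nodes whose inputs are unchanged, so the loader should not reload the model for every page.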

Credits

OmniVoice by k2-fsa
OmniVoice paper
