ComfyUI-Omnivoice
A ComfyUI custom node for OmniVoice — a massive multilingual zero-shot TTS model supporting 600+ languages.
Features
- Voice Cloning — clone any voice from a short reference audio clip
- Voice Design — describe a voice with text (e.g. "female, low pitch, british accent")
- Auto Voice — let the model pick a voice automatically
- Audiobook-ready — handles arbitrarily long text with near-constant VRAM via built-in chunking
- Multilingual — 600+ languages
Installation
1. Clone into your ComfyUI custom nodes directory:

       cd ComfyUI/custom_nodes
       git clone https://github.com/ethanfel/ComfyUI-Omnivoice.git

2. Install via ComfyUI Manager (recommended); it runs `install.py` and `requirements.txt` automatically. Or manually:

       pip install omnivoice --no-deps
       pip install -r requirements.txt

   Why `--no-deps` for omnivoice? It pins `torch==2.8.*` from a CUDA 12.8 index. Installing it normally would overwrite ComfyUI's torch build. `install.py` handles this automatically; `requirements.txt` covers the remaining deps safely.

3. Restart ComfyUI. The nodes will appear under the OmniVoice category.
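The skip-if-present behaviour described for `install.py` can be sketched roughly as follows. This is a minimal illustration, not the repository's actual script: the helper names `pip_command` and `ensure_installed` are hypothetical, and it assumes the package's import name is also `omnivoice`.

```python
import importlib.util
import subprocess
import sys


def pip_command(package: str) -> list[str]:
    """Build a pip invocation that installs `package` without its dependencies."""
    # Use the interpreter that is running ComfyUI so the package lands in the
    # same environment.
    return [sys.executable, "-m", "pip", "install", package, "--no-deps"]


def ensure_installed(package: str, run=subprocess.check_call) -> bool:
    """Install `package` with --no-deps unless it is already importable.

    Returns True if an install was attempted, False if it was skipped.
    Assumption: the import name matches the package name.
    """
    if importlib.util.find_spec(package) is not None:
        return False  # already present, nothing to do
    run(pip_command(package))
    return True
```

Calling `ensure_installed("omnivoice")` would then leave ComfyUI's existing torch build untouched, because pip never resolves the package's pinned dependencies.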
Nodes
OmniVoice Model Loader
Loads the OmniVoice model. Downloads automatically from HuggingFace on first run and caches locally.
| Input | Type | Description |
|---|---|---|
| `model_source` | dropdown | Auto-download (HuggingFace) or Local path |
| `local_path` | string | Path to local checkpoint (optional) |
| `device` | dropdown | `cuda:0`, `cuda:1`, or `cpu` |
| `dtype` | dropdown | `float16`, `bfloat16`, or `float32` |
Output: OMNIVOICE_MODEL
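For readers unfamiliar with how ComfyUI nodes declare their inputs, the table above could map onto a node class along these lines. The class name, the `load` method, and the placeholder return value are assumptions for illustration; the real node's code may look different.

```python
class OmniVoiceModelLoaderSketch:
    """Hypothetical loader node; not the repository's implementation."""

    CATEGORY = "OmniVoice"
    RETURN_TYPES = ("OMNIVOICE_MODEL",)
    FUNCTION = "load"  # name of the method ComfyUI calls

    @classmethod
    def INPUT_TYPES(cls):
        # A list as the first tuple element renders as a dropdown in the UI.
        return {
            "required": {
                "model_source": (["Auto-download (HuggingFace)", "Local path"],),
                "device": (["cuda:0", "cuda:1", "cpu"],),
                "dtype": (["float16", "bfloat16", "float32"],),
            },
            "optional": {
                "local_path": ("STRING", {"default": ""}),
            },
        }

    def load(self, model_source, device, dtype, local_path=""):
        if model_source == "Local path" and not local_path:
            raise ValueError("local_path is required when model_source is 'Local path'")
        # Placeholder: a real node would load the checkpoint here and return it.
        config = {"source": local_path or "auto", "device": device, "dtype": dtype}
        return (config,)


# ComfyUI discovers nodes through this module-level mapping.
NODE_CLASS_MAPPINGS = {"OmniVoiceModelLoaderSketch": OmniVoiceModelLoaderSketch}
```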
OmniVoice Generate
Generates speech from text using a loaded model.
| Input | Type | Description |
|---|---|---|
| `model` | OMNIVOICE_MODEL | From OmniVoice Model Loader |
| `text` | string | Text to synthesize (full pages supported) |
| `mode` | dropdown | `voice_cloning`, `voice_design`, or `auto_voice` |
| `ref_audio` | AUDIO | Reference audio for voice cloning (optional) |
| `ref_text` | string | Transcription of the reference audio; auto-detected if blank (optional) |
| `instruct` | string | Voice description for voice design mode (optional) |
| `speed` | float | Speed multiplier (default 1.0) |
| `num_step` | int | Diffusion steps (default 32; use 16 for faster generation) |
Output: `AUDIO` at 24 kHz, which connects directly to ComfyUI's Save Audio node.
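Which optional inputs matter depends on the selected mode. The helper below is a hypothetical pre-flight check, not part of the node: it assumes that `voice_cloning` needs `ref_audio`, that `voice_design` needs a non-empty `instruct`, and that `auto_voice` needs neither, which is how the table reads but is not confirmed by the source.

```python
VALID_MODES = ("voice_cloning", "voice_design", "auto_voice")


def check_generate_inputs(mode, ref_audio=None, instruct="", speed=1.0, num_step=32):
    """Return a list of problems; an empty list means the inputs look usable.

    The per-mode requirements are assumptions drawn from the input table.
    """
    problems = []
    if mode not in VALID_MODES:
        problems.append(f"unknown mode: {mode!r}")
    if mode == "voice_cloning" and ref_audio is None:
        problems.append("voice_cloning needs ref_audio")
    if mode == "voice_design" and not instruct.strip():
        problems.append("voice_design needs a non-empty instruct")
    if speed <= 0:
        problems.append("speed must be positive")
    if num_step < 1:
        problems.append("num_step must be at least 1")
    return problems
```

For example, `check_generate_inputs("voice_design", instruct="female, low pitch, british accent")` returns an empty list, while the same mode with a blank `instruct` reports one problem.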
Example Workflow (Audiobook)
    [OmniVoice Model Loader] ─────────────────────────┐
                                                      ▼
    [Load Audio (narrator clip)] ──► [OmniVoice Generate] ──► [Save Audio]
                                              ▲
                                     text = "Page 1 content..."
                                     mode = voice_cloning
Repeat the Generate + Save Audio nodes for each page, reusing the same loader.
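If the book is a single text file, the per-page `text` values have to be prepared somehow. One possible approach, sketched here with a hypothetical `split_into_pages` helper and an arbitrary 2000-character default, is to group paragraphs into pages before pasting each page into its own Generate node. This is unrelated to the model's built-in chunking, which operates inside a single generation.

```python
def split_into_pages(text: str, max_chars: int = 2000) -> list[str]:
    """Group blank-line-separated paragraphs into pages of about max_chars.

    A single paragraph longer than max_chars is kept whole as its own page
    rather than being cut mid-sentence.
    """
    pages, current = [], ""
    for paragraph in (p.strip() for p in text.split("\n\n")):
        if not paragraph:
            continue  # skip empty paragraphs left by extra blank lines
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if current and len(candidate) > max_chars:
            # Adding this paragraph would overflow the page: start a new one.
            pages.append(current)
            current = paragraph
        else:
            current = candidate
    if current:
        pages.append(current)
    return pages
```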
Credits
- OmniVoice by k2-fsa
- OmniVoice paper