ComfyUI-Omnivoice
A ComfyUI custom node for OmniVoice — a massive multilingual zero-shot TTS model supporting 600+ languages.
Features
- Voice Cloning — clone any voice from a short reference audio clip
- Voice Design — describe a voice with text (e.g. "female, low pitch, british accent")
- Auto Voice — let the model pick a voice automatically
- Audiobook-ready — handles arbitrarily long text with near-constant VRAM via built-in chunking
- Multilingual — 600+ languages
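The chunking itself is built into OmniVoice, and this README does not describe how it works. As a rough illustration of the general idea only (not the model's actual algorithm), a sentence-based splitter might look like the sketch below. The function name `chunk_text` and the `max_chars` limit are hypothetical.

```python
import re


def chunk_text(text: str, max_chars: int = 400) -> list[str]:
    """Greedily pack whole sentences into chunks of roughly max_chars.

    Illustrative only: OmniVoice's built-in chunking may work differently.
    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Because each chunk is bounded in length, the memory needed per synthesis call stays roughly flat regardless of how long the full text is.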
Installation
1. Clone into your ComfyUI custom nodes directory:

       cd ComfyUI/custom_nodes
       git clone https://github.com/ethanfel/ComfyUI-Omnivoice.git

2. Install `omnivoice` without its pinned torch (one-time manual step):

       pip install omnivoice --no-deps

   Why `--no-deps`? omnivoice pins `torch==2.8.*` from a CUDA 12.8 index. Installing it normally would overwrite ComfyUI's torch build. The `--no-deps` flag tells pip to skip all of omnivoice's declared dependencies, including that pin, so ComfyUI's existing torch stays in place and is used at runtime.

3. Restart ComfyUI. ComfyUI Manager will install the remaining dependencies from `requirements.txt` automatically. The nodes will appear under the OmniVoice category.
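To confirm that the manual step left ComfyUI's torch untouched, one option is to print the installed versions from the same Python environment ComfyUI uses. This is a minimal sketch using only the standard library. The helper name `installed_versions` is hypothetical, and the distribution names `torch` and `omnivoice` are taken from the install step above.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Run before and after `pip install omnivoice --no-deps`:
    # the torch version should be identical in both runs.
    for name, version in installed_versions(["torch", "omnivoice"]).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```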
Nodes
OmniVoice Model Loader
Loads the OmniVoice model. Downloads automatically from HuggingFace on first run and caches locally.
| Input | Type | Description |
|---|---|---|
| `model_source` | dropdown | Auto-download (HuggingFace) or Local path |
| `local_path` | string | Path to local checkpoint (optional) |
| `device` | dropdown | cuda:0, cuda:1, or cpu |
| `dtype` | dropdown | float16, bfloat16, or float32 |
Output: OMNIVOICE_MODEL
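For readers unfamiliar with how ComfyUI nodes declare inputs, the table above could map onto a node class roughly as follows. This is a hypothetical skeleton that follows ComfyUI's general node conventions (`INPUT_TYPES`, `RETURN_TYPES`, `FUNCTION`, `CATEGORY`). It is not the repository's actual code: the class name, the `load` method, and the returned dictionary are placeholders, and no model is loaded.

```python
class OmniVoiceModelLoaderSketch:
    """Hypothetical node skeleton mirroring the input table; loads nothing."""

    CATEGORY = "OmniVoice"
    RETURN_TYPES = ("OMNIVOICE_MODEL",)
    FUNCTION = "load"

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                # A list as the first tuple element renders as a dropdown.
                "model_source": (["Auto-download (HuggingFace)", "Local path"],),
                "device": (["cuda:0", "cuda:1", "cpu"],),
                "dtype": (["float16", "bfloat16", "float32"],),
            },
            "optional": {
                "local_path": ("STRING", {"default": ""}),
            },
        }

    def load(self, model_source, device, dtype, local_path=""):
        if model_source == "Local path" and not local_path.strip():
            raise ValueError("local_path is required when model_source is 'Local path'")
        # Placeholder: a real node would load the checkpoint here and return
        # the model object instead of this description of the request.
        request = {
            "source": local_path.strip() if model_source == "Local path" else None,
            "device": device,
            "dtype": dtype,
        }
        return (request,)  # ComfyUI nodes return a tuple matching RETURN_TYPES
```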
OmniVoice Generate
Generates speech from text using a loaded model.
| Input | Type | Description |
|---|---|---|
| `model` | OMNIVOICE_MODEL | From OmniVoice Model Loader |
| `text` | string | Text to synthesize (full pages supported) |
| `mode` | dropdown | voice_cloning, voice_design, or auto_voice |
| `ref_audio` | AUDIO | Reference audio for voice cloning (optional) |
| `ref_text` | string | Transcription of ref audio; auto-detected if blank (optional) |
| `instruct` | string | Voice description for voice design mode (optional) |
| `speed` | float | Speed multiplier (default 1.0) |
| `num_step` | int | Diffusion steps (default 32; use 16 for faster generation) |
Output: AUDIO at 24kHz — connects directly to ComfyUI's Save Audio node.
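Which optional inputs matter depends on `mode`. The sketch below shows one plausible way to check them before generation. It assumes that `voice_cloning` needs `ref_audio` and `voice_design` needs a non-empty `instruct`, which the table implies but does not state outright. The function name `select_mode_inputs` is hypothetical and the node's real validation may differ.

```python
VALID_MODES = ("voice_cloning", "voice_design", "auto_voice")


def select_mode_inputs(mode, ref_audio=None, ref_text="", instruct=""):
    """Return only the inputs relevant to the chosen mode, or raise ValueError.

    Assumed rules (not confirmed by the README):
    - voice_cloning requires ref_audio; a blank ref_text means "auto-detect".
    - voice_design requires a non-empty instruct string.
    - auto_voice ignores all three optional inputs.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {VALID_MODES}")
    if mode == "voice_cloning":
        if ref_audio is None:
            raise ValueError("voice_cloning needs ref_audio to be connected")
        return {"ref_audio": ref_audio, "ref_text": ref_text.strip() or None}
    if mode == "voice_design":
        if not instruct.strip():
            raise ValueError("voice_design needs a non-empty instruct description")
        return {"instruct": instruct.strip()}
    return {}
```

For example, `select_mode_inputs("voice_design", instruct="female, low pitch, british accent")` keeps only the voice description and ignores any reference audio.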
Example Workflow (Audiobook)
    [OmniVoice Model Loader] ─────────────────────────┐
                                                      ▼
    [Load Audio (narrator clip)] ──► [OmniVoice Generate] ──► [Save Audio]
                                              ▲
                                     text = "Page 1 content..."
                                     mode = voice_cloning
Repeat the Generate + Save Audio nodes for each page, reusing the same loader.
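When a book has many pages, it can help to plan the page texts and output names before wiring up the repeated nodes. The helper below is a hypothetical sketch for that bookkeeping only: it does not call ComfyUI or OmniVoice, and the `audiobook/page` prefix is an arbitrary example.

```python
def plan_pages(pages, prefix="audiobook/page"):
    """Pair each non-empty page with a zero-padded output name."""
    kept = [page.strip() for page in pages if page.strip()]
    # Pad to at least three digits so files sort in reading order.
    width = max(3, len(str(len(kept))))
    return [
        (f"{prefix}_{index:0{width}d}", text)
        for index, text in enumerate(kept, start=1)
    ]


if __name__ == "__main__":
    for name, text in plan_pages(["Page 1 content...", "", "Page 2 content..."]):
        print(name, "->", text[:40])
```

Each resulting pair corresponds to one Generate + Save Audio pair in the workflow, all fed by the same loader.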
Credits
- OmniVoice by k2-fsa
- OmniVoice paper