Ethanfel dbb3207df1 Replace install.py with standard requirements.txt
install.py was running arbitrary pip installs as part of node loading,
which is dangerous in a shared venv. Standard approach: requirements.txt
lists the safe deps (transformers, accelerate, soundfile, etc.);
omnivoice itself must be installed once manually with --no-deps to avoid
overwriting ComfyUI's torch. README documents this clearly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 17:44:52 +02:00

ComfyUI-Omnivoice

A ComfyUI custom node for OmniVoice, a massively multilingual zero-shot TTS model supporting 600+ languages.

Features

  • Voice Cloning — clone any voice from a short reference audio clip
  • Voice Design — describe a voice with text (e.g. "female, low pitch, british accent")
  • Auto Voice — let the model pick a voice automatically
  • Audiobook-ready — handles arbitrarily long text with near-constant VRAM via built-in chunking
  • Multilingual — 600+ languages
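
To illustrate the idea behind the chunking mentioned in the feature list, here is a minimal sketch that splits long text at sentence boundaries so that each piece stays under a size limit and can be synthesized independently. It is not OmniVoice's actual chunker: the function name chunk_text, the max_chars limit, and the punctuation-based splitting rule are assumptions made for illustration, and the rule only covers languages that end sentences with ".", "!" or "?".

```python
import re
from typing import List


def chunk_text(text: str, max_chars: int = 200) -> List[str]:
    """Group sentences into chunks of at most max_chars characters.

    Hypothetical helper, not part of OmniVoice. A single sentence longer
    than max_chars is kept whole rather than cut mid-sentence.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for piece in chunk_text("One. Two. Three.", max_chars=9):
        print(piece)
```

Because each chunk is processed on its own, peak memory depends on the chunk size rather than on the total length of the text, which is the property the feature list describes.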

Installation

  1. Clone into your ComfyUI custom nodes directory:

    cd ComfyUI/custom_nodes
    git clone https://github.com/ethanfel/ComfyUI-Omnivoice.git
    
  2. Install omnivoice into the Python environment that runs ComfyUI, without its pinned torch (one-time manual step):

    pip install omnivoice --no-deps
    

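    Why --no-deps? omnivoice pins torch==2.8.* from a CUDA 12.8 index, so installing it normally would overwrite ComfyUI's torch build. The --no-deps flag tells pip to skip all of omnivoice's declared dependencies, including that pin. The other dependencies come from requirements.txt, and ComfyUI's existing torch works fine at runtime.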

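  3. Restart ComfyUI. ComfyUI Manager will install the remaining dependencies from requirements.txt automatically; if you do not use ComfyUI Manager, run pip install -r requirements.txt from the ComfyUI-Omnivoice directory before restarting. The nodes will appear under the OmniVoice category.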
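
After installing, it can be useful to confirm that omnivoice is present and that ComfyUI's torch build was left untouched. The following is a minimal sketch using only the Python standard library; the helper names installed_version and report are illustrative, and the distribution names torch and omnivoice are taken from the pip commands in the steps above. Run it with the same Python interpreter that ComfyUI uses.

```python
from importlib import metadata
from typing import Dict, Optional, Sequence


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report(packages: Sequence[str] = ("torch", "omnivoice")) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version (or None)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    for name, version in report().items():
        print(f"{name}: {version or 'not installed'}")
```

Comparing the torch version printed before and after the omnivoice install shows whether the existing build was preserved.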

Nodes

OmniVoice Model Loader

Loads the OmniVoice model. On first run, the model is downloaded automatically from Hugging Face and cached locally.

| Input | Type | Description |
|---|---|---|
| model_source | dropdown | Auto-download (HuggingFace) or Local path |
| local_path | string | Path to local checkpoint (optional) |
| device | dropdown | cuda:0, cuda:1, or cpu |
| dtype | dropdown | float16, bfloat16, or float32 |

Output: OMNIVOICE_MODEL
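
For readers unfamiliar with how ComfyUI nodes declare their inputs, the sketch below shows how a loader with these inputs could be expressed using ComfyUI's standard node conventions (INPUT_TYPES, RETURN_TYPES, FUNCTION, CATEGORY). It is a hypothetical illustration, not the repository's actual class: the class name, the required/optional split, the defaults, and the body of load are assumptions. In particular, load returns a plain settings dict instead of calling the omnivoice library, whose API is not documented here.

```python
class OmniVoiceModelLoaderSketch:
    """Hypothetical node declaration; the real implementation may differ."""

    @classmethod
    def INPUT_TYPES(cls):
        # A list as the first tuple element renders as a dropdown in ComfyUI.
        return {
            "required": {
                "model_source": (["Auto-download (HuggingFace)", "Local path"],),
                "device": (["cuda:0", "cuda:1", "cpu"],),
                "dtype": (["float16", "bfloat16", "float32"],),
            },
            "optional": {
                "local_path": ("STRING", {"default": ""}),
            },
        }

    RETURN_TYPES = ("OMNIVOICE_MODEL",)
    FUNCTION = "load"
    CATEGORY = "OmniVoice"

    def load(self, model_source, device, dtype, local_path=""):
        if model_source == "Local path" and not local_path:
            raise ValueError("local_path is required when model_source is 'Local path'")
        # Placeholder: a real node would load the model here.
        settings = {
            "source": local_path if model_source == "Local path" else "auto",
            "device": device,
            "dtype": dtype,
        }
        # ComfyUI expects node functions to return a tuple matching RETURN_TYPES.
        return (settings,)
```

The custom type name OMNIVOICE_MODEL is what lets ComfyUI connect this output only to inputs that declare the same type.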


OmniVoice Generate

Generates speech from text using a loaded model.

| Input | Type | Description |
|---|---|---|
| model | OMNIVOICE_MODEL | From OmniVoice Model Loader |
| text | string | Text to synthesize (full pages supported) |
| mode | dropdown | voice_cloning, voice_design, or auto_voice |
| ref_audio | AUDIO | Reference audio for voice cloning (optional) |
| ref_text | string | Transcription of ref audio; auto-detected if blank (optional) |
| instruct | string | Voice description for voice design mode (optional) |
| speed | float | Speed multiplier; default 1.0 |
| num_step | int | Diffusion steps; default 32 (use 16 for faster generation) |

Output: AUDIO at 24 kHz, which connects directly to ComfyUI's Save Audio node.
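
ComfyUI passes AUDIO between nodes as a dict holding a waveform tensor shaped [batch, channels, samples] and an integer sample_rate. As a small sketch of how a downstream script could inspect that output, the helper below computes the clip length in seconds. The function name is illustrative, and the demo uses a stand-in object with only a shape attribute so it runs without torch; a real AUDIO value would carry a torch tensor instead.

```python
from types import SimpleNamespace


def audio_duration_seconds(audio: dict) -> float:
    """Return the duration of a ComfyUI AUDIO dict in seconds.

    Only reads waveform.shape, so any tensor-like object works.
    """
    num_samples = audio["waveform"].shape[-1]
    return num_samples / audio["sample_rate"]


if __name__ == "__main__":
    # Stand-in for a [1, 1, 48000] tensor at the node's 24 kHz output rate.
    fake_audio = {
        "waveform": SimpleNamespace(shape=(1, 1, 48000)),
        "sample_rate": 24000,
    }
    print(audio_duration_seconds(fake_audio))
```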

Example Workflow (Audiobook)

[OmniVoice Model Loader] ─────────────────────────┐
                                                    ▼
[Load Audio (narrator clip)] ──► [OmniVoice Generate] ──► [Save Audio]
                                        ▲
                              text = "Page 1 content..."
                              mode = voice_cloning

Repeat the Generate + Save Audio nodes for each page, reusing the same loader.

Credits