Ethanfel dbb3207df1 Replace install.py with standard requirements.txt

install.py was running arbitrary pip installs as part of node loading,
which is dangerous in a shared venv. Standard approach: requirements.txt
lists the safe deps (transformers, accelerate, soundfile, etc.);
omnivoice itself must be installed once manually with --no-deps to avoid
overwriting ComfyUI's torch. README documents this clearly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-04-05 17:44:52 +02:00


ComfyUI-Omnivoice

A ComfyUI custom node for OmniVoice — a massive multilingual zero-shot TTS model supporting 600+ languages.

Features

Voice Cloning — clone any voice from a short reference audio clip
Voice Design — describe a voice with text (e.g. "female, low pitch, british accent")
Auto Voice — let the model pick a voice automatically
Audiobook-ready — handles arbitrarily long text with near-constant VRAM via built-in chunking
Multilingual — 600+ languages
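
The chunking idea behind the audiobook feature can be illustrated with a small sketch. This is not OmniVoice's built-in chunker, whose details are not described here; `chunk_text` and `max_chars` are hypothetical names, and splitting on `.`, `!` and `?` is a simplification that will not suit every one of the supported languages.

```python
import re


def chunk_text(text, max_chars=400):
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut
    mid-word, so a chunk can exceed the limit in that one case.
    """
    # Split on whitespace that follows ., ! or ?; the punctuation is kept.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow: close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Because each chunk is synthesized on its own, peak memory depends on the chunk size rather than on the length of the whole text, which is the general reason chunked generation keeps VRAM roughly flat.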

Installation

Clone into your ComfyUI custom nodes directory:

cd ComfyUI/custom_nodes
git clone https://github.com/ethanfel/ComfyUI-Omnivoice.git

Install omnivoice without its pinned torch (one-time manual step):
```
pip install omnivoice --no-deps
```
Why `--no-deps`? omnivoice pins `torch==2.8.*` from a CUDA 12.8 index, so a normal install would overwrite ComfyUI's torch build. With `--no-deps`, pip installs omnivoice alone and skips every declared dependency, including that pin; ComfyUI's existing torch works fine at runtime.

Restart ComfyUI. ComfyUI Manager will install the remaining dependencies from `requirements.txt` automatically. The nodes will appear under the OmniVoice category.
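
To confirm that the manual step left ComfyUI's torch untouched, you can print the installed versions before and after running it. The sketch below uses only the standard library; `installed_versions` is a hypothetical helper, and it assumes the distribution names are `torch` and `omnivoice`, as in the pip command above. Run it with the same Python interpreter that ComfyUI uses.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed distribution names; adjust if your environment differs.
    for name, version in installed_versions(["torch", "omnivoice"]).items():
        print(f"{name}: {version or 'not installed'}")
```

If the torch version printed after the install matches the one printed before, the pin was skipped as intended.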

Nodes

OmniVoice Model Loader

Loads the OmniVoice model. Downloads automatically from HuggingFace on first run and caches locally.

| Input | Type | Description |
| --- | --- | --- |
| `model_source` | dropdown | `Auto-download (HuggingFace)` or `Local path` |
| `local_path` | string | Path to a local checkpoint (optional) |
| `device` | dropdown | `cuda:0`, `cuda:1`, or `cpu` |
| `dtype` | dropdown | `float16`, `bfloat16`, or `float32` |

Output: OMNIVOICE_MODEL
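
For readers unfamiliar with how ComfyUI nodes declare inputs like these, here is a minimal sketch using the usual custom-node conventions (`INPUT_TYPES`, `RETURN_TYPES`, `FUNCTION`, `CATEGORY`). It is not this repository's code: the class name `ExampleOmniVoiceLoader`, the `local_path` check and the `load` body are hypothetical, and the body returns the chosen settings instead of loading a model.

```python
class ExampleOmniVoiceLoader:
    """Hypothetical loader node; only the input declaration mirrors the table."""

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                # In ComfyUI, a list of strings is rendered as a dropdown.
                "model_source": (["Auto-download (HuggingFace)", "Local path"],),
                "device": (["cuda:0", "cuda:1", "cpu"],),
                "dtype": (["float16", "bfloat16", "float32"],),
            },
            "optional": {
                "local_path": ("STRING", {"default": ""}),
            },
        }

    RETURN_TYPES = ("OMNIVOICE_MODEL",)
    FUNCTION = "load"
    CATEGORY = "OmniVoice"

    def load(self, model_source, device, dtype, local_path=""):
        # Assumed validation, not taken from the real node.
        if model_source == "Local path" and not local_path:
            raise ValueError("local_path is required when model_source is 'Local path'")
        # Placeholder for the real model load: return the resolved settings
        # in a one-element tuple, matching RETURN_TYPES.
        settings = {
            "source": model_source,
            "path": local_path or None,
            "device": device,
            "dtype": dtype,
        }
        return (settings,)
```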

OmniVoice Generate

Generates speech from text using a loaded model.

| Input | Type | Description |
| --- | --- | --- |
| `model` | OMNIVOICE_MODEL | From OmniVoice Model Loader |
| `text` | string | Text to synthesize (full pages supported) |
| `mode` | dropdown | `voice_cloning`, `voice_design`, or `auto_voice` |
| `ref_audio` | AUDIO | Reference audio for voice cloning (optional) |
| `ref_text` | string | Transcription of the reference audio; auto-detected if blank (optional) |
| `instruct` | string | Voice description for voice design mode (optional) |
| `speed` | float | Speed multiplier (default 1.0) |
| `num_step` | int | Diffusion steps (default 32; use 16 for faster generation) |

Output: AUDIO at 24 kHz, which connects directly to ComfyUI's Save Audio node.
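
The optional inputs only matter in certain modes, which the following sketch makes explicit. It encodes one reading of the table (voice cloning needs `ref_audio`, voice design needs `instruct`); the node itself may behave differently, and `check_generate_inputs` is a hypothetical helper, not part of this package.

```python
VALID_MODES = ("voice_cloning", "voice_design", "auto_voice")


def check_generate_inputs(mode, ref_audio=None, instruct="", speed=1.0, num_step=32):
    """Return a list of problems; an empty list means the inputs look consistent."""
    problems = []
    if mode not in VALID_MODES:
        problems.append(f"unknown mode: {mode!r}")
    # Assumed rule: cloning has nothing to clone without a reference clip.
    if mode == "voice_cloning" and ref_audio is None:
        problems.append("voice_cloning needs ref_audio")
    # Assumed rule: design mode is driven by the text description.
    if mode == "voice_design" and not instruct.strip():
        problems.append("voice_design needs a non-empty instruct")
    if speed <= 0:
        problems.append("speed must be positive")
    if num_step < 1:
        problems.append("num_step must be at least 1")
    return problems
```

For example, `check_generate_inputs("voice_design", instruct="female, low pitch, british accent")` returns an empty list, while calling it with `mode="voice_cloning"` and no reference audio reports the missing clip.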

Example Workflow (Audiobook)

[OmniVoice Model Loader] ─────────────────────────┐
                                                    ▼
[Load Audio (narrator clip)] ──► [OmniVoice Generate] ──► [Save Audio]
                                        ▲
                              text = "Page 1 content..."
                              mode = voice_cloning

Repeat the Generate + Save Audio nodes for each page, reusing the same loader.
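
When a book has many pages, it can help to plan the per-page settings before wiring up the nodes. The sketch below pairs each page with a numbered output name so the saved files sort in reading order. `plan_pages` and its dictionary keys are hypothetical; `filename_prefix` is assumed to correspond to the prefix field on the Save Audio node.

```python
def plan_pages(pages, prefix="audiobook"):
    """Pair each non-empty page with a zero-padded output name, in reading order."""
    kept = [page.strip() for page in pages if page.strip()]
    # Pad to at least three digits so names sort correctly (001, 002, ...).
    width = max(3, len(str(len(kept))))
    return [
        {
            "text": text,
            "mode": "voice_cloning",
            "filename_prefix": f"{prefix}_{index:0{width}d}",
        }
        for index, text in enumerate(kept, start=1)
    ]
```

Each entry then maps onto one Generate + Save Audio pair in the workflow above.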

Credits

OmniVoice by k2-fsa
OmniVoice paper
