Ethanfel a3fb88e559 Restore install.py for omnivoice --no-deps only

requirements.txt cannot install omnivoice (it would pull in torch==2.8.*
and break ComfyUI). install.py now does exactly one thing: install
omnivoice --no-deps, skipped if already present. All other deps remain
in requirements.txt for ComfyUI Manager to handle normally.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-04-05 17:45:24 +02:00

3.1 KiB

ComfyUI-Omnivoice

A ComfyUI custom node for OmniVoice, a massively multilingual zero-shot TTS model supporting 600+ languages.

Features

Voice Cloning — clone any voice from a short reference audio clip
Voice Design — describe a voice with text (e.g. "female, low pitch, british accent")
Auto Voice — let the model pick a voice automatically
Audiobook-ready — handles arbitrarily long text with near-constant VRAM via built-in chunking
Multilingual — 600+ languages

Installation

Clone into your ComfyUI custom nodes directory:

```
cd ComfyUI/custom_nodes
git clone https://github.com/ethanfel/ComfyUI-Omnivoice.git
```

Install dependencies via ComfyUI Manager (recommended): it runs install.py and installs the packages listed in requirements.txt automatically.

Or manually:
```
pip install omnivoice --no-deps
pip install -r requirements.txt
```
Why --no-deps for omnivoice? The package pins torch==2.8.* from a CUDA 12.8 index, so installing it normally would overwrite ComfyUI's torch build. install.py performs the --no-deps install automatically and skips it if omnivoice is already present; requirements.txt covers the remaining dependencies safely.
Restart ComfyUI. The nodes will appear under the OmniVoice category.
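
The install step described above (install omnivoice with --no-deps, skipped if it is already present) can be sketched as follows. This is a minimal illustration, not the contents of the actual install.py: the function names `is_installed`, `pip_no_deps_command`, and `ensure_installed` are hypothetical, and the `run` parameter is an assumption added so the logic can be exercised without calling pip.

```python
import importlib.metadata
import subprocess
import sys


def is_installed(distribution: str) -> bool:
    """Return True if a distribution with this name is already installed."""
    try:
        importlib.metadata.version(distribution)
        return True
    except importlib.metadata.PackageNotFoundError:
        return False


def pip_no_deps_command(distribution: str) -> list:
    """Build the pip call for the interpreter that is currently running."""
    # Using sys.executable targets the same environment as ComfyUI itself.
    return [sys.executable, "-m", "pip", "install", distribution, "--no-deps"]


def ensure_installed(distribution: str, run=subprocess.check_call) -> bool:
    """Install without dependencies unless the package is already present.

    Returns True if an install was attempted, False if it was skipped.
    """
    if is_installed(distribution):
        return False
    run(pip_no_deps_command(distribution))
    return True
```

Calling `ensure_installed("omnivoice")` would then leave the existing torch build untouched, because pip is told not to resolve the package's pinned dependencies.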

Nodes

OmniVoice Model Loader

Loads the OmniVoice model. Downloads automatically from HuggingFace on first run and caches locally.

| Input | Type | Description |
| --- | --- | --- |
| `model_source` | dropdown | `Auto-download (HuggingFace)` or `Local path` |
| `local_path` | string | Path to local checkpoint (optional) |
| `device` | dropdown | `cuda:0`, `cuda:1`, or `cpu` |
| `dtype` | dropdown | `float16`, `bfloat16`, or `float32` |

Output: OMNIVOICE_MODEL
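
For readers unfamiliar with how ComfyUI nodes declare their inputs, the table above could map onto a node class roughly like this. It is a sketch under assumptions: the class name `OmniVoiceModelLoaderSketch` is hypothetical, the `load` body is a placeholder that returns the chosen settings instead of building a model, and the error raised for a missing `local_path` is an assumed behaviour, not something the project documents.

```python
class OmniVoiceModelLoaderSketch:
    """Hypothetical stand-in showing how the loader inputs could be declared."""

    CATEGORY = "OmniVoice"
    RETURN_TYPES = ("OMNIVOICE_MODEL",)
    FUNCTION = "load"

    @classmethod
    def INPUT_TYPES(cls):
        # In ComfyUI, a list as the first tuple element renders as a dropdown.
        return {
            "required": {
                "model_source": (["Auto-download (HuggingFace)", "Local path"],),
                "device": (["cuda:0", "cuda:1", "cpu"],),
                "dtype": (["float16", "bfloat16", "float32"],),
            },
            "optional": {
                "local_path": ("STRING", {"default": ""}),
            },
        }

    def load(self, model_source, device, dtype, local_path=""):
        # Assumed check: a local source is only usable with a path.
        if model_source == "Local path" and not local_path:
            raise ValueError("local_path is required when model_source is 'Local path'")
        # Placeholder: a real node would construct the OmniVoice model here.
        settings = {
            "source": model_source,
            "path": local_path or None,
            "device": device,
            "dtype": dtype,
        }
        return (settings,)
```

ComfyUI node functions return a tuple with one element per entry in `RETURN_TYPES`, which is why `load` wraps its single result.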

OmniVoice Generate

Generates speech from text using a loaded model.

| Input | Type | Description |
| --- | --- | --- |
| `model` | OMNIVOICE_MODEL | From OmniVoice Model Loader |
| `text` | string | Text to synthesize (full pages supported) |
| `mode` | dropdown | `voice_cloning`, `voice_design`, or `auto_voice` |
| `ref_audio` | AUDIO | Reference audio for voice cloning (optional) |
| `ref_text` | string | Transcription of ref audio; auto-detected if blank (optional) |
| `instruct` | string | Voice description for voice design mode (optional) |
| `speed` | float | Speed multiplier (default 1.0) |
| `num_step` | int | Diffusion steps (default 32; use 16 for faster generation) |

Output: AUDIO at 24 kHz, which connects directly to ComfyUI's Save Audio node.
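
The optional inputs only matter in particular modes: `ref_audio` for `voice_cloning` and `instruct` for `voice_design`. A small helper like the one below can make that pairing explicit before a workflow is queued. It is an illustrative sketch; `check_generate_inputs` is a hypothetical name, and whether the node itself rejects these combinations is an assumption.

```python
VALID_MODES = ("voice_cloning", "voice_design", "auto_voice")


def check_generate_inputs(mode, ref_audio=None, instruct=""):
    """Return a list of problems with the inputs for the chosen mode."""
    problems = []
    if mode not in VALID_MODES:
        problems.append(f"unknown mode: {mode!r}")
    elif mode == "voice_cloning" and ref_audio is None:
        # Cloning has nothing to imitate without a reference clip.
        problems.append("voice_cloning needs ref_audio")
    elif mode == "voice_design" and not instruct.strip():
        # Voice design is driven entirely by the text description.
        problems.append("voice_design needs a non-empty instruct")
    return problems
```

An empty list means the combination is consistent with the table; `auto_voice` needs neither optional input.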

Example Workflow (Audiobook)

[OmniVoice Model Loader] ─────────────────────────┐
                                                    ▼
[Load Audio (narrator clip)] ──► [OmniVoice Generate] ──► [Save Audio]
                                        ▲
                              text = "Page 1 content..."
                              mode = voice_cloning

Repeat the Generate + Save Audio nodes for each page, reusing the same loader.
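
If the book is a single text file, the per-page text for each Generate node has to be prepared first. The sketch below shows one way to do that by grouping sentences into pages. It is an assumption-laden illustration: `split_into_pages` and the `max_chars` limit are hypothetical, the sentence split is a simple regular expression, and none of this is part of the node pack.

```python
import re


def split_into_pages(text, max_chars=2000):
    """Group sentences into pages of at most max_chars where possible."""
    # Split after ., ! or ? followed by whitespace; keeps the punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    pages, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            # The page is full: close it and start a new one.
            pages.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        pages.append(current)
    return pages
```

A single sentence longer than `max_chars` is kept whole, so pages never break mid-sentence. Each returned string can be pasted into the `text` input of one Generate node.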

Credits

OmniVoice by k2-fsa
OmniVoice paper
