Ethanfel f8a3bebe9c feat: add seed=42 to default workflow for voice consistency

Sets a default seed so the voice stays consistent across all generated
chunks when using the workflow as a starting point for audiobook pipelines.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-04-05 18:58:22 +02:00

nodes

Add seed parameter to OmniVoice Generate for consistent voice across chunks

2026-04-05 18:53:58 +02:00

tests

Remove local path option from model loader

2026-04-05 18:02:55 +02:00

workflows

feat: add seed=42 to default workflow for voice consistency

2026-04-05 18:58:22 +02:00

__init__.py

Add OmniVoice Voice Preset node with two female voice samples

2026-04-05 18:19:29 +02:00

conftest.py

docs: document private pytest API usage in conftest

2026-04-05 09:05:53 +02:00

install.py

Restore install.py for omnivoice --no-deps only

2026-04-05 17:45:24 +02:00

pytest.ini

feat: add OmniVoiceModelLoader node

2026-04-05 08:52:26 +02:00

README.md

Remove local path option from model loader

2026-04-05 18:02:55 +02:00

requirements.txt

Replace install.py with standard requirements.txt

2026-04-05 17:44:52 +02:00

README.md

ComfyUI-Omnivoice

A ComfyUI custom node for OmniVoice — a massive multilingual zero-shot TTS model supporting 600+ languages.

Features

Voice Cloning — clone any voice from a short reference audio clip
Voice Design — describe a voice with text (e.g. "female, low pitch, british accent")
Auto Voice — let the model pick a voice automatically
Audiobook-ready — handles arbitrarily long text with near-constant VRAM via built-in chunking
Multilingual — 600+ languages

Installation

Clone into your ComfyUI custom nodes directory:

```
cd ComfyUI/custom_nodes
git clone https://github.com/ethanfel/ComfyUI-Omnivoice.git
```

Install the dependencies via ComfyUI Manager (recommended), which runs install.py and installs requirements.txt automatically.

Or manually:
```
pip install omnivoice --no-deps
pip install -r requirements.txt
```
Why --no-deps for omnivoice? The package pins torch==2.8.* from a CUDA 12.8 index, so installing it normally would overwrite ComfyUI's torch build. install.py handles this automatically; requirements.txt covers the remaining dependencies safely.
Restart ComfyUI. The nodes will appear under the OmniVoice category.
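
As an illustration of the `--no-deps` step above, here is a minimal sketch of how an install script could invoke pip against the same interpreter that runs ComfyUI. The helper names `pip_install_command` and `install_omnivoice` are hypothetical, and the repository's actual install.py may be written differently.

```python
import subprocess
import sys


def pip_install_command(package: str, no_deps: bool = False) -> list[str]:
    """Build a pip command that targets the currently running interpreter."""
    cmd = [sys.executable, "-m", "pip", "install", package]
    if no_deps:
        # Skip dependency resolution so an existing torch build is left untouched.
        cmd.append("--no-deps")
    return cmd


def install_omnivoice() -> None:
    """Hypothetical entry point: install omnivoice without its pinned dependencies."""
    subprocess.check_call(pip_install_command("omnivoice", no_deps=True))
```

Using `sys.executable -m pip` rather than a bare `pip` makes it more likely that the package lands in the environment ComfyUI actually uses.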

Nodes

OmniVoice Model Loader

Loads the OmniVoice model.

| Input | Type | Description |
| --- | --- | --- |
| `device` | dropdown | `cuda:0`, `cuda:1`, or `cpu` |
| `dtype` | dropdown | `float16`, `bfloat16`, or `float32` |

Downloads automatically from HuggingFace on first run and caches to ComfyUI/models/omnivoice/.

Output: OMNIVOICE_MODEL
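
For readers unfamiliar with ComfyUI custom nodes, the sketch below shows how a loader with these two dropdowns and this output type could be declared using ComfyUI's usual node conventions (`INPUT_TYPES`, `RETURN_TYPES`, `FUNCTION`, `CATEGORY`). The class name and the body of `load` are assumptions for illustration; the real node loads the OmniVoice model, which this sketch does not attempt.

```python
class OmniVoiceModelLoaderSketch:
    """Illustrative node declaration; not the repository's actual implementation."""

    @classmethod
    def INPUT_TYPES(cls):
        # A list as the first tuple element is rendered by ComfyUI as a dropdown.
        return {
            "required": {
                "device": (["cuda:0", "cuda:1", "cpu"],),
                "dtype": (["float16", "bfloat16", "float32"],),
            }
        }

    RETURN_TYPES = ("OMNIVOICE_MODEL",)
    FUNCTION = "load"
    CATEGORY = "OmniVoice"

    def load(self, device, dtype):
        # Placeholder: a real loader would download and build the model here.
        # ComfyUI expects node functions to return a tuple matching RETURN_TYPES.
        return ({"device": device, "dtype": dtype},)
```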

OmniVoice Generate

Generates speech from text using a loaded model.

| Input | Type | Description |
| --- | --- | --- |
| `model` | OMNIVOICE_MODEL | From OmniVoice Model Loader |
| `text` | string | Text to synthesize (full pages supported) |
| `mode` | dropdown | `voice_cloning`, `voice_design`, or `auto_voice` |
| `ref_audio` | AUDIO | Reference audio for voice cloning (optional) |
| `ref_text` | string | Transcription of the reference audio; auto-detected if blank (optional) |
| `instruct` | string | Voice description for voice design mode (optional) |
| `speed` | float | Speed multiplier (default 1.0) |
| `num_step` | int | Diffusion steps (default 32; use 16 for faster generation) |

Output: AUDIO at 24 kHz, which connects directly to ComfyUI's Save Audio node.
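
The input table associates each optional input with one mode. The small lookup below restates that association in code; it is a reading of the table only, with hypothetical names (`MODE_INPUTS`, `relevant_optional_inputs`), and does not describe how the node itself validates its inputs.

```python
# Optional inputs the table above ties to each mode (illustrative mapping).
MODE_INPUTS = {
    "voice_cloning": ("ref_audio", "ref_text"),
    "voice_design": ("instruct",),
    "auto_voice": (),
}


def relevant_optional_inputs(mode: str) -> tuple[str, ...]:
    """Return the optional inputs that matter for the given mode."""
    try:
        return MODE_INPUTS[mode]
    except KeyError:
        raise ValueError(f"unknown mode: {mode!r}") from None
```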

Example Workflow (Audiobook)

```
[OmniVoice Model Loader] ─────────────────────────┐
                                                    ▼
[Load Audio (narrator clip)] ──► [OmniVoice Generate] ──► [Save Audio]
                                        ▲
                              text = "Page 1 content..."
                              mode = voice_cloning
```

Repeat the Generate + Save Audio nodes for each page, reusing the same loader.
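
If the book is available as one long string, the per-page text for each Generate node could be prepared ahead of time. The helper below is a hypothetical sketch (it is not part of this repository) that groups paragraphs into pages of at most `max_chars` characters; the 2000-character default is an arbitrary assumption.

```python
def split_into_pages(text: str, max_chars: int = 2000) -> list[str]:
    """Group blank-line-separated paragraphs into pages of at most max_chars."""
    pages: list[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars or not current:
            # Either it fits, or it is a single oversized paragraph kept whole.
            current = candidate
        else:
            pages.append(current)
            current = paragraph
    if current:
        pages.append(current)
    return pages
```

Each returned page can then be pasted into the `text` input of its own Generate node. Splitting on paragraph boundaries keeps sentences intact, at the cost of uneven page lengths.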

Credits

OmniVoice by k2-fsa
OmniVoice paper