
Ethanfel 14542a6b00 docs: update README with all nodes (presets, mix voices, EPUB loader)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-04-05 19:07:15 +02:00

6.2 KiB


ComfyUI-Omnivoice

A set of ComfyUI custom nodes for OmniVoice, a massively multilingual zero-shot text-to-speech (TTS) model supporting 600+ languages.

Features

Voice Cloning — clone any voice from a short reference audio clip
Voice Design — describe a voice with text (e.g. "female, low pitch, british accent")
Auto Voice — let the model pick a voice automatically
Voice Presets — built-in curated reference voices, ready to use without any audio file
Voice Mixing — blend two or three reference voices for a hybrid speaker
EPUB Loader — load chapters from an ebook directly into the pipeline
Audiobook-ready — handles arbitrarily long text with near-constant VRAM via built-in chunking
Multilingual — 600+ languages
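The README does not document how the built-in chunking works. As a rough illustration only, assuming the text is split at sentence boundaries and each chunk is synthesized on its own (which is what would keep peak memory roughly flat regardless of input length), a hypothetical `chunk_text` helper might look like this:

```python
import re


def chunk_text(text: str, max_chars: int = 400) -> list[str]:
    """Split text into chunks of at most max_chars, breaking at sentence ends.

    Hypothetical helper: the node's real chunking rules are not documented.
    A single sentence longer than max_chars is kept whole rather than cut mid-word.
    """
    # Split after '.', '!' or '?' followed by whitespace; punctuation stays with its sentence.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate  # sentence still fits in the current chunk
        else:
            if current:
                chunks.append(current)  # close the full chunk
            current = sentence  # start a new chunk with this sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be passed to the model separately and the resulting audio concatenated, so memory use depends on `max_chars` rather than on the length of the book.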

Installation

Clone into your ComfyUI custom nodes directory:

```
cd ComfyUI/custom_nodes
git clone https://github.com/ethanfel/ComfyUI-Omnivoice.git
```

Alternatively, install through ComfyUI Manager (recommended), which runs install.py and requirements.txt automatically.

If you cloned manually, install the dependencies yourself:
```
pip install omnivoice --no-deps
pip install -r requirements.txt
```
Why `--no-deps` for omnivoice? The package pins `torch==2.8.*` from a CUDA 12.8 index, so a normal install would overwrite ComfyUI's existing torch build. install.py handles this automatically; requirements.txt covers the remaining dependencies safely.
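The two-step install can be sketched in Python. This is not the project's install.py; `build_install_commands` and `run_install` are hypothetical names, and invoking pip as `sys.executable -m pip` is a choice made here so the packages land in the same interpreter that runs ComfyUI:

```python
import subprocess
import sys


def build_install_commands(requirements: str = "requirements.txt") -> list[list[str]]:
    """Return the pip commands for the two-step install (hypothetical sketch).

    Step 1 installs omnivoice without its dependencies, so its torch pin
    cannot replace the torch build ComfyUI already uses.
    Step 2 installs the remaining dependencies from requirements.txt.
    """
    pip = [sys.executable, "-m", "pip", "install"]
    return [
        pip + ["omnivoice", "--no-deps"],
        pip + ["-r", requirements],
    ]


def run_install() -> None:
    """Run both steps in order; raises CalledProcessError if either fails."""
    for cmd in build_install_commands():
        subprocess.check_call(cmd)
```

Keeping command construction separate from execution makes it easy to print or inspect the commands before actually running `run_install()`.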
Restart ComfyUI. The nodes will appear under the OmniVoice category.

Nodes

OmniVoice Model Loader

Loads the OmniVoice model. The weights are downloaded automatically from Hugging Face on first run and cached in ComfyUI/models/omnivoice/.

| Input | Type | Description |
|---|---|---|
| `device` | dropdown | `cuda:0`, `cuda:1`, or `cpu` |
| `dtype` | dropdown | `float16`, `bfloat16`, or `float32` |

Output: OMNIVOICE_MODEL

OmniVoice Generate

Generates speech from text using a loaded model.

| Input | Type | Description |
|---|---|---|
| `model` | OMNIVOICE_MODEL | From OmniVoice Model Loader |
| `text` | string | Text to synthesize (full pages supported) |
| `mode` | dropdown | `voice_cloning`, `voice_design`, or `auto_voice` |
| `ref_audio` | AUDIO | Reference audio for voice cloning (optional) |
| `ref_text` | string | Transcript of the reference audio; connect a Whisper node for best results (optional) |
| `instruct` | string | Voice description for voice design mode (optional) |
| `speed` | float | Speed multiplier (default 1.0) |
| `num_step` | int | Diffusion steps (default 32; use 16 for faster generation) |
| `seed` | int | Diffusion seed; 0 = random. Set the same value in every Generate node of an audiobook pipeline to keep the voice consistent |

Output: AUDIO at 24 kHz, which connects directly to ComfyUI's Save Audio node.
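Beyond "0 = random", the README does not say how the seed is handled internally. A minimal sketch of that rule, using a hypothetical `resolve_seed` helper and an assumed 32-bit seed range:

```python
import random

MAX_SEED = 2**32 - 1  # assumed upper bound; the node's actual range may differ


def resolve_seed(seed: int) -> int:
    """Map the node's seed input to a concrete seed: 0 means 'pick one at random'."""
    if seed == 0:
        # Draw a fresh non-zero seed; logging it would make the run reproducible later.
        return random.randint(1, MAX_SEED)
    return seed


# A fixed seed always yields the same pseudo-random sequence,
# which is why reusing it across Generate nodes keeps results consistent.
rng_a = random.Random(resolve_seed(42))
rng_b = random.Random(resolve_seed(42))
same_draws = rng_a.random() == rng_b.random()
```

With `seed = 0`, every run draws a different value, so two Generate nodes left at 0 will not share a seed.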

OmniVoice Voice Preset

Provides built-in reference voices. Each preset's audio is downloaded once and cached to ComfyUI/models/omnivoice/presets/.

| Input | Type | Description |
|---|---|---|
| `preset` | dropdown | Choose from built-in voices |

Outputs: ref_audio (AUDIO), ref_text (STRING) — wire directly into OmniVoice Generate.

Available presets:

| Name | Gender | Style |
|---|---|---|
| Shadowheart | Female | Expressive |
| American actress | Female | Theatrical |
| Podcast host | Female | Casual |
| Nature | Male | Warm |
| Old Hollywood | Male | Classic |
| Rick Sanchez | Male | Casual |
| Stewie Griffin | Male | Precise |
| Harvey Keitel | Male | Intense |
| Conan O'Brien | Male | Comedy |
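The "download once, then reuse" behavior of the preset cache can be sketched as below. The file naming scheme and the `fetch(name, dest)` downloader are hypothetical stand-ins, not the node's actual code:

```python
from pathlib import Path
from typing import Callable


def cached_preset_path(
    name: str,
    cache_dir: Path,
    fetch: Callable[[str, Path], None],
) -> Path:
    """Return the local path of a preset clip, downloading it only if missing.

    `fetch(name, dest)` is a hypothetical downloader that writes the clip to `dest`.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    dest = cache_dir / f"{name}.wav"  # assumed file naming scheme
    if not dest.exists():
        tmp = dest.with_suffix(".part")
        fetch(name, tmp)  # download to a temporary file first
        tmp.replace(dest)  # then move it into place, so a failed download is never cached
    return dest
```

Writing to a `.part` file and renaming afterwards means an interrupted download leaves no half-written clip that would be mistaken for a valid cache entry on the next run.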

OmniVoice Mix Voices

Concatenates two or three reference audio clips to create a blended speaker. The model extracts a speaker embedding from the combined clip, producing a hybrid voice.

| Input | Type | Description |
|---|---|---|
| `audio_1` | AUDIO | First reference voice (required) |
| `audio_2` | AUDIO | Second reference voice (required) |
| `weight_1` | float | Duration weight for audio_1 (0.0–1.0) |
| `weight_2` | float | Duration weight for audio_2 (0.0–1.0) |
| `audio_3` | AUDIO | Third reference voice (optional) |
| `weight_3` | float | Duration weight for audio_3 (optional) |
| `text_1`, `text_2`, `text_3` | string | Transcripts for each clip, merged into the ref_text output |

Outputs: ref_audio (AUDIO), ref_text (STRING) — wire directly into OmniVoice Generate.

Each weight controls how much of its clip's duration ends up in the mix. Equal weights (1.0 / 1.0) are a good starting point.
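A sketch of weighted concatenation, assuming that a weight of `w` keeps the first `w` fraction of a clip's samples and that all clips share one sample rate. Plain Python lists stand in for ComfyUI's audio tensors, and `mix_voices` is a hypothetical name:

```python
def mix_voices(
    clips: list[list[float]],
    weights: list[float],
    texts: list[str],
) -> tuple[list[float], str]:
    """Concatenate weighted slices of reference clips into one clip (hypothetical sketch).

    Assumes mono clips at a shared sample rate.
    A weight of w keeps the first int(w * len(clip)) samples of that clip.
    """
    if not (len(clips) == len(weights) == len(texts)):
        raise ValueError("clips, weights and texts must have the same length")
    mixed: list[float] = []
    for clip, w in zip(clips, weights):
        w = min(max(w, 0.0), 1.0)  # clamp to the documented 0.0 to 1.0 range
        mixed.extend(clip[: int(len(clip) * w)])
    # Merge the non-empty transcripts in clip order.
    ref_text = " ".join(t.strip() for t in texts if t.strip())
    return mixed, ref_text
```

In this sketch the transcripts are merged whole, so with weights below 1.0 the merged ref_text can describe more speech than the trimmed audio contains.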

OmniVoice EPUB Loader

Loads an EPUB file and outputs selected chapters as plain text, ready to pipe into OmniVoice Generate.

| Input | Type | Description |
|---|---|---|
| `epub_path` | string | Absolute path to the `.epub` file |
| `chapter_start` | int | First chapter to include (1-indexed) |
| `chapter_end` | int | Last chapter to include (inclusive, auto-clamped) |

Outputs: text (STRING), the selected chapters joined by `---` separators; chapter_list (STRING), a numbered list of all chapter titles for reference.
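The chapter-range rules above (1-indexed, inclusive, end clamped) can be sketched as follows. This assumes the chapters have already been extracted from the EPUB as `(title, body)` pairs. The exact whitespace around the `---` separator and the lower-bound clamp on `chapter_start` are assumptions, and `select_chapters` is a hypothetical name:

```python
def select_chapters(
    chapters: list[tuple[str, str]],
    chapter_start: int,
    chapter_end: int,
    separator: str = "\n\n---\n\n",  # assumed spacing around the documented '---'
) -> tuple[str, str]:
    """Return (text, chapter_list) for a 1-indexed, inclusive chapter range."""
    start = max(1, chapter_start)  # assumed: treat values below 1 as 1
    end = min(chapter_end, len(chapters))  # auto-clamp past-the-end values
    selected = chapters[start - 1 : end]  # 1-indexed inclusive range -> Python slice
    text = separator.join(body for _, body in selected)
    chapter_list = "\n".join(
        f"{i}. {title}" for i, (title, _) in enumerate(chapters, start=1)
    )
    return text, chapter_list
```

For example, with three chapters, `chapter_start=2` and `chapter_end=99` select chapters 2 and 3, while `chapter_list` still lists all three titles.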

Default Workflow

A ready-to-use workflow is included at workflows/voice_cloning.json:

```
[OmniVoice Model Loader] ──────────────────────────────────┐
                                                            ▼
[OmniVoice Voice Preset] ──► ref_audio ──► [OmniVoice Generate] ──► [Save Audio]
                         └──► ref_text ──►
```

Load it via ComfyUI → Load Workflow.

Audiobook Pipeline

For multi-chapter audiobooks, set the same seed in every Generate node so the voice stays consistent throughout the book:

```
[OmniVoice Model Loader] ──────────────────────────────────────────┐
                                                                    ▼
[OmniVoice EPUB Loader] ──► chapter text ──► [OmniVoice Generate] ──► [Save Audio]
                                                      ▲
[OmniVoice Voice Preset] ──► ref_audio / ref_text ──►
                              seed = 42 (fixed)
```
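The fixed-seed idea can be expressed as a small driver loop. Here `synthesize(text, seed)` is a hypothetical stand-in for one OmniVoice Generate call with the preset's ref_audio and ref_text already bound; it is not a real API of this project:

```python
from typing import Callable, Sequence


def render_audiobook(
    chapters: Sequence[str],
    synthesize: Callable[[str, int], bytes],
    seed: int = 42,
) -> list[bytes]:
    """Render every chapter with the same fixed seed (hypothetical driver)."""
    if seed == 0:
        # 0 means "random" in the Generate node, which would defeat consistency.
        raise ValueError("use a fixed non-zero seed for multi-chapter rendering")
    return [synthesize(text, seed) for text in chapters]
```

Rejecting `seed = 0` up front catches the one setting that would give each chapter a different random seed.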

Credits

OmniVoice by k2-fsa
OmniVoice paper
