
ComfyUI-Omnivoice

A ComfyUI custom node for OmniVoice — a massive multilingual zero-shot TTS model supporting 600+ languages.

Features

  • Voice Cloning — clone any voice from a short reference audio clip
  • Voice Design — describe a voice with text (e.g. "female, low pitch, british accent")
  • Auto Voice — let the model pick a voice automatically
  • Audiobook-ready — handles arbitrarily long text with near-constant VRAM via built-in chunking
  • Multilingual — 600+ languages

Installation

  1. Clone into your ComfyUI custom nodes directory:

    cd ComfyUI/custom_nodes
    git clone https://github.com/ethanfel/ComfyUI-Omnivoice.git
    
  2. Install the dependencies. ComfyUI Manager (recommended) runs install.py and installs requirements.txt automatically.

    Or manually:

    pip install omnivoice --no-deps
    pip install -r requirements.txt
    

    Why --no-deps for omnivoice? The package pins torch==2.8.* from a CUDA 12.8 index, so a normal install would replace ComfyUI's existing torch build. install.py handles this automatically; requirements.txt covers the remaining dependencies safely.

  3. Restart ComfyUI. The nodes will appear under the OmniVoice category.
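The manual install in step 2 can be sketched in Python as follows. This is only an illustration of the two pip commands shown above, not the contents of the repository's install.py; the helper names `installed_version`, `pip_commands`, and `install` are hypothetical. The sketch also compares the torch version before and after, since protecting ComfyUI's torch build is the point of --no-deps.

```python
import subprocess
import sys
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def pip_commands(requirements="requirements.txt"):
    """Build the two pip invocations from the manual install step."""
    pip = [sys.executable, "-m", "pip", "install"]
    return [
        pip + ["omnivoice", "--no-deps"],  # skip omnivoice's own torch pin
        pip + ["-r", requirements],        # remaining dependencies
    ]


def install(run=subprocess.check_call):
    """Run both commands; return True if the torch version is unchanged."""
    before = installed_version("torch")
    for cmd in pip_commands():
        run(cmd)
    after = installed_version("torch")
    return before == after
```

Calling `install()` from the node's directory, using the same Python interpreter that runs ComfyUI, would execute both commands. A `False` result would suggest that some dependency replaced torch after all.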

Nodes

OmniVoice Model Loader

Loads the OmniVoice model.

Input    Type      Description
device   dropdown  cuda:0, cuda:1, or cpu
dtype    dropdown  float16, bfloat16, or float32

Downloads automatically from HuggingFace on first run and caches to ComfyUI/models/omnivoice/.

Output: OMNIVOICE_MODEL
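
For readers unfamiliar with how ComfyUI nodes declare their inputs, the loader's two dropdowns and its output type might be declared roughly like this. This is a hypothetical skeleton based on the general ComfyUI custom-node convention, not the repository's actual source; the class name `OmniVoiceModelLoaderSketch` and the placeholder `load` body are assumptions.

```python
class OmniVoiceModelLoaderSketch:
    """Hypothetical node skeleton; the real class may differ."""

    CATEGORY = "OmniVoice"
    RETURN_TYPES = ("OMNIVOICE_MODEL",)
    FUNCTION = "load"  # name of the method ComfyUI calls

    @classmethod
    def INPUT_TYPES(cls):
        # A list as the first tuple element renders as a dropdown in ComfyUI.
        return {
            "required": {
                "device": (["cuda:0", "cuda:1", "cpu"],),
                "dtype": (["float16", "bfloat16", "float32"],),
            }
        }

    def load(self, device, dtype):
        # Placeholder: the real node would load the model here.
        handle = {"device": device, "dtype": dtype}
        return (handle,)  # a tuple matching RETURN_TYPES


# ComfyUI discovers nodes through this module-level mapping.
NODE_CLASS_MAPPINGS = {"OmniVoiceModelLoaderSketch": OmniVoiceModelLoaderSketch}
```

Because the output is a custom type string, only inputs that declare the same `OMNIVOICE_MODEL` type can be wired to it.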


OmniVoice Generate

Generates speech from text using a loaded model.

Input      Type             Description
model      OMNIVOICE_MODEL  From OmniVoice Model Loader
text       string           Text to synthesize (full pages supported)
mode       dropdown         voice_cloning, voice_design, or auto_voice
ref_audio  AUDIO            Reference audio for voice cloning (optional)
ref_text   string           Transcription of ref audio; auto-detected if blank (optional)
instruct   string           Voice description for voice design mode (optional)
speed      float            Speed multiplier (default 1.0)
num_step   int              Diffusion steps (default 32; use 16 for faster generation)

Output: AUDIO at 24 kHz. Connects directly to ComfyUI's Save Audio node.
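
The table implies that each mode relies on different optional inputs: voice_cloning on ref_audio, voice_design on instruct, and auto_voice on neither. A small pre-flight check along those lines could look like the sketch below. The function `check_generate_inputs` is hypothetical, and the assumption that a missing input is an error is mine; the actual node may fall back to another behaviour instead.

```python
def check_generate_inputs(mode, ref_audio=None, instruct=""):
    """Return a list of problems; an empty list means the inputs look usable."""
    modes = ("voice_cloning", "voice_design", "auto_voice")
    if mode not in modes:
        return [f"unknown mode: {mode!r}"]
    problems = []
    # Assumption: cloning has nothing to clone without a reference clip.
    if mode == "voice_cloning" and ref_audio is None:
        problems.append("voice_cloning needs ref_audio")
    # Assumption: voice design needs a non-empty description.
    if mode == "voice_design" and not instruct.strip():
        problems.append("voice_design needs an instruct description")
    return problems
```

For example, `check_generate_inputs("voice_design", instruct="female, low pitch, british accent")` returns an empty list, while the same call without `instruct` reports one problem.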

Example Workflow (Audiobook)

[OmniVoice Model Loader] ─────────────────────────┐
                                                    ▼
[Load Audio (narrator clip)] ──► [OmniVoice Generate] ──► [Save Audio]
                                        ▲
                              text = "Page 1 content..."
                              mode = voice_cloning

Repeat the Generate + Save Audio nodes for each page, reusing the same loader.
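
To prepare the per-page text for the workflow above, a book can be split at paragraph boundaries before it is pasted into each Generate node. The sketch below is one possible approach, not part of this node pack; `split_into_pages` and the 3000-character default are arbitrary assumptions. Since the node chunks long text internally, the page size mainly decides how many output files you get.

```python
def split_into_pages(text, max_chars=3000):
    """Group paragraphs into pages of at most max_chars characters.

    A single paragraph longer than max_chars becomes its own page.
    """
    pages, current = [], ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue  # skip empty paragraphs from repeated blank lines
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            pages.append(current)  # current page is full; start a new one
            current = para
        else:
            current = candidate
    if current:
        pages.append(current)
    return pages
```

Each returned string would then go into the `text` input of one Generate node, with all of them sharing the same loader and narrator clip.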

Credits
