From 14542a6b00059d100b8df58e2af32a8da7ab7782 Mon Sep 17 00:00:00 2001
From: Ethanfel
Date: Sun, 5 Apr 2026 19:07:15 +0200
Subject: [PATCH] docs: update README with all nodes (presets, mix voices, EPUB loader)

Co-Authored-By: Claude Sonnet 4.6
---
 README.md | 99 ++++++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 87 insertions(+), 12 deletions(-)

diff --git a/README.md b/README.md
index 1ae7280..12163b6 100644
--- a/README.md
+++ b/README.md
@@ -7,6 +7,9 @@ A ComfyUI custom node for [OmniVoice](https://github.com/k2-fsa/OmniVoice) — a
 - **Voice Cloning** — clone any voice from a short reference audio clip
 - **Voice Design** — describe a voice with text (e.g. "female, low pitch, british accent")
 - **Auto Voice** — let the model pick a voice automatically
+- **Voice Presets** — built-in curated reference voices, ready to use without any audio file
+- **Voice Mixing** — blend two or three reference voices for a hybrid speaker
+- **EPUB Loader** — load chapters from an ebook directly into the pipeline
 - **Audiobook-ready** — handles arbitrarily long text with near-constant VRAM via built-in chunking
 - **Multilingual** — 600+ languages
 
@@ -33,15 +36,13 @@ A ComfyUI custom node for [OmniVoice](https://github.com/k2-fsa/OmniVoice) — a
 
 ### OmniVoice Model Loader
 
-Loads the OmniVoice model. Downloads automatically from HuggingFace on first run and caches locally.
+Loads the OmniVoice model. Downloads automatically from HuggingFace on first run and caches to `ComfyUI/models/omnivoice/`.
 
 | Input | Type | Description |
 |-------|------|-------------|
 | `device` | dropdown | `cuda:0`, `cuda:1`, or `cpu` |
 | `dtype` | dropdown | `float16`, `bfloat16`, or `float32` |
 
-Downloads automatically from HuggingFace on first run and caches to `ComfyUI/models/omnivoice/`.
-
 **Output:** `OMNIVOICE_MODEL`
 
 ---
@@ -56,25 +57,99 @@ Generates speech from text using a loaded model.
 | `text` | string | Text to synthesize (full pages supported) |
 | `mode` | dropdown | `voice_cloning`, `voice_design`, or `auto_voice` |
 | `ref_audio` | AUDIO | Reference audio for voice cloning (optional) |
-| `ref_text` | string | Transcription of ref audio — auto-detected if blank (optional) |
+| `ref_text` | string | Transcription of ref audio — connect a Whisper node for best results (optional) |
 | `instruct` | string | Voice description for voice design mode (optional) |
 | `speed` | float | Speed multiplier — default 1.0 |
 | `num_step` | int | Diffusion steps — default 32 (use 16 for faster generation) |
+| `seed` | int | Diffusion seed — set the same value across all Generate nodes in an audiobook pipeline to keep the voice consistent. 0 = random |
 
 **Output:** `AUDIO` at 24kHz — connects directly to ComfyUI's Save Audio node.
 
-## Example Workflow (Audiobook)
+---
+
+### OmniVoice Voice Preset
+
+Pre-fetched reference voices. Audio is downloaded once and cached to `ComfyUI/models/omnivoice/presets/`.
+
+| Input | Type | Description |
+|-------|------|-------------|
+| `preset` | dropdown | Choose from built-in voices |
+
+**Outputs:** `ref_audio` (AUDIO), `ref_text` (STRING) — wire directly into OmniVoice Generate.
+
+Available presets:
+
+| Name | Gender | Style |
+|------|--------|-------|
+| Shadowheart | Female | Expressive |
+| American actress | Female | Theatrical |
+| Podcast host | Female | Casual |
+| Nature | Male | Warm |
+| Old Hollywood | Male | Classic |
+| Rick Sanchez | Male | Casual |
+| Stewie Griffin | Male | Precise |
+| Harvey Keitel | Male | Intense |
+| Conan O'Brien | Male | Comedy |
+
+---
+
+### OmniVoice Mix Voices
+
+Concatenates two or three reference audio clips to create a blended speaker. The model extracts a speaker embedding from the combined clip, producing a hybrid voice.
+
+| Input | Type | Description |
+|-------|------|-------------|
+| `audio_1` | AUDIO | First reference voice (required) |
+| `audio_2` | AUDIO | Second reference voice (required) |
+| `weight_1` | float | Duration weight for audio_1 (0.0–1.0) |
+| `weight_2` | float | Duration weight for audio_2 (0.0–1.0) |
+| `audio_3` | AUDIO | Third reference voice (optional) |
+| `weight_3` | float | Duration weight for audio_3 (optional) |
+| `text_1/2/3` | string | Transcripts for each clip — merged into ref_text output |
+
+**Outputs:** `ref_audio` (AUDIO), `ref_text` (STRING) — wire directly into OmniVoice Generate.
+
+Weight controls how much of each clip's duration ends up in the mix. Equal weights (1.0 / 1.0) are a good starting point.
+
+---
+
+### OmniVoice EPUB Loader
+
+Loads an EPUB file and outputs selected chapters as plain text, ready to pipe into OmniVoice Generate.
+
+| Input | Type | Description |
+|-------|------|-------------|
+| `epub_path` | string | Absolute path to the `.epub` file |
+| `chapter_start` | int | First chapter to include (1-indexed) |
+| `chapter_end` | int | Last chapter to include (inclusive, auto-clamped) |
+
+**Outputs:** `text` (STRING) — selected chapters joined by `---`, `chapter_list` (STRING) — numbered list of all chapter titles for reference.
+
+## Default Workflow
+
+A ready-to-use workflow is included at `workflows/voice_cloning.json`:
 
 ```
-[OmniVoice Model Loader] ─────────────────────────┐
-                                                  ▼
-[Load Audio (narrator clip)] ──► [OmniVoice Generate] ──► [Save Audio]
-                                          ▲
-                                text = "Page 1 content..."
-                                mode = voice_cloning
+[OmniVoice Model Loader] ──────────────────────────────────┐
+                                                           ▼
+[OmniVoice Voice Preset] ──► ref_audio ──► [OmniVoice Generate] ──► [Save Audio]
+                         └──► ref_text ──►
 ```
 
-Repeat the Generate + Save Audio nodes for each page, reusing the same loader.
+Load it via ComfyUI → Load Workflow.
+
+## Audiobook Pipeline
+
+For multi-chapter audiobooks, use the same seed across all Generate nodes to keep the voice consistent between paragraphs:
+
+```
+[OmniVoice Model Loader] ──────────────────────────────────────────┐
+                                                                   ▼
+[OmniVoice EPUB Loader] ──► chapter text ──► [OmniVoice Generate] ──► [Save Audio]
+                                                      ▲
+[OmniVoice Voice Preset] ──► ref_audio / ref_text ──►
+                                                 seed = 42 (fixed)
+```
 
 ## Credits
 
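
The chapter-range behaviour that the EPUB Loader table describes (1-indexed `chapter_start`, inclusive and clamped `chapter_end`, chapters joined by `---`) can be sketched in Python as below. This is an illustrative sketch and not the node's actual code: `select_chapters`, `split_chapters`, and the exact `SEPARATOR` string are hypothetical names and values, and clamping `chapter_start` as well as `chapter_end` is an assumption.

```python
from typing import List

# Assumption: the README only says chapters are "joined by ---";
# the surrounding blank lines are a guess.
SEPARATOR = "\n\n---\n\n"


def select_chapters(chapters: List[str], chapter_start: int, chapter_end: int) -> str:
    """Join a 1-indexed, inclusive chapter range, clamped to the book's length."""
    if not chapters:
        return ""
    # Clamp both ends into [1, len(chapters)] and keep end >= start.
    start = max(1, min(chapter_start, len(chapters)))
    end = max(start, min(chapter_end, len(chapters)))
    return SEPARATOR.join(chapters[start - 1:end])


def split_chapters(text: str) -> List[str]:
    """Recover per-chapter strings from the joined output, dropping empty parts."""
    return [part for part in text.split(SEPARATOR) if part.strip()]
```

Splitting the joined text back into chapters is one way to feed each chapter to its own Generate node while keeping the same fixed seed, as the Audiobook Pipeline section suggests.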