docs: update README with all nodes (presets, mix voices, EPUB loader)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:07:15 +02:00
parent 76118f57c3
commit 14542a6b00
+87 -12
@@ -7,6 +7,9 @@ A ComfyUI custom node for [OmniVoice](https://github.com/k2-fsa/OmniVoice) — a
- **Voice Cloning** — clone any voice from a short reference audio clip
- **Voice Design** — describe a voice with text (e.g. "female, low pitch, british accent")
- **Auto Voice** — let the model pick a voice automatically
- **Voice Presets** — built-in curated reference voices, ready to use without any audio file
- **Voice Mixing** — blend two or three reference voices for a hybrid speaker
- **EPUB Loader** — load chapters from an ebook directly into the pipeline
- **Audiobook-ready** — handles arbitrarily long text with near-constant VRAM via built-in chunking
- **Multilingual** — 600+ languages
@@ -33,15 +36,13 @@ A ComfyUI custom node for [OmniVoice](https://github.com/k2-fsa/OmniVoice) — a
### OmniVoice Model Loader
Loads the OmniVoice model. Downloads automatically from HuggingFace on first run and caches locally.
Loads the OmniVoice model. Downloads automatically from HuggingFace on first run and caches to `ComfyUI/models/omnivoice/`.
| Input | Type | Description |
|-------|------|-------------|
| `device` | dropdown | `cuda:0`, `cuda:1`, or `cpu` |
| `dtype` | dropdown | `float16`, `bfloat16`, or `float32` |
Downloads automatically from HuggingFace on first run and caches to `ComfyUI/models/omnivoice/`.
**Output:** `OMNIVOICE_MODEL`
---
@@ -56,25 +57,99 @@ Generates speech from text using a loaded model.
| `text` | string | Text to synthesize (full pages supported) |
| `mode` | dropdown | `voice_cloning`, `voice_design`, or `auto_voice` |
| `ref_audio` | AUDIO | Reference audio for voice cloning (optional) |
| `ref_text` | string | Transcription of ref audio — auto-detected if blank (optional) |
| `ref_text` | string | Transcription of ref audio — connect a Whisper node for best results (optional) |
| `instruct` | string | Voice description for voice design mode (optional) |
| `speed` | float | Speed multiplier — default 1.0 |
| `num_step` | int | Diffusion steps — default 32 (use 16 for faster generation) |
| `seed` | int | Diffusion seed — set the same value across all Generate nodes in an audiobook pipeline to keep the voice consistent. 0 = random |
**Output:** `AUDIO` at 24kHz — connects directly to ComfyUI's Save Audio node.
## Example Workflow (Audiobook)
---
### OmniVoice Voice Preset
Pre-fetched reference voices. Audio is downloaded once and cached to `ComfyUI/models/omnivoice/presets/`.
| Input | Type | Description |
|-------|------|-------------|
| `preset` | dropdown | Choose from built-in voices |
**Outputs:** `ref_audio` (AUDIO), `ref_text` (STRING) — wire directly into OmniVoice Generate.
Available presets:
| Name | Gender | Style |
|------|--------|-------|
| Shadowheart | Female | Expressive |
| American actress | Female | Theatrical |
| Podcast host | Female | Casual |
| Nature | Male | Warm |
| Old Hollywood | Male | Classic |
| Rick Sanchez | Male | Casual |
| Stewie Griffin | Male | Precise |
| Harvey Keitel | Male | Intense |
| Conan O'Brien | Male | Comedy |
---
### OmniVoice Mix Voices
Concatenates two or three reference audio clips to create a blended speaker. The model extracts a speaker embedding from the combined clip, producing a hybrid voice.
| Input | Type | Description |
|-------|------|-------------|
| `audio_1` | AUDIO | First reference voice (required) |
| `audio_2` | AUDIO | Second reference voice (required) |
| `weight_1` | float | Duration weight for audio_1 (0.0–1.0) |
| `weight_2` | float | Duration weight for audio_2 (0.0–1.0) |
| `audio_3` | AUDIO | Third reference voice (optional) |
| `weight_3` | float | Duration weight for audio_3 (optional) |
| `text_1/2/3` | string | Transcripts for each clip — merged into ref_text output |
**Outputs:** `ref_audio` (AUDIO), `ref_text` (STRING) — wire directly into OmniVoice Generate.
Each weight controls how much of that clip's duration ends up in the mix. Equal weights (1.0 / 1.0) are a good starting point.
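The duration weighting described above can be sketched in plain Python. This is one possible reading of "duration weight", not the node's actual implementation: it assumes each weight is the fraction of a clip kept (taken from the start) before concatenation, and that clips are mono sample sequences at the same sample rate. The function names `mix_reference_clips` and `merge_transcripts` are hypothetical.

```python
def mix_reference_clips(clips, weights):
    """Concatenate clips, keeping a weight-proportional prefix of each.

    Assumption: a weight of 1.0 keeps the whole clip, 0.5 keeps the
    first half, and so on. The real node may trim differently.
    """
    if len(clips) != len(weights):
        raise ValueError("need exactly one weight per clip")
    if not 2 <= len(clips) <= 3:
        raise ValueError("expected two or three clips")

    mixed = []
    for samples, weight in zip(clips, weights):
        weight = min(max(weight, 0.0), 1.0)  # clamp to the 0.0-1.0 range
        keep = int(len(samples) * weight)    # number of samples to keep
        mixed.extend(samples[:keep])
    return mixed


def merge_transcripts(texts):
    """Join the non-empty transcripts into a single ref_text string."""
    return " ".join(t.strip() for t in texts if t and t.strip())


# Hypothetical usage: the full first clip plus half of the second one.
audio = mix_reference_clips([[1, 1, 1, 1], [2, 2, 2, 2]], [1.0, 0.5])
text = merge_transcripts(["Hello there.", "", " Good evening. "])
```

With equal weights of 1.0 both clips are kept in full, which matches the suggested starting point.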
---
### OmniVoice EPUB Loader
Loads an EPUB file and outputs selected chapters as plain text, ready to pipe into OmniVoice Generate.
| Input | Type | Description |
|-------|------|-------------|
| `epub_path` | string | Absolute path to the `.epub` file |
| `chapter_start` | int | First chapter to include (1-indexed) |
| `chapter_end` | int | Last chapter to include (inclusive, auto-clamped) |
**Outputs:** `text` (STRING) — selected chapters joined by `---`, `chapter_list` (STRING) — numbered list of all chapter titles for reference.
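The chapter selection rules above (1-indexed start, inclusive and auto-clamped end, chapters joined by `---`) can be illustrated with a small sketch. It works on chapters that are already extracted as plain strings and does not parse an EPUB file. The function names, the clamping of `chapter_start`, the newlines around the separator, and the exact format of the numbered list are assumptions made for illustration.

```python
def select_chapters(chapters, chapter_start, chapter_end):
    """Return chapters chapter_start..chapter_end (1-indexed, inclusive)."""
    if not chapters:
        return ""
    total = len(chapters)
    # Clamp both bounds into the valid range (clamping the start is an
    # assumption; the README only states that the end is auto-clamped).
    start = min(max(chapter_start, 1), total)
    end = min(max(chapter_end, start), total)
    # The surrounding newlines are an assumption about the separator layout.
    return "\n---\n".join(chapters[start - 1:end])


def format_chapter_list(titles):
    """Build a numbered list of chapter titles, one per line."""
    return "\n".join(f"{i}. {title}" for i, title in enumerate(titles, 1))


# Hypothetical usage: request chapters 2 through 10 of a 3-chapter book.
book = ["Chapter one text", "Chapter two text", "Chapter three text"]
selected = select_chapters(book, 2, 10)
listing = format_chapter_list(["Prologue", "The Road", "Epilogue"])
```

Because the end index is clamped, asking for more chapters than the book contains returns everything from the start chapter onward.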
## Default Workflow
A ready-to-use workflow is included at `workflows/voice_cloning.json`:
```
[OmniVoice Model Loader] ─────────────────────────┐
[Load Audio (narrator clip)] ──► [OmniVoice Generate] ──► [Save Audio]
text = "Page 1 content..."
mode = voice_cloning
[OmniVoice Model Loader] ──────────────────────────────────
[OmniVoice Voice Preset] ──► ref_audio ──► [OmniVoice Generate] ──► [Save Audio]
└──► ref_text ──►
```
Repeat the Generate + Save Audio nodes for each page, reusing the same loader.
Load it via ComfyUI → Load Workflow.
## Audiobook Pipeline
For multi-chapter audiobooks, use the same seed across all Generate nodes to keep the voice consistent between paragraphs:
```
[OmniVoice Model Loader] ──────────────────────────────────────────┐
[OmniVoice EPUB Loader] ──► chapter text ──► [OmniVoice Generate] ──► [Save Audio]
[OmniVoice Voice Preset] ──► ref_audio / ref_text ──►
seed = 42 (fixed)
```
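The seed convention used in this pipeline (a fixed value for a consistent voice, `0` for a random one) can be sketched as follows. The helper name `resolve_seed` and the range of the random draw are assumptions; the node's internal handling may differ.

```python
import random


def resolve_seed(seed):
    """Return the seed to use for generation; 0 means pick one at random."""
    if seed == 0:
        # Assumed range: any positive 32-bit value.
        return random.randint(1, 2**32 - 1)
    return seed


# A fixed seed yields the same pseudo-random stream every time, which is
# why sharing one value across Generate nodes keeps results repeatable.
first = random.Random(resolve_seed(42)).random()
second = random.Random(resolve_seed(42)).random()
same_stream = first == second
```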
## Credits