81 Commits

Author SHA1 Message Date
Ethanfel b4c1cb2955 chore: bump version to 1.0.6
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-10 22:17:21 +02:00
Ethanfel 2e3c357e5a fix: raise speed minimum from 0.1 to 0.3 to prevent VRAM explosion
Speed values below 0.3 produce noise, and 0.1 generates 10x the normal
audio length, which can consume 20+ GB VRAM and freeze the system.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-10 22:12:04 +02:00
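The arithmetic behind this limit can be sketched as follows, assuming output length scales roughly as 1/speed (which is what the "0.1 generates 10x" figure implies). `MIN_SPEED`, `clamp_speed`, and `length_multiplier` are illustrative names, not identifiers from the repository.

```python
MIN_SPEED = 0.3  # lower bound introduced by this commit


def clamp_speed(speed: float) -> float:
    """Raise any requested speed below the minimum up to MIN_SPEED."""
    return max(MIN_SPEED, speed)


def length_multiplier(speed: float) -> float:
    """Approximate output length relative to speed=1.0 (assumes length ~ 1/speed)."""
    return 1.0 / speed


for requested in (0.1, 0.3, 1.0):
    effective = clamp_speed(requested)
    print(f"requested={requested} effective={effective} "
          f"length x{length_multiplier(effective):.2f}")
```

Under this assumption the worst case drops from about 10x to about 3.3x the normal audio length.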
Ethanfel aa986fd534 fix: set transformers>=5.3.0 as required by omnivoice
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-10 12:14:13 +02:00
Ethanfel 3fee610050 fix: bump transformers minimum to >=4.57.0
Confirmed working version. Previous >=4.40.0 was too permissive and
older versions may lack APIs that omnivoice depends on.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-10 12:13:19 +02:00
Ethanfel d4638aa785 chore: bump version to 1.0.5
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-10 12:08:44 +02:00
Ethanfel e9c947b613 fix: strip title and heading tags from EPUB text output
The chapter title was appearing multiple times in the text (from <title>,
<h1>, and body). Now <title> and <h1>/<h2>/<h3> are removed from the body
text since the title is already available via the chapter_title output.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-10 00:07:00 +02:00
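A minimal sketch of the stripping described above, using only the standard library. The repository may well use a different HTML parser; `BodyTextExtractor` and `chapter_body_text` are hypothetical names for illustration.

```python
from html.parser import HTMLParser


class BodyTextExtractor(HTMLParser):
    """Collect text, skipping anything inside <title>, <h1>, <h2>, or <h3>."""

    SKIPPED = {"title", "h1", "h2", "h3"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self._chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text only when outside every skipped element.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return "\n".join(self._chunks)


def chapter_body_text(html: str) -> str:
    """Return the chapter text without title or heading content."""
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

With this approach the chapter title appears only once, through whatever separate output carries it, rather than being repeated in the body text.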
Ethanfel 197bcc554e chore: bump version to 1.0.4
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-09 23:32:03 +02:00
Ethanfel 9f2683cd54 fix: add torchcodec to requirements for torchaudio backend
Newer torchaudio versions default to the TorchCodec backend for loading
audio files. Without it installed, omnivoice fails with ImportError when
loading reference audio.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-09 23:31:40 +02:00
Ethanfel f8657aca80 fix: update tests for chapter_title output and guidance_scale param
- epub loader tests: unpack 3 return values (added chapter_title)
- generator tests: expect guidance_scale in model.generate() calls

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 19:06:38 +02:00
Ethanfel 0f2bcc4c0e chore: bump version to 1.0.3
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 17:41:02 +02:00
Ethanfel 4eabac4c7e feat: add chapter_title output to EPUB Loader for file naming
Returns the title of the first selected chapter as a STRING so it can
be wired directly into a Save Audio node's filename field.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 17:40:38 +02:00
Ethanfel dd6d5061e4 chore: bump version to 1.0.2
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 12:26:26 +02:00
Ethanfel 147030e2af fix: surface real import error when omnivoice fails to load
The broad except ImportError was swallowing the actual failure reason
(e.g. a missing transitive dep after --no-deps install). Now captures
and re-raises the original exception in the error message so users
can diagnose what is actually missing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 12:23:59 +02:00
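The pattern described above can be sketched like this. `load_optional` is a hypothetical helper, not the repository's function; the point is that the original exception text is included in the message and kept as the cause.

```python
import importlib


def load_optional(module_name: str):
    """Import a module, re-raising failures with the original reason attached."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Include the underlying error so a missing transitive dependency
        # is visible instead of a generic "not installed" message.
        raise ImportError(
            f"Could not import '{module_name}': {exc!r}. "
            "A transitive dependency may be missing."
        ) from exc
```

Chaining with `from exc` also preserves the original traceback, which helps when the real failure is several imports deep.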
Ethanfel aedbe2e7d9 fix: declare speaker_1..8 in INPUT_TYPES so ComfyUI validation accepts them
Dynamic JS inputs that are not listed in INPUT_TYPES may be rejected by
ComfyUI's prompt validator and not passed to the Python function. Declaring
all 8 slots as optional fixes this while JS still controls which slots are
visible on the node.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 09:41:56 +02:00
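A sketch of what declaring all eight slots as optional might look like, following the usual ComfyUI `INPUT_TYPES` convention of `required` and `optional` dictionaries. The class name, the `build` method, and the `num_speakers` default and minimum are assumptions for illustration; only the `speaker_1..8` names, the `OMNIVOICE_SPEAKER` type, and the limit of 8 come from the commit messages.

```python
MAX_SPEAKERS = 8


class SpeakersRosterSketch:
    """Hypothetical stand-in for the roster node; not the repository's class."""

    @classmethod
    def INPUT_TYPES(cls):
        # Declare every slot up front so prompt validation knows about them,
        # even though the JS extension decides which ones are visible.
        optional = {
            f"speaker_{i}": ("OMNIVOICE_SPEAKER",)
            for i in range(1, MAX_SPEAKERS + 1)
        }
        return {
            "required": {
                # default/min are assumed values for this sketch
                "num_speakers": ("INT", {"default": 2, "min": 1, "max": MAX_SPEAKERS}),
            },
            "optional": optional,
        }

    def build(self, num_speakers, **speakers):
        """Keep only connected slots up to num_speakers, in slot order."""
        roster = []
        for i in range(1, num_speakers + 1):
            speaker = speakers.get(f"speaker_{i}")
            if speaker is not None:
                roster.append(speaker)
        return roster
```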
Ethanfel 26295e4db7 feat: auto-discover user presets from the presets folder
Drop any audio file (wav/flac/mp3/ogg/m4a) into the presets cache dir and
it will appear as "<name> (local)" in the Voice Preset dropdown on next
ComfyUI restart. Add a same-stem .txt file for the transcript.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 09:37:17 +02:00
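The discovery step could look roughly like the following. The function name and the returned dictionary layout are hypothetical; the extension list, the "(local)" suffix, and the same-stem `.txt` transcript come from the commit message.

```python
from pathlib import Path

AUDIO_EXTENSIONS = {".wav", ".flac", ".mp3", ".ogg", ".m4a"}


def discover_local_presets(presets_dir: Path) -> dict[str, dict]:
    """Map '<name> (local)' to the audio path and optional transcript."""
    presets = {}
    for path in sorted(presets_dir.iterdir()):
        if not path.is_file() or path.suffix.lower() not in AUDIO_EXTENSIONS:
            continue
        # A transcript is a .txt file with the same stem as the audio file.
        transcript_path = path.with_suffix(".txt")
        transcript = (
            transcript_path.read_text(encoding="utf-8").strip()
            if transcript_path.is_file()
            else ""
        )
        presets[f"{path.stem} (local)"] = {"audio": path, "transcript": transcript}
    return presets
```

In this sketch a missing transcript yields an empty string; how the actual node treats that case is not stated in the commit.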
Ethanfel d5f2632c48 feat: support [tag]: syntax in tagged_speakers mode
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 09:21:27 +02:00
Ethanfel 33b3d62d02 fix: tagged_speakers splits on single newlines, not just double newlines
Each line starting with [Tag] now begins a new segment so users don't need
blank lines between tagged speeches. Continuation lines (no tag) are joined
to the previous tagged segment for multi-line speeches.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 09:16:54 +02:00
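The line-based splitting can be sketched as below. This is an illustration, not the repository's parser: the regex, the function name, and the choice to ignore untagged text before the first tag are assumptions. The optional colon after the tag mirrors the `[tag]:` syntax mentioned in the log.

```python
import re

# "[Tag] text" or "[Tag]: text" at the start of a line
TAG_LINE = re.compile(r"^\[([^\]]+)\]:?\s*(.*)$")


def split_tagged_segments(text: str) -> list[tuple[str, str]]:
    """Return (tag, speech) pairs; untagged lines continue the previous speech."""
    segments: list[tuple[str, str]] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # blank lines are allowed but not required
        match = TAG_LINE.match(line)
        if match:
            segments.append((match.group(1).strip(), match.group(2)))
        elif segments:
            tag, speech = segments[-1]
            segments[-1] = (tag, f"{speech} {line}".strip())
        # Untagged text before the first tag is ignored in this sketch.
    return segments
```

For example, the input `"[Alice] Hello.\nHow are you?\n[Bob]: Fine, thanks."` yields two segments, with the second line joined to Alice's speech.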
Ethanfel 95cf706b19 feat: add multi-speaker generation with JS-powered dynamic slots
- Add OmniVoiceSpeaker node (label + ref_audio + ref_text → OMNIVOICE_SPEAKER)
- Add OmniVoiceSpeakers node (roster with dynamic speaker_N inputs driven by
  num_speakers INT widget; slots expand/collapse via ComfyUI JS extension)
- Add web/multi_speaker.js: ComfyUI extension that hooks onNodeCreated and
  onConfigure to sync speaker_N inputs in real time (max 8 speakers)
- Extend OmniVoiceGenerate with optional speakers (OMNIVOICE_SPEAKERS) input;
  when connected it routes each paragraph to the assigned speaker and
  concatenates the results — supports alternate_paragraphs and tagged_speakers modes
- Remove OmniVoiceMultiSpeakerGenerate (generation now lives in the existing
  Generate node)
- Refactor generator.py: extract _write_tmp_wav helper, add _tensors_to_audio

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 09:08:23 +02:00
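As a rough illustration of the alternate_paragraphs routing, the sketch below assumes that paragraphs are separated by blank lines and that speakers are assigned in round-robin order. Both points are assumptions about the mode's behaviour, and `assign_alternating` is a hypothetical name.

```python
import re


def assign_alternating(text: str, speakers: list[str]) -> list[tuple[str, str]]:
    """Pair each paragraph with a speaker, cycling through the roster."""
    if not speakers:
        raise ValueError("at least one speaker is required")
    # Paragraphs are runs of text separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return [(speakers[i % len(speakers)], p) for i, p in enumerate(paragraphs)]
```

Each (speaker, paragraph) pair would then be generated separately and the resulting audio concatenated, as the commit describes.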
Ethanfel 3cbc04d12d wf: update default workflow from ComfyUI export
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:50:52 +02:00
Ethanfel 340c0aa402 simplify: remove language param entirely — model detects from instruct string
Chinese characters vs English words are self-identifying to the model.
No need for a separate language signal on either node.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:44:27 +02:00
Ethanfel 2b13e55dc5 fix: pipe language out of Voice Design into Generate
Voice Design now outputs (instruct, language) — wire language directly
into Generate to avoid setting it in two places. Generate's language
input is now a STRING (accepts the connection or manual 'auto').

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:40:35 +02:00
Ethanfel 86ec8cf3fb bump: version 1.0.1
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:33:55 +02:00
Ethanfel ae2255d9e4 fix: add missing omnivoice runtime deps for fresh installs
pydub, tensorboardx, webdataset are omnivoice dependencies that won't
be present on a clean ComfyUI install since we use --no-deps.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:33:19 +02:00
Ethanfel d5a0ebeb9a revert: restore working requirements.txt
Pinning transformers==5.3.0 risks conflicting with existing ComfyUI venv.
Back to permissive >=4.40.0 which worked in practice.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:29:04 +02:00
Ethanfel 0d43e5374f fix: sync requirements.txt with omnivoice's actual --no-deps dependencies
Pins transformers==5.3.0 (omnivoice requires exact version), restores
pydub (omnivoice dep), adds tensorboardx and webdataset.
Drops gradio (demo-only, not needed for inference).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:28:35 +02:00
Ethanfel 2b4b221e88 chore: remove unused pydub from requirements
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:27:34 +02:00
Ethanfel 772f6654d4 feat: add language selector for voice_design + Chinese instruct support
- Generate: language dropdown (auto/English/Chinese), passed only in
  voice_design and auto_voice modes where it selects the instruct vocab
- VoiceDesign: Chinese mode with dialect/age/pitch/gender dropdowns
  using the model's validated Chinese instruct vocabulary (全角逗号, i.e. full-width commas)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:22:25 +02:00
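A sketch of how dropdown values might be composed into an instruct string per language. The English separator, the "none" sentinel, the function name, and the example attribute values are all placeholders rather than the model's validated vocabulary; only the full-width comma for Chinese is taken from the commit message.

```python
# Separator per language; "\uff0c" is the full-width comma (全角逗号).
SEPARATORS = {"English": ", ", "Chinese": "\uff0c"}


def compose_instruct(language: str, *attributes: str) -> str:
    """Join the selected attributes, skipping empty or 'none' selections."""
    chosen = [
        a.strip()
        for a in attributes
        if a and a.strip() and a.strip().lower() != "none"
    ]
    return SEPARATORS[language].join(chosen)
```

The resulting string would be wired into the Generate node's instruct input.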
Ethanfel e26bac3684 remove: language parameter from Generate (model auto-detects from text)
Language is inferred from the text content — the parameter had no effect.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:21:27 +02:00
Ethanfel 194e0b0e09 fix: trim accent list to model-validated values only
The model's _resolve_instruct() validates against a fixed vocabulary.
Only 10 accents are supported — removed all unsupported additions.
Updated tooltip to reflect actual constraints.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:17:59 +02:00
Ethanfel d4bf7c825e feat: pass instruct in voice_cloning mode for accent/style influence
If instruct is set alongside ref_audio, it is now forwarded to
model.generate() — allowing accent/style transfer on top of the
cloned voice identity. Model may or may not honour both simultaneously.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:07:18 +02:00
Ethanfel d2cb5c4249 feat: expand language and accent lists to full coverage
Language: ~170 world languages with type-to-filter dropdown
Accent: 50+ regional varieties grouped by area

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:04:12 +02:00
Ethanfel c1558efad9 feat: add Voice Design node + language and guidance_scale to Generate
OmniVoiceVoiceDesign: structured dropdowns for gender/age/pitch/accent
that compose into an instruct string — wire to Generate's instruct input.

OmniVoiceGenerate: new optional language dropdown (auto + 11 languages)
and guidance_scale (CFG, default 2.0) parameters.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:02:06 +02:00
Ethanfel 97ed0f209f fix: rename node id to comfyui-omnivoice-fel (comfyui-omnivoice is taken)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:59:17 +02:00
Ethanfel f7d624799c fix: revert PublisherId to lowercase ethanfel (matches registry)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:46:55 +02:00
Ethanfel d5000dee11 fix: correct PublisherId casing to Ethanfel
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:40:52 +02:00
Ethanfel bb1d83578c ci: trigger publish on pyproject.toml push to master (not tags)
Matches the pattern used by the working LoRA Optimizer publish workflow.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:37:34 +02:00
Ethanfel 138df3abc4 fix: align pyproject.toml format with working registry publish
- license: use { text = "GPL-3.0-only" } (matches comfy-cli expectation)
- add dependencies = []

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
v1.0.0
2026-04-05 19:35:31 +02:00
Ethanfel db884341e8 fix: move repository URL to [project.urls] for comfy-cli
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:34:30 +02:00
Ethanfel dfad38a5c3 fix: correct pyproject.toml for registry publish
- license: use SPDX identifier GPL-3.0-only instead of file reference
- add Repository URL

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:32:52 +02:00
Ethanfel f4697ae88f wf: update default workflow from ComfyUI export
Refreshed node IDs, positions and sizes from live session. Replaced
SaveAudio with PreviewAudio, added ref_text widget entry, updated
aux_id/ver properties.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:27:45 +02:00
Ethanfel 616c2a3e61 license: add GNU GPL v3.0
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:27:22 +02:00
Ethanfel 9b62c9bda8 fix: add seed control_after_generate value to default workflow
ComfyUI appends a hidden "fixed"/"randomize" value after every INT
named "seed". Without it the widget values were misaligned.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:21:13 +02:00
Ethanfel 46591553f9 feat: add torch.compile option to Model Loader
Compiles the model graph on first generation (~30-60s warmup) then
speeds up all subsequent generations in the session. Recommended for
audiobook pipelines. Default off.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:16:39 +02:00
Ethanfel 2e1cba61fc fix: adjust node sizes in default workflow for seed widget
Generate node height 340→400 to fit all 6 widgets, Voice Preset
height 80→100, SaveAudio position adjusted.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:12:31 +02:00
Ethanfel c22cc7c296 rename: voice_cloning.json → omnivoice_voice_cloning.json
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:10:51 +02:00
Ethanfel f7756e6240 ci: add pyproject.toml for ComfyUI registry
Sets publisher ID, display name, and package metadata required
for Comfy-Org/publish-node-action.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:10:20 +02:00
Ethanfel 8e2367f8f0 ci: add ComfyUI registry publish workflow
Triggers on version tags (v*) using Comfy-Org/publish-node-action.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:09:40 +02:00
Ethanfel 14542a6b00 docs: update README with all nodes (presets, mix voices, EPUB loader)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:07:15 +02:00
Ethanfel 76118f57c3 fix: only catch ImportError in _resample torchaudio fallback
Catching bare Exception was silently swallowing real resampling errors.
Only ImportError should trigger the interpolate fallback.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:05:22 +02:00
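The difference between catching `Exception` and catching only `ImportError` can be shown with a small, library-free sketch. `resample_with_fallback` and its callable arguments are hypothetical stand-ins for the torchaudio path and the interpolate fallback.

```python
def resample_with_fallback(samples, preferred, fallback):
    """Use `preferred`; fall back only when its dependency cannot be imported."""
    try:
        return preferred(samples)
    except ImportError:
        # Missing optional dependency: switching to the fallback is legitimate.
        # Any other exception (bad shape, bad rate, ...) is not caught here
        # and propagates to the caller instead of being silently swallowed.
        return fallback(samples)
```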
Ethanfel 219c74d7ed fix: three bugs in OmniVoiceMixVoices
- _resample: squeeze batch dim before torchaudio.Resample (expected 2D)
- weight scaling: each clip now trims to natural_length*weight samples,
  dropping the broken target_per_unit double-multiplication
- empty trimmed guard: raise clear error when all weights are 0

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:04:54 +02:00
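The weight scaling and the empty-result guard described above can be sketched with plain lists standing in for audio tensors. This assumes weights in the range 0 to 1; `trim_by_weight` is an illustrative name and the real node operates on tensors.

```python
def trim_by_weight(clips: list[list[float]], weights: list[float]) -> list[list[float]]:
    """Trim each clip to natural_length * weight samples, dropping empty results."""
    if len(clips) != len(weights):
        raise ValueError("clips and weights must have the same length")
    trimmed = []
    for clip, weight in zip(clips, weights):
        keep = int(len(clip) * weight)  # natural length scaled by the weight
        if keep > 0:
            trimmed.append(clip[:keep])
    if not trimmed:
        # Guard for the case where every weight is 0.
        raise ValueError("All weights are 0: nothing left to mix.")
    return trimmed
```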