Commit Graph

69 Commits

Author SHA1 Message Date
Ethanfel 147030e2af fix: surface real import error when omnivoice fails to load
The broad except ImportError was swallowing the actual failure reason
(e.g. a missing transitive dep after a --no-deps install). It now captures
the original exception and includes it in the re-raised error message, so
users can see what is actually missing.
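A minimal sketch of the pattern: catch the ImportError, keep its message, and chain it with `raise ... from`. The loader name `load_omnivoice` and the message wording are hypothetical, not the node's actual code.

```python
import importlib


def load_omnivoice(module_name: str = "omnivoice"):
    """Import the backend module, surfacing the real cause if the import fails."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Keep the original text (e.g. "No module named 'pydub'") so a missing
        # transitive dependency is visible, and chain it as __cause__.
        raise ImportError(
            f"Failed to import {module_name!r}: {exc}. "
            "If it was installed with --no-deps, a runtime dependency may be missing."
        ) from exc
```

Because `ModuleNotFoundError` subclasses `ImportError`, one `except` clause covers both the top-level package and any dependency it imports.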

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 12:23:59 +02:00
Ethanfel aedbe2e7d9 fix: declare speaker_1..8 in INPUT_TYPES so ComfyUI validation accepts them
Dynamic JS inputs that are not listed in INPUT_TYPES may be rejected by
ComfyUI's prompt validator and not passed to the Python function. Declaring
all 8 slots as optional fixes this while JS still controls which slots are
visible on the node.
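A simplified stand-in for the roster node, assuming ComfyUI's usual `INPUT_TYPES` / `RETURN_TYPES` / `FUNCTION` conventions. The `num_speakers` default and bounds, and the `build` method, are hypothetical. All eight slots are declared as optional, and unconnected ones simply arrive as missing keyword arguments.

```python
class OmniVoiceSpeakers:
    MAX_SPEAKERS = 8

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                # The JS extension reads this to decide how many slots to show.
                "num_speakers": ("INT", {"default": 2, "min": 1, "max": cls.MAX_SPEAKERS}),
            },
            "optional": {
                # Declare every possible slot so the prompt validator accepts them.
                f"speaker_{i}": ("OMNIVOICE_SPEAKER",)
                for i in range(1, cls.MAX_SPEAKERS + 1)
            },
        }

    RETURN_TYPES = ("OMNIVOICE_SPEAKERS",)
    FUNCTION = "build"

    def build(self, num_speakers, **speakers):
        # Only the first num_speakers slots count; empty slots are skipped.
        roster = [
            speakers[f"speaker_{i}"]
            for i in range(1, num_speakers + 1)
            if speakers.get(f"speaker_{i}") is not None
        ]
        return (roster,)
```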

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 09:41:56 +02:00
Ethanfel 26295e4db7 feat: auto-discover user presets from the presets folder
Drop any audio file (wav/flac/mp3/ogg/m4a) into the presets cache dir and
it will appear as "<name> (local)" in the Voice Preset dropdown after the
next ComfyUI restart. Add a .txt file with the same stem to supply the transcript.
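A minimal sketch of the discovery scan, assuming the preset folder path is passed in. The function name and the (path, transcript) return shape are hypothetical.

```python
from pathlib import Path

AUDIO_EXTS = {".wav", ".flac", ".mp3", ".ogg", ".m4a"}


def discover_local_presets(presets_dir):
    """Map '<stem> (local)' -> (audio_path, transcript or None)."""
    presets = {}
    root = Path(presets_dir)
    if not root.is_dir():
        return presets
    for audio in sorted(root.iterdir()):
        if not audio.is_file() or audio.suffix.lower() not in AUDIO_EXTS:
            continue
        txt = audio.with_suffix(".txt")  # same-stem transcript, if present
        transcript = txt.read_text(encoding="utf-8").strip() if txt.is_file() else None
        presets[f"{audio.stem} (local)"] = (audio, transcript)
    return presets
```

Running the scan once at import time matches the "appears on next restart" behaviour: the dropdown list is built when the node class is loaded.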

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 09:37:17 +02:00
Ethanfel d5f2632c48 feat: support [tag]: syntax in tagged_speakers mode
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 09:21:27 +02:00
Ethanfel 33b3d62d02 fix: tagged_speakers splits on single newlines, not just double newlines
Each line starting with [Tag] now begins a new segment so users don't need
blank lines between tagged speeches. Continuation lines (no tag) are joined
to the previous tagged segment for multi-line speeches.
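A sketch of line-based tagged splitting that also accepts the optional colon form (`[Tag]: text`). The function name and regex are hypothetical, and dropping untagged text before the first tag is an assumption of this sketch.

```python
import re

# Matches "[Alice] Hello" and "[Alice]: Hello".
TAG_RE = re.compile(r"^\s*\[([^\]]+)\]:?\s*(.*)$")


def split_tagged(text):
    """Return [(tag, speech), ...]; every tagged line opens a new segment."""
    segments = []
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines are allowed but not required
        m = TAG_RE.match(line)
        if m:
            segments.append([m.group(1).strip(), m.group(2).strip()])
        elif segments:
            # Untagged continuation line: join it to the previous speech.
            segments[-1][1] = f"{segments[-1][1]} {line.strip()}".strip()
        # Untagged text before the first tag is ignored in this sketch.
    return [(tag, speech) for tag, speech in segments]
```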

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 09:16:54 +02:00
Ethanfel 95cf706b19 feat: add multi-speaker generation with JS-powered dynamic slots
- Add OmniVoiceSpeaker node (label + ref_audio + ref_text → OMNIVOICE_SPEAKER)
- Add OmniVoiceSpeakers node (roster with dynamic speaker_N inputs driven by
  num_speakers INT widget; slots expand/collapse via ComfyUI JS extension)
- Add web/multi_speaker.js: ComfyUI extension that hooks onNodeCreated and
  onConfigure to sync speaker_N inputs in real time (max 8 speakers)
- Extend OmniVoiceGenerate with optional speakers (OMNIVOICE_SPEAKERS) input;
  when connected it routes each paragraph to the assigned speaker and
  concatenates the results — supports alternate_paragraphs and tagged_speakers modes
- Remove OmniVoiceMultiSpeakerGenerate (generation now lives in the existing
  Generate node)
- Refactor generator.py: extract _write_tmp_wav helper, add _tensors_to_audio
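A sketch of the alternate_paragraphs routing: paragraph i goes to speaker i mod N, and the per-paragraph clips are concatenated. The `generate` callable stands in for the real model call, and both function names are hypothetical. Clips are plain sample lists here rather than tensors.

```python
import re


def route_alternate(text, speakers):
    """Pair each blank-line-separated paragraph with speakers[i % len(speakers)]."""
    if not speakers:
        raise ValueError("at least one speaker is required")
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return [(speakers[i % len(speakers)], p) for i, p in enumerate(paragraphs)]


def generate_multi(text, speakers, generate):
    """Call generate(paragraph, speaker) per paragraph and concatenate the clips."""
    audio = []
    for speaker, paragraph in route_alternate(text, speakers):
        audio.extend(generate(paragraph, speaker))
    return audio
```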

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 09:08:23 +02:00
Ethanfel 3cbc04d12d wf: update default workflow from ComfyUI export
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:50:52 +02:00
Ethanfel 340c0aa402 simplify: remove language param entirely — model detects from instruct string
Chinese characters vs English words are self-identifying to the model.
No need for a separate language signal on either node.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:44:27 +02:00
Ethanfel 2b13e55dc5 fix: pipe language out of Voice Design into Generate
Voice Design now outputs (instruct, language) — wire language directly
into Generate to avoid setting it in two places. Generate's language
input is now a STRING (accepts the connection or manual 'auto').

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:40:35 +02:00
Ethanfel 86ec8cf3fb bump: version 1.0.1
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:33:55 +02:00
Ethanfel ae2255d9e4 fix: add missing omnivoice runtime deps for fresh installs
pydub, tensorboardx, webdataset are omnivoice dependencies that won't
be present on a clean ComfyUI install since we use --no-deps.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:33:19 +02:00
Ethanfel d5a0ebeb9a revert: restore working requirements.txt
Pinning transformers==5.3.0 risks conflicting with existing ComfyUI venv.
Back to the permissive transformers>=4.40.0, which worked in practice.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:29:04 +02:00
Ethanfel 0d43e5374f fix: sync requirements.txt with omnivoice's actual --no-deps dependencies
Pins transformers==5.3.0 (omnivoice requires exact version), restores
pydub (omnivoice dep), adds tensorboardx and webdataset.
Drops gradio (demo-only, not needed for inference).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:28:35 +02:00
Ethanfel 2b4b221e88 chore: remove unused pydub from requirements
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:27:34 +02:00
Ethanfel 772f6654d4 feat: add language selector for voice_design + Chinese instruct support
- Generate: language dropdown (auto/English/Chinese), passed only in
  voice_design and auto_voice modes where it selects the instruct vocab
- VoiceDesign: Chinese mode with dialect/age/pitch/gender dropdowns
  using the model's validated Chinese instruct vocabulary (full-width commas, ，)
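A sketch of composing dropdown choices into one instruct string, switching to the full-width comma in Chinese mode. The function name is hypothetical, and the actual dropdown values must come from the model's validated vocabulary, which is not reproduced here.

```python
def compose_instruct(parts, chinese=False):
    """Join the non-empty dropdown choices into a single instruct string."""
    # Chinese instructs use the full-width comma (，); English uses ", ".
    sep = "，" if chinese else ", "
    chosen = [p.strip() for p in parts if p and p.strip()]
    return sep.join(chosen)
```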

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:22:25 +02:00
Ethanfel e26bac3684 remove: language parameter from Generate (model auto-detects from text)
Language is inferred from the text content — the parameter had no effect.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:21:27 +02:00
Ethanfel 194e0b0e09 fix: trim accent list to model-validated values only
The model's _resolve_instruct() validates against a fixed vocabulary.
Only 10 accents are supported — removed all unsupported additions.
Updated tooltip to reflect actual constraints.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:17:59 +02:00
Ethanfel d4bf7c825e feat: pass instruct in voice_cloning mode for accent/style influence
If instruct is set alongside ref_audio, it is now forwarded to
model.generate() — allowing accent/style transfer on top of the
cloned voice identity. Model may or may not honour both simultaneously.
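A sketch of how the keyword arguments for `model.generate()` might be assembled so that `instruct` rides along with a reference clip. The helper is hypothetical, and the keyword names mirror the node's inputs rather than a confirmed OmniVoice signature.

```python
def build_generate_kwargs(text, ref_audio=None, ref_text=None, instruct=""):
    """Collect only the arguments that are actually set."""
    kwargs = {"text": text}
    if ref_audio is not None:
        kwargs["ref_audio"] = ref_audio
        if ref_text:
            kwargs["ref_text"] = ref_text
    if instruct and instruct.strip():
        # Forwarded even in voice_cloning mode; the model decides how much to honour it.
        kwargs["instruct"] = instruct.strip()
    return kwargs
```

The call site would then be `model.generate(**build_generate_kwargs(...))`, so an empty instruct never reaches the model.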

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:07:18 +02:00
Ethanfel d2cb5c4249 feat: expand language and accent lists to full coverage
Language: ~170 world languages with type-to-filter dropdown
Accent: 50+ regional varieties grouped by area

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:04:12 +02:00
Ethanfel c1558efad9 feat: add Voice Design node + language and guidance_scale to Generate
OmniVoiceVoiceDesign: structured dropdowns for gender/age/pitch/accent
that compose into an instruct string — wire to Generate's instruct input.

OmniVoiceGenerate: new optional language dropdown (auto + 11 languages)
and guidance_scale (CFG, default 2.0) parameters.
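For reference, this is the generic classifier-free guidance combination that a guidance_scale s usually controls. Whether OmniVoice applies it to noise, velocity, or logits is not stated here, so treat it as the standard form rather than the model's exact code. Here ε is the model's prediction with and without conditioning:

```latex
\hat{\epsilon} = \epsilon_{\text{uncond}} + s\,\bigl(\epsilon_{\text{cond}} - \epsilon_{\text{uncond}}\bigr)
```

With s = 1 this reduces to the conditional prediction. The default s = 2.0 pushes twice as far along the conditional direction, trading some naturalness for closer adherence to the text and voice prompt.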

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:02:06 +02:00
Ethanfel 97ed0f209f fix: rename node id to comfyui-omnivoice-fel (comfyui-omnivoice is taken)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:59:17 +02:00
Ethanfel f7d624799c fix: revert PublisherId to lowercase ethanfel (matches registry)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:46:55 +02:00
Ethanfel d5000dee11 fix: correct PublisherId casing to Ethanfel
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:40:52 +02:00
Ethanfel bb1d83578c ci: trigger publish on pyproject.toml push to master (not tags)
Matches the pattern used by the working LoRA Optimizer publish workflow.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:37:34 +02:00
Ethanfel 138df3abc4 fix: align pyproject.toml format with working registry publish
- license: use { text = "GPL-3.0-only" } (matches comfy-cli expectation)
- add dependencies = []

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
v1.0.0
2026-04-05 19:35:31 +02:00
Ethanfel db884341e8 fix: move repository URL to [project.urls] for comfy-cli
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:34:30 +02:00
Ethanfel dfad38a5c3 fix: correct pyproject.toml for registry publish
- license: use SPDX identifier GPL-3.0-only instead of file reference
- add Repository URL

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:32:52 +02:00
Ethanfel f4697ae88f wf: update default workflow from ComfyUI export
Refreshed node IDs, positions and sizes from live session. Replaced
SaveAudio with PreviewAudio, added ref_text widget entry, updated
aux_id/ver properties.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:27:45 +02:00
Ethanfel 616c2a3e61 license: add GNU GPL v3.0
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:27:22 +02:00
Ethanfel 9b62c9bda8 fix: add seed control_after_generate value to default workflow
ComfyUI appends a hidden "fixed"/"randomize" value after every INT input
named "seed". Without it, the widget values were misaligned.
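A small illustration of why the missing entry shifts everything: the saved widget values are read positionally, so every value after "seed" lands on the wrong widget. The widget names below are hypothetical, and the loader is a simplified model of positional restoration.

```python
def saved_layout(widget_names):
    """Widget order as saved in widgets_values: "seed" gets a control entry after it."""
    layout = []
    for name in widget_names:
        layout.append(name)
        if name == "seed":
            layout.append("control_after_generate")
    return layout


def read_widgets(widget_names, values):
    """Map saved values onto widget names the way a positional loader would."""
    return dict(zip(saved_layout(widget_names), values))
```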

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:21:13 +02:00
Ethanfel 46591553f9 feat: add torch.compile option to Model Loader
Compiles the model graph on the first generation (~30-60s warmup), then
speeds up all subsequent generations in the session. Recommended for
audiobook pipelines. Default off.
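A minimal sketch of an opt-in compile step, assuming the loader passes its boolean option through. `torch.compile` returns a wrapper immediately and compiles lazily on the first call, which is where the warmup cost lands. Which submodule the real loader compiles is not stated here, and the helper name is hypothetical.

```python
def maybe_compile(model, use_compile: bool):
    """Optionally wrap the model with torch.compile; compilation happens on first call."""
    if not use_compile:
        return model
    import torch  # imported only when compilation is requested

    return torch.compile(model)
```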

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:16:39 +02:00
Ethanfel 2e1cba61fc fix: adjust node sizes in default workflow for seed widget
Generate node height 340→400 to fit all 6 widgets, Voice Preset
height 80→100, SaveAudio position adjusted.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:12:31 +02:00
Ethanfel c22cc7c296 rename: voice_cloning.json → omnivoice_voice_cloning.json
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:10:51 +02:00
Ethanfel f7756e6240 ci: add pyproject.toml for ComfyUI registry
Sets publisher ID, display name, and package metadata required
for Comfy-Org/publish-node-action.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:10:20 +02:00
Ethanfel 8e2367f8f0 ci: add ComfyUI registry publish workflow
Triggers on version tags (v*) using Comfy-Org/publish-node-action.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:09:40 +02:00
Ethanfel 14542a6b00 docs: update README with all nodes (presets, mix voices, EPUB loader)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:07:15 +02:00
Ethanfel 76118f57c3 fix: only catch ImportError in _resample torchaudio fallback
Catching bare Exception was silently swallowing real resampling errors.
Only ImportError should trigger the interpolate fallback.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:05:22 +02:00
Ethanfel 219c74d7ed fix: three bugs in OmniVoiceMixVoices
- _resample: squeeze batch dim before torchaudio.Resample (expected 2D)
- weight scaling: each clip now trims to natural_length*weight samples,
  dropping the broken target_per_unit double-multiplication
- empty trimmed guard: raise clear error when all weights are 0
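A sketch of the corrected weighting and the empty guard, using plain sample lists. Each clip is trimmed to its own length times its weight, with no second scaling factor. Clamping weights above 1 to the clip's natural length is an assumption of this sketch, and the function name is hypothetical.

```python
def weighted_concat(clips, weights):
    """Trim each clip to len(clip) * weight samples, then concatenate."""
    if len(clips) != len(weights):
        raise ValueError("need one weight per clip")
    trimmed = []
    for clip, w in zip(clips, weights):
        # Weight scales only this clip's own length; clamp to [0, len(clip)].
        n = min(len(clip), int(len(clip) * max(0.0, w)))
        if n > 0:
            trimmed.append(clip[:n])
    if not trimmed:
        raise ValueError("All voice weights are 0; nothing to mix.")
    return [s for part in trimmed for s in part]
```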

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:04:54 +02:00
Ethanfel c7c7123068 feat: add OmniVoice Mix Voices node for blended speaker cloning
Concatenates 2-3 reference audio clips (with per-voice duration weights)
to create a blended speaker embedding. Merges transcripts for ref_text.
Handles mismatched sample rates and mono conversion automatically.
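A sketch of the mono conversion and transcript merge, assuming the clips are already at a common sample rate and given as lists of equal-length channels. Joining transcripts with a single space is an assumption, and the helper names are hypothetical.

```python
def to_mono(channels):
    """Average a list of equal-length channel sample lists into one channel."""
    if len(channels) == 1:
        return list(channels[0])
    return [sum(frame) / len(frame) for frame in zip(*channels)]


def merge_transcripts(texts):
    """Join per-clip transcripts in the same order the clips are concatenated."""
    return " ".join(t.strip() for t in texts if t and t.strip())


def blend_voices(clips, texts):
    """clips: one list of channels per reference clip."""
    audio = [s for clip in clips for s in to_mono(clip)]
    return audio, merge_transcripts(texts)
```

Keeping transcript order identical to clip order matters because the merged ref_text has to describe the concatenated audio from start to finish.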

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:03:43 +02:00
Ethanfel f8a3bebe9c feat: add seed=42 to default workflow for voice consistency
Sets a default seed so the voice stays consistent across all generated
chunks when using the workflow as a starting point for audiobook pipelines.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 18:58:22 +02:00
Ethanfel 8805665a22 Add seed parameter to OmniVoice Generate for consistent voice across chunks
OmniVoice chunks long text internally; each chunk is a separate diffusion
pass with different random noise, causing voice drift between paragraphs.
Setting the same seed before each generate() call anchors the RNG state
and keeps the voice consistent. seed=0 means random (default behaviour).
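A minimal sketch of re-seeding before each generate() call, with seed=0 leaving the RNGs untouched. The helper name is hypothetical, and seeding NumPy or other libraries the model might use is left out.

```python
import random


def apply_seed(seed: int) -> None:
    """Re-seed RNGs before a generate() call; seed=0 keeps default randomness."""
    if seed == 0:
        return
    random.seed(seed)
    try:
        import torch
    except ImportError:
        return
    torch.manual_seed(seed)  # seeds the RNG on all devices
```

Calling it immediately before every `model.generate(...)`, rather than once per session, is what pins each call to the same starting RNG state.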

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 18:53:58 +02:00
Ethanfel 4c42322c6f Expand voice presets to 8 voices (3 female, 5 male)
All transcribed via whisper-medium. Sources: Chatterbox demo GCS bucket
(ResembleAI) and F5-TTS repo (SWivid).

Female: Shadowheart, American actress, Podcast host
Male: Nature, Old Hollywood, Rick Sanchez, Stewie Griffin,
      Harvey Keitel, Conan O'Brien

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 18:51:43 +02:00
Ethanfel c109e860a8 Add transcript for Shadowheart preset (transcribed via whisper-medium)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 18:39:14 +02:00
Ethanfel 75e74075f5 Restore Shadowheart preset; user will transcribe via Whisper node
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 18:33:52 +02:00
Ethanfel 8de201a4c9 Add OmniVoice Voice Preset node with two female voice samples
Two built-in presets, auto-downloaded and cached to ComfyUI/models/omnivoice/presets/:
- "Nature – female, warm" (F5-TTS basic_ref_en.wav, transcript included)
- "Shadowheart – female, expressive" (Chatterbox demo, connect Whisper for transcript)

Outputs ref_audio (AUDIO) and ref_text (STRING) — wire directly into
OmniVoice Generate. Updated default workflow to use this node.
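A sketch of download-once caching into the presets folder. The URL and file names are placeholders, and the `.part` temp file is a design choice of this sketch: an interrupted download never looks like a cached preset.

```python
import urllib.request
from pathlib import Path


def ensure_preset(url, cache_dir, filename):
    """Download a preset once; later calls reuse the cached file."""
    dest = Path(cache_dir) / filename
    if dest.is_file():
        return dest
    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest.with_name(dest.name + ".part")
    urllib.request.urlretrieve(url, tmp)  # write to a temporary name first
    tmp.replace(dest)  # atomic rename once the download completes
    return dest
```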

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 18:19:29 +02:00
Ethanfel d779526225 Preserve paragraph breaks in EPUB text extraction
get_text(separator=' ') collapsed all paragraphs into one line.
Now inserts \n\n at block-level element boundaries (p, h1-h6, div,
li, br, tr) before extraction, then normalises whitespace.
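The same idea sketched with the standard library's `html.parser` instead of the `get_text` call used in the node (presumably BeautifulSoup). This swaps in a different parser but applies the same rule: break at block-level boundaries, then normalise whitespace. The class and function names are hypothetical.

```python
import re
from html.parser import HTMLParser

BLOCK_TAGS = {"p", "h1", "h2", "h3", "h4", "h5", "h6", "div", "li", "br", "tr"}


class _TextWithBreaks(HTMLParser):
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append("\n\n")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append("\n\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_paragraphs(html):
    parser = _TextWithBreaks()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # Split on blank lines, collapse inner whitespace, drop empty paragraphs.
    paragraphs = [re.sub(r"\s+", " ", p).strip() for p in re.split(r"\n\s*\n", text)]
    return "\n\n".join(p for p in paragraphs if p)
```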

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 18:06:41 +02:00
Ethanfel b52edcfd84 Remove local path option from model loader
Models always download to ComfyUI/models/omnivoice/ via HuggingFace.
Local path added unnecessary complexity; users who want a custom path
can symlink into the models directory.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 18:02:55 +02:00
Ethanfel cd0f7aff07 Add default voice cloning workflow
Model Loader → Load Audio → OmniVoice Generate → Save Audio.
Connect a Whisper node to ref_text for auto-transcription.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 18:01:18 +02:00
Ethanfel 8d77dd6cd5 Remove torchcodec workaround; recommend Whisper node for ref_text
Users should connect a ComfyUI Whisper node to ref_text instead of
relying on omnivoice's internal ASR. Removes the error-catch workaround
and updates the tooltip accordingly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 17:49:25 +02:00
Ethanfel a3fb88e559 Restore install.py for omnivoice --no-deps only
requirements.txt cannot install omnivoice (it would pull in torch==2.8.*
and break ComfyUI). install.py now does exactly one thing: install
omnivoice --no-deps, skipped if already present. All other deps remain
in requirements.txt for ComfyUI Manager to handle normally.
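A sketch of an install.py that does only the one job described: install omnivoice with --no-deps, skipped if it is already importable. It assumes the pip package and the import name are both `omnivoice`, and the helper names are hypothetical. `main()` is defined but not invoked here.

```python
import importlib.util
import subprocess
import sys


def needs_install(module: str = "omnivoice") -> bool:
    return importlib.util.find_spec(module) is None


def install_command(package: str = "omnivoice") -> list:
    # --no-deps stops pip from pulling omnivoice's pinned torch into ComfyUI's venv.
    return [sys.executable, "-m", "pip", "install", "--no-deps", package]


def main() -> None:
    if needs_install():
        subprocess.check_call(install_command())
```

Using `sys.executable -m pip` targets the same interpreter ComfyUI runs under, rather than whichever `pip` happens to be first on PATH.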

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 17:45:24 +02:00