Commit Graph

61 Commits

Author SHA1 Message Date
Ethanfel 2b13e55dc5 fix: pipe language out of Voice Design into Generate
Voice Design now outputs (instruct, language) — wire language directly
into Generate to avoid setting it in two places. Generate's language
input is now a STRING (accepts the connection or manual 'auto').

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:40:35 +02:00
Ethanfel 86ec8cf3fb bump: version 1.0.1
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:33:55 +02:00
Ethanfel ae2255d9e4 fix: add missing omnivoice runtime deps for fresh installs
pydub, tensorboardx, webdataset are omnivoice dependencies that won't
be present on a clean ComfyUI install since we use --no-deps.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:33:19 +02:00
Ethanfel d5a0ebeb9a revert: restore working requirements.txt
Pinning transformers==5.3.0 risks conflicting with existing ComfyUI venv.
Back to permissive >=4.40.0 which worked in practice.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:29:04 +02:00
Ethanfel 0d43e5374f fix: sync requirements.txt with omnivoice's actual --no-deps dependencies
Pins transformers==5.3.0 (omnivoice requires exact version), restores
pydub (omnivoice dep), adds tensorboardx and webdataset.
Drops gradio (demo-only, not needed for inference).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:28:35 +02:00
Ethanfel 2b4b221e88 chore: remove unused pydub from requirements
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:27:34 +02:00
Ethanfel 772f6654d4 feat: add language selector for voice_design + Chinese instruct support
- Generate: language dropdown (auto/English/Chinese), passed only in
  voice_design and auto_voice modes where it selects the instruct vocab
- VoiceDesign: Chinese mode with dialect/age/pitch/gender dropdowns
  using the model's validated Chinese instruct vocabulary (full-width commas, 全角逗号)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:22:25 +02:00
Ethanfel e26bac3684 remove: language parameter from Generate (model auto-detects from text)
Language is inferred from the text content — the parameter had no effect.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:21:27 +02:00
Ethanfel 194e0b0e09 fix: trim accent list to model-validated values only
The model's _resolve_instruct() validates against a fixed vocabulary.
Only 10 accents are supported — removed all unsupported additions.
Updated tooltip to reflect actual constraints.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:17:59 +02:00
Ethanfel d4bf7c825e feat: pass instruct in voice_cloning mode for accent/style influence
If instruct is set alongside ref_audio, it is now forwarded to
model.generate() — allowing accent/style transfer on top of the
cloned voice identity. Model may or may not honour both simultaneously.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:07:18 +02:00
Ethanfel d2cb5c4249 feat: expand language and accent lists to full coverage
Language: ~170 world languages with type-to-filter dropdown
Accent: 50+ regional varieties grouped by area

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:04:12 +02:00
Ethanfel c1558efad9 feat: add Voice Design node + language and guidance_scale to Generate
OmniVoiceVoiceDesign: structured dropdowns for gender/age/pitch/accent
that compose into an instruct string — wire to Generate's instruct input.

OmniVoiceGenerate: new optional language dropdown (auto + 11 languages)
and guidance_scale (CFG, default 2.0) parameters.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:02:06 +02:00
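The dropdown-to-instruct composition described in c1558efad9 could look roughly like the sketch below. The function name, the example values, and the ", " separator are assumptions for illustration and are not taken from the repository; the separator is a parameter because a later commit mentions full-width commas for the Chinese vocabulary.

```python
def compose_instruct(gender="", age="", pitch="", accent="", separator=", "):
    """Join the non-empty dropdown selections into a single instruct string.

    Hypothetical helper: the real node's field order and separator may differ.
    """
    parts = [gender, age, pitch, accent]
    # Skip dropdowns left empty so the result has no stray separators.
    return separator.join(p.strip() for p in parts if p and p.strip())


if __name__ == "__main__":
    print(compose_instruct("female", "young adult", "", "british accent"))
```

Empty selections are dropped rather than emitted as blanks, so a partially filled node still yields a clean string.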
Ethanfel 97ed0f209f fix: rename node id to comfyui-omnivoice-fel (comfyui-omnivoice is taken)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:59:17 +02:00
Ethanfel f7d624799c fix: revert PublisherId to lowercase ethanfel (matches registry)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:46:55 +02:00
Ethanfel d5000dee11 fix: correct PublisherId casing to Ethanfel
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:40:52 +02:00
Ethanfel bb1d83578c ci: trigger publish on pyproject.toml push to master (not tags)
Matches the pattern used by the working LoRA Optimizer publish workflow.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:37:34 +02:00
Ethanfel 138df3abc4 fix: align pyproject.toml format with working registry publish
- license: use { text = "GPL-3.0-only" } (matches comfy-cli expectation)
- add dependencies = []

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
v1.0.0
2026-04-05 19:35:31 +02:00
Ethanfel db884341e8 fix: move repository URL to [project.urls] for comfy-cli
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:34:30 +02:00
Ethanfel dfad38a5c3 fix: correct pyproject.toml for registry publish
- license: use SPDX identifier GPL-3.0-only instead of file reference
- add Repository URL

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:32:52 +02:00
Ethanfel f4697ae88f wf: update default workflow from ComfyUI export
Refreshed node IDs, positions and sizes from live session. Replaced
SaveAudio with PreviewAudio, added ref_text widget entry, updated
aux_id/ver properties.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:27:45 +02:00
Ethanfel 616c2a3e61 license: add GNU GPL v3.0
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:27:22 +02:00
Ethanfel 9b62c9bda8 fix: add seed control_after_generate value to default workflow
ComfyUI appends a hidden "fixed"/"randomize" value after every INT
named "seed". Without it the widget values were misaligned.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:21:13 +02:00
Ethanfel 46591553f9 feat: add torch.compile option to Model Loader
Compiles the model graph on first generation (~30-60s warmup) then
speeds up all subsequent generations in the session. Recommended for
audiobook pipelines. Default off.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:16:39 +02:00
Ethanfel 2e1cba61fc fix: adjust node sizes in default workflow for seed widget
Generate node height 340→400 to fit all 6 widgets, Voice Preset
height 80→100, SaveAudio position adjusted.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:12:31 +02:00
Ethanfel c22cc7c296 rename: voice_cloning.json → omnivoice_voice_cloning.json
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:10:51 +02:00
Ethanfel f7756e6240 ci: add pyproject.toml for ComfyUI registry
Sets publisher ID, display name, and package metadata required
for Comfy-Org/publish-node-action.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:10:20 +02:00
Ethanfel 8e2367f8f0 ci: add ComfyUI registry publish workflow
Triggers on version tags (v*) using Comfy-Org/publish-node-action.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:09:40 +02:00
Ethanfel 14542a6b00 docs: update README with all nodes (presets, mix voices, EPUB loader)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:07:15 +02:00
Ethanfel 76118f57c3 fix: only catch ImportError in _resample torchaudio fallback
Catching bare Exception was silently swallowing real resampling errors.
Only ImportError should trigger the interpolate fallback.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:05:22 +02:00
Ethanfel 219c74d7ed fix: three bugs in OmniVoiceMixVoices
- _resample: squeeze batch dim before torchaudio.Resample (expected 2D)
- weight scaling: each clip now trims to natural_length*weight samples,
  dropping the broken target_per_unit double-multiplication
- empty trimmed guard: raise clear error when all weights are 0

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:04:54 +02:00
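A minimal sketch of the weight scaling and empty guard described in 219c74d7ed, using plain Python lists in place of audio tensors. The function name and the clamping of weights to the range 0..1 are assumptions, not the node's confirmed behaviour.

```python
def trim_by_weight(clips, weights):
    """Trim each clip to natural_length * weight samples, then concatenate."""
    if len(clips) != len(weights):
        raise ValueError("clips and weights must have the same length")

    trimmed = []
    for clip, weight in zip(clips, weights):
        # Assumption: weights are clamped to [0, 1] before scaling.
        weight = max(0.0, min(1.0, weight))
        keep = int(len(clip) * weight)
        if keep > 0:
            trimmed.append(clip[:keep])

    # Guard: with all weights at 0 there is nothing left to concatenate.
    if not trimmed:
        raise ValueError("All voice weights are 0; nothing to mix.")

    return [sample for clip in trimmed for sample in clip]


if __name__ == "__main__":
    print(trim_by_weight([[1, 2, 3, 4], [5, 6]], [0.5, 1.0]))
```

Each clip is scaled against its own length only, so no shared per-unit target is multiplied in a second time.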
Ethanfel c7c7123068 feat: add OmniVoice Mix Voices node for blended speaker cloning
Concatenates 2-3 reference audio clips (with per-voice duration weights)
to create a blended speaker embedding. Merges transcripts for ref_text.
Handles mismatched sample rates and mono conversion automatically.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:03:43 +02:00
Ethanfel f8a3bebe9c feat: add seed=42 to default workflow for voice consistency
Sets a default seed so the voice stays consistent across all generated
chunks when using the workflow as a starting point for audiobook pipelines.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 18:58:22 +02:00
Ethanfel 8805665a22 Add seed parameter to OmniVoice Generate for consistent voice across chunks
OmniVoice chunks long text internally; each chunk is a separate diffusion
pass with different random noise, causing voice drift between paragraphs.
Setting the same seed before each generate() call anchors the RNG state
and keeps the voice consistent. seed=0 means random (default behaviour).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 18:53:58 +02:00
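The seeding behaviour from 8805665a22 can be sketched with Python's `random` module standing in for the model's RNG. `fake_generate` is a hypothetical placeholder for `model.generate()`; the real node presumably seeds the framework's RNG, which is not shown here.

```python
import random


def fake_generate(text, num_values=4):
    """Hypothetical stand-in for model.generate(): returns the noise it drew."""
    return [random.random() for _ in range(num_values)]


def generate_with_seed(text, seed=0):
    """Reseed before every call so each call starts from the same RNG state."""
    if seed != 0:
        # seed=0 means "random": leave the RNG state untouched.
        random.seed(seed)
    return fake_generate(text)


if __name__ == "__main__":
    first = generate_with_seed("Paragraph one.", seed=42)
    second = generate_with_seed("Paragraph two.", seed=42)
    print(first == second)
```

Because the seed is reapplied before each call, two calls with the same non-zero seed draw identical noise regardless of what ran in between.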
Ethanfel 4c42322c6f Expand voice presets to 8 voices (3 female, 5 male)
All transcribed via whisper-medium. Sources: Chatterbox demo GCS bucket
(ResembleAI) and F5-TTS repo (SWivid).

Female: Shadowheart, American actress, Podcast host
Male: Nature, Old Hollywood, Rick Sanchez, Stewie Griffin,
      Harvey Keitel, Conan O'Brien

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 18:51:43 +02:00
Ethanfel c109e860a8 Add transcript for Shadowheart preset (transcribed via whisper-medium)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 18:39:14 +02:00
Ethanfel 75e74075f5 Restore Shadowheart preset; user will transcribe via Whisper node
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 18:33:52 +02:00
Ethanfel 8de201a4c9 Add OmniVoice Voice Preset node with two female voice samples
Two built-in presets, auto-downloaded and cached to ComfyUI/models/omnivoice/presets/:
- "Nature – female, warm" (F5-TTS basic_ref_en.wav, transcript included)
- "Shadowheart – female, expressive" (Chatterbox demo, connect Whisper for transcript)

Outputs ref_audio (AUDIO) and ref_text (STRING) — wire directly into
OmniVoice Generate. Updated default workflow to use this node.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 18:19:29 +02:00
Ethanfel d779526225 Preserve paragraph breaks in EPUB text extraction
get_text(separator=' ') collapsed all paragraphs into one line.
Now inserts \n\n at block-level element boundaries (p, h1-h6, div,
li, br, tr) before extraction, then normalises whitespace.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 18:06:41 +02:00
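The paragraph-preserving extraction from d779526225 can be approximated with the standard library. This sketch swaps in `html.parser.HTMLParser` for the BeautifulSoup `get_text()` call the commit mentions, so it illustrates the idea rather than the repository's code; the class and function names are hypothetical.

```python
import re
from html.parser import HTMLParser

# Block-level tags listed in the commit message.
BLOCK_TAGS = {"p", "h1", "h2", "h3", "h4", "h5", "h6", "div", "li", "br", "tr"}


class _BlockAwareText(HTMLParser):
    """Collect text, emitting a paragraph break at block-level boundaries."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append("\n\n")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append("\n\n")

    def handle_data(self, data):
        # Collapse whitespace inside text runs (including soft line wraps).
        self.parts.append(re.sub(r"\s+", " ", data))


def extract_text(html):
    parser = _BlockAwareText()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # Normalise every run of breaks (and surrounding spaces) to one blank line.
    text = re.sub(r" *\n[ \n]*", "\n\n", text)
    return text.strip()


if __name__ == "__main__":
    print(extract_text("<h1>Title</h1><p>First  line.</p><p>Second<br>line.</p>"))
```

Only block boundaries contribute newlines, so line wraps inside a paragraph collapse to spaces while paragraphs stay separated.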
Ethanfel b52edcfd84 Remove local path option from model loader
Models always download to ComfyUI/models/omnivoice/ via HuggingFace.
Local path added unnecessary complexity; users who want a custom path
can symlink into the models directory.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 18:02:55 +02:00
Ethanfel cd0f7aff07 Add default voice cloning workflow
Model Loader → Load Audio → OmniVoice Generate → Save Audio.
Connect a Whisper node to ref_text for auto-transcription.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 18:01:18 +02:00
Ethanfel 8d77dd6cd5 Remove torchcodec workaround; recommend Whisper node for ref_text
Users should connect a ComfyUI Whisper node to ref_text instead of
relying on omnivoice's internal ASR. Removes the error-catch workaround
and updates the tooltip accordingly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 17:49:25 +02:00
Ethanfel a3fb88e559 Restore install.py for omnivoice --no-deps only
requirements.txt cannot install omnivoice (it would pull in torch==2.8.*
and break ComfyUI). install.py now does exactly one thing: install
omnivoice --no-deps, skipped if already present. All other deps remain
in requirements.txt for ComfyUI Manager to handle normally.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 17:45:24 +02:00
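The single-purpose install step described in a3fb88e559 might be sketched as follows. The function names and the injectable `run` parameter are assumptions made to keep the example testable; the repository's actual install.py may be structured differently.

```python
import importlib.util
import subprocess
import sys


def build_install_command(package="omnivoice"):
    """pip command that installs the package without its declared dependencies."""
    return [sys.executable, "-m", "pip", "install", "--no-deps", package]


def ensure_installed(package="omnivoice", run=subprocess.check_call):
    """Install the package with --no-deps unless it is already importable.

    Returns True if an install was attempted, False if it was skipped.
    """
    if importlib.util.find_spec(package) is not None:
        return False  # already present: do nothing
    run(build_install_command(package))
    return True


if __name__ == "__main__":
    # Print the command instead of running it.
    print(build_install_command())
```

Using `--no-deps` keeps pip from resolving the package's torch pin, which is the conflict the commit message describes.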
Ethanfel dbb3207df1 Replace install.py with standard requirements.txt
install.py was running arbitrary pip installs as part of node loading,
which is dangerous in a shared venv. Standard approach: requirements.txt
lists the safe deps (transformers, accelerate, soundfile, etc.);
omnivoice itself must be installed once manually with --no-deps to avoid
overwriting ComfyUI's torch. README documents this clearly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 17:44:52 +02:00
Ethanfel e8e8943692 Remove transformers upper bound cap from install.py
The cap was wrong — it would downgrade transformers in shared venvs and
break other nodes. The torchcodec issue is handled in code now.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 17:36:06 +02:00
Ethanfel 30f46fc3ef Revert transformers cap; catch torchcodec ASR failure with clear message
install.py: restore transformers>=5.0.0 (capping it would break other nodes).
generator.py: catch the torchcodec RuntimeError that fires when ref_text is
blank and transformers 5.x auto-transcription requires missing FFmpeg libs.
Raises a human-readable error telling the user to fill in ref_text manually.
Also updates the ref_text tooltip to recommend providing it explicitly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 17:35:54 +02:00
Ethanfel d6ff42dc7c Cap transformers below 5.0 to avoid torchcodec ASR crash
transformers 5.x unconditionally imports torchcodec in its ASR pipeline
preprocess step, which crashes in environments without FFmpeg shared libs.
4.x does not have this dependency.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 17:32:50 +02:00
Ethanfel 5dfaa0b300 Replace torchaudio.save with soundfile.write; add EPUB loader node
- nodes/generator.py: swap torchaudio.save for soundfile.write to avoid
  torchcodec/FFmpeg dependency crash in environments without FFmpeg shared libs
- nodes/epub_loader.py: new OmniVoiceEpubLoader node for loading EPUB chapters
- tests/test_epub_loader.py: 8 tests for the EPUB loader
- install.py: add beautifulsoup4 to runtime deps
- __init__.py, nodes/__init__.py: register OmniVoiceEpubLoader

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 17:24:18 +02:00
Ethanfel 5366f6992e feat: add tooltips with inline tag reference to generator node inputs
2026-04-05 15:35:21 +02:00
Ethanfel f647a24988 fix: add install.py to prevent omnivoice from overwriting ComfyUI's torch
2026-04-05 14:53:33 +02:00
Ethanfel b273f8f2d7 refactor: remove redundant condition and rename shadowed waveform variable
2026-04-05 10:41:08 +02:00