Commit Graph

29 Commits

Author SHA1 Message Date
Ethanfel e26bac3684 remove: language parameter from Generate (model auto-detects from text)
Language is inferred from the text content — the parameter had no effect.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:21:27 +02:00
Ethanfel 194e0b0e09 fix: trim accent list to model-validated values only
The model's _resolve_instruct() validates against a fixed vocabulary.
Only 10 accents are supported — removed all unsupported additions.
Updated tooltip to reflect actual constraints.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:17:59 +02:00
Ethanfel d4bf7c825e feat: pass instruct in voice_cloning mode for accent/style influence
If instruct is set alongside ref_audio, it is now forwarded to
model.generate() — allowing accent/style transfer on top of the
cloned voice identity. Model may or may not honour both simultaneously.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:07:18 +02:00
Ethanfel d2cb5c4249 feat: expand language and accent lists to full coverage
Language: ~170 world languages with type-to-filter dropdown
Accent: 50+ regional varieties grouped by area

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:04:12 +02:00
Ethanfel c1558efad9 feat: add Voice Design node + language and guidance_scale to Generate
OmniVoiceVoiceDesign: structured dropdowns for gender/age/pitch/accent
that compose into an instruct string — wire to Generate's instruct input.

OmniVoiceGenerate: new optional language dropdown (auto + 11 languages)
and guidance_scale (CFG, default 2.0) parameters.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:02:06 +02:00
Ethanfel 46591553f9 feat: add torch.compile option to Model Loader
Compiles the model graph on first generation (~30-60s warmup) then
speeds up all subsequent generations in the session. Recommended for
audiobook pipelines. Default off.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:16:39 +02:00
Ethanfel 76118f57c3 fix: only catch ImportError in _resample torchaudio fallback
Catching bare Exception was silently swallowing real resampling errors.
Only ImportError should trigger the interpolate fallback.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:05:22 +02:00
Ethanfel 219c74d7ed fix: three bugs in OmniVoiceMixVoices
- _resample: squeeze batch dim before torchaudio.Resample (expected 2D)
- weight scaling: each clip now trims to natural_length*weight samples,
  dropping the broken target_per_unit double-multiplication
- empty trimmed guard: raise clear error when all weights are 0

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:04:54 +02:00
Ethanfel c7c7123068 feat: add OmniVoice Mix Voices node for blended speaker cloning
Concatenates 2-3 reference audio clips (with per-voice duration weights)
to create a blended speaker embedding. Merges transcripts for ref_text.
Handles mismatched sample rates and mono conversion automatically.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 19:03:43 +02:00
Ethanfel 8805665a22 Add seed parameter to OmniVoice Generate for consistent voice across chunks
OmniVoice chunks long text internally; each chunk is a separate diffusion
pass with different random noise, causing voice drift between paragraphs.
Setting the same seed before each generate() call anchors the RNG state
and keeps the voice consistent. seed=0 means random (default behaviour).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 18:53:58 +02:00
Ethanfel 4c42322c6f Expand voice presets to 8 voices (3 female, 5 male)
All transcribed via whisper-medium. Sources: Chatterbox demo GCS bucket
(ResembleAI) and F5-TTS repo (SWivid).

Female: Shadowheart, American actress, Podcast host
Male: Nature, Old Hollywood, Rick Sanchez, Stewie Griffin,
      Harvey Keitel, Conan O'Brien

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 18:51:43 +02:00
Ethanfel c109e860a8 Add transcript for Shadowheart preset (transcribed via whisper-medium)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 18:39:14 +02:00
Ethanfel 75e74075f5 Restore Shadowheart preset; user will transcribe via Whisper node
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 18:33:52 +02:00
Ethanfel 8de201a4c9 Add OmniVoice Voice Preset node with two female voice samples
Two built-in presets, auto-downloaded and cached to ComfyUI/models/omnivoice/presets/:
- "Nature – female, warm" (F5-TTS basic_ref_en.wav, transcript included)
- "Shadowheart – female, expressive" (Chatterbox demo, connect Whisper for transcript)

Outputs ref_audio (AUDIO) and ref_text (STRING) — wire directly into
OmniVoice Generate. Updated default workflow to use this node.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 18:19:29 +02:00
Ethanfel d779526225 Preserve paragraph breaks in EPUB text extraction
get_text(separator=' ') collapsed all paragraphs into one line.
Now inserts \n\n at block-level element boundaries (p, h1-h6, div,
li, br, tr) before extraction, then normalises whitespace.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 18:06:41 +02:00
Ethanfel b52edcfd84 Remove local path option from model loader
Models always download to ComfyUI/models/omnivoice/ via HuggingFace.
Local path added unnecessary complexity; users who want a custom path
can symlink into the models directory.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 18:02:55 +02:00
Ethanfel 8d77dd6cd5 Remove torchcodec workaround; recommend Whisper node for ref_text
Users should connect a ComfyUI Whisper node to ref_text instead of
relying on omnivoice's internal ASR. Removes the error-catch workaround
and updates the tooltip accordingly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 17:49:25 +02:00
Ethanfel 30f46fc3ef Revert transformers cap; catch torchcodec ASR failure with clear message
install.py: restore transformers>=5.0.0 (capping it would break other nodes).
generator.py: catch the torchcodec RuntimeError that fires when ref_text is
blank and transformers 5.x auto-transcription requires missing FFmpeg libs.
Raises a human-readable error telling the user to fill in ref_text manually.
Also updates the ref_text tooltip to recommend providing it explicitly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 17:35:54 +02:00
Ethanfel 5dfaa0b300 Replace torchaudio.save with soundfile.write; add EPUB loader node
- nodes/generator.py: swap torchaudio.save for soundfile.write to avoid
  torchcodec/FFmpeg dependency crash in environments without FFmpeg shared libs
- nodes/epub_loader.py: new OmniVoiceEpubLoader node for loading EPUB chapters
- tests/test_epub_loader.py: 8 tests for the EPUB loader
- install.py: add beautifulsoup4 to runtime deps
- __init__.py, nodes/__init__.py: register OmniVoiceEpubLoader

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 17:24:18 +02:00
Ethanfel 5366f6992e feat: add tooltips with inline tag reference to generator node inputs 2026-04-05 15:35:21 +02:00
Ethanfel b273f8f2d7 refactor: remove redundant condition and rename shadowed waveform variable 2026-04-05 10:41:08 +02:00
Ethanfel 0ffd624471 fix: protect os.unlink in finally block from masking original exceptions 2026-04-05 10:38:38 +02:00
Ethanfel a2c542a2bc fix: move output waveform to CPU and cast sample_rate to int 2026-04-05 10:34:53 +02:00
Ethanfel 808580b771 fix: guard omnivoice import in loader.py so node classes are importable without package
Wrap `from omnivoice import OmniVoice` in a try/except ImportError, setting
OmniVoice=None when absent. Add a clear runtime ImportError in load_model()
so users get an actionable message. Allows `from nodes.loader import
OmniVoiceModelLoader` to succeed outside of pytest (where conftest.py injects
the mock) while keeping all 13 tests green.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 09:11:09 +02:00
Ethanfel 18fe6359cf fix: add input validation and cpu() guard in OmniVoiceGenerate
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 09:09:52 +02:00
Ethanfel 95712e5504 feat: add OmniVoiceGenerate node with voice cloning, design, and auto modes
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 09:07:20 +02:00
Ethanfel 11beba1c47 fix: clean up omnivoice import guard and __init__ error masking
Remove OmniVoice = None fallback in nodes/loader.py so missing omnivoice
gives a clear ImportError instead of a confusing AttributeError. Restore
__init__.py to clean form without the try/except that silently swallowed
real import errors. Add omnivoice mock to conftest.py and register a
pytest plugin that prevents pytest from treating the project root as a
Package node (which would try to import __init__.py outside a package
context and fail on the relative import).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 09:03:34 +02:00
Ethanfel 069169485d feat: add OmniVoiceModelLoader node
Implements OmniVoiceModelLoader with INPUT_TYPES, RETURN_TYPES, and
load_model supporting both HuggingFace auto-download and local path
sources. Adds TDD test suite and pytest infrastructure (conftest.py,
pytest.ini) to enable testing outside ComfyUI without omnivoice installed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 08:52:26 +02:00
Ethanfel 0ed43a83ca feat: scaffold ComfyUI-Omnivoice node package 2026-04-05 08:43:17 +02:00