Commit Graph

23 Commits

Author SHA1 Message Date
Ethanfel 2e3c357e5a fix: raise speed minimum from 0.1 to 0.3 to prevent VRAM explosion
Speed values below 0.3 produce noise, and a speed of 0.1 generates audio
10x the normal length, which can consume 20+ GB of VRAM and freeze the system.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-10 22:12:04 +02:00
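
The speed floor described in this commit can be illustrated with a minimal sketch. The names `MIN_SPEED`, `clamp_speed`, and `estimated_duration` are hypothetical and not taken from the repository, and the 1 / speed scaling is an assumption inferred from the "0.1 generates 10x" remark.

```python
MIN_SPEED = 0.3  # floor quoted in the commit message


def clamp_speed(speed: float, minimum: float = MIN_SPEED) -> float:
    """Return the requested speed, raised to the floor when it is too low."""
    return max(float(speed), minimum)


def estimated_duration(base_seconds: float, speed: float) -> float:
    """Rough output length, assuming duration scales as 1 / speed."""
    return base_seconds / clamp_speed(speed)
```

With the floor in place, a request for speed 0.1 is treated as 0.3, so the output is stretched to roughly 3.3x the normal length rather than 10x.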
Ethanfel d5f2632c48 feat: support [tag]: syntax in tagged_speakers mode
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 09:21:27 +02:00
Ethanfel 33b3d62d02 fix: tagged_speakers splits on single newlines, not just double newlines
Each line starting with [Tag] now begins a new segment so users don't need
blank lines between tagged speeches. Continuation lines (no tag) are joined
to the previous tagged segment for multi-line speeches.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 09:16:54 +02:00
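
A rough sketch of the line-based splitting described in this commit, also accepting the `[tag]:` form from the later commit d5f2632c48. The function name `split_tagged` and the regular expression are assumptions for illustration; the actual parser in the repository may differ.

```python
import re

# Hypothetical pattern: a bracketed tag at the start of a line,
# optionally followed by a colon, then the speech text.
TAG_RE = re.compile(r"^\[([^\]]+)\]\s*:?\s*(.*)$")


def split_tagged(text: str) -> list[tuple[str, str]]:
    """Split text into (tag, speech) segments, one per tagged line."""
    segments: list[tuple[str, str]] = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank lines are allowed but no longer required
        match = TAG_RE.match(stripped)
        if match:
            # A tagged line always starts a new segment.
            segments.append((match.group(1).strip(), match.group(2).strip()))
        elif segments:
            # Untagged line: continuation of the previous speech.
            tag, speech = segments[-1]
            segments[-1] = (tag, f"{speech} {stripped}".strip())
        else:
            # Text before any tag is kept under an empty tag (assumption).
            segments.append(("", stripped))
    return segments
```

This sketch treats every bracketed prefix as a speaker tag. An implementation that also supports inline non-speaker tags would presumably check the tag against the known speaker labels first.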
Ethanfel 95cf706b19 feat: add multi-speaker generation with JS-powered dynamic slots
- Add OmniVoiceSpeaker node (label + ref_audio + ref_text → OMNIVOICE_SPEAKER)
- Add OmniVoiceSpeakers node (roster with dynamic speaker_N inputs driven by
  num_speakers INT widget; slots expand/collapse via ComfyUI JS extension)
- Add web/multi_speaker.js: ComfyUI extension that hooks onNodeCreated and
  onConfigure to sync speaker_N inputs in real time (max 8 speakers)
- Extend OmniVoiceGenerate with optional speakers (OMNIVOICE_SPEAKERS) input;
  when connected it routes each paragraph to the assigned speaker and
  concatenates the results — supports alternate_paragraphs and tagged_speakers modes
- Remove OmniVoiceMultiSpeakerGenerate (generation now lives in the existing
  Generate node)
- Refactor generator.py: extract _write_tmp_wav helper, add _tensors_to_audio

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 09:08:23 +02:00
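
The paragraph routing mentioned for alternate_paragraphs mode could look roughly like the sketch below. It assumes that "alternate" means round-robin assignment over the speaker roster and that paragraphs are separated by blank lines. `assign_alternating` is a hypothetical name, not a function from the repository.

```python
import re


def assign_alternating(text: str, speakers: list[str]) -> list[tuple[str, str]]:
    """Pair each paragraph with a speaker label, cycling through the roster."""
    if not speakers:
        raise ValueError("at least one speaker is required")
    # Paragraphs are assumed to be separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return [(speakers[i % len(speakers)], p) for i, p in enumerate(paragraphs)]
```

Each (speaker, paragraph) pair would then be generated separately and the resulting audio concatenated in order, as the commit message describes.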
Ethanfel 340c0aa402 simplify: remove language param entirely — model detects from instruct string
Chinese characters vs English words are self-identifying to the model.
No need for a separate language signal on either node.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:44:27 +02:00
Ethanfel 2b13e55dc5 fix: pipe language out of Voice Design into Generate
Voice Design now outputs (instruct, language) — wire language directly
into Generate to avoid setting it in two places. Generate's language
input is now a STRING (accepts the connection or manual 'auto').

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:40:35 +02:00
Ethanfel 772f6654d4 feat: add language selector for voice_design + Chinese instruct support
- Generate: language dropdown (auto/English/Chinese), passed only in
  voice_design and auto_voice modes where it selects the instruct vocab
- VoiceDesign: Chinese mode with dialect/age/pitch/gender dropdowns
  using the model's validated Chinese instruct vocabulary (全角逗号)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:22:25 +02:00
Ethanfel e26bac3684 remove: language parameter from Generate (model auto-detects from text)
Language is inferred from the text content — the parameter had no effect.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:21:27 +02:00
Ethanfel 194e0b0e09 fix: trim accent list to model-validated values only
The model's _resolve_instruct() validates against a fixed vocabulary.
Only 10 accents are supported — removed all unsupported additions.
Updated tooltip to reflect actual constraints.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:17:59 +02:00
Ethanfel d4bf7c825e feat: pass instruct in voice_cloning mode for accent/style influence
If instruct is set alongside ref_audio, it is now forwarded to
model.generate() — allowing accent/style transfer on top of the
cloned voice identity. Model may or may not honour both simultaneously.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:07:18 +02:00
Ethanfel d2cb5c4249 feat: expand language and accent lists to full coverage
Language: ~170 world languages with type-to-filter dropdown
Accent: 50+ regional varieties grouped by area

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:04:12 +02:00
Ethanfel c1558efad9 feat: add Voice Design node + language and guidance_scale to Generate
OmniVoiceVoiceDesign: structured dropdowns for gender/age/pitch/accent
that compose into an instruct string — wire to Generate's instruct input.

OmniVoiceGenerate: new optional language dropdown (auto + 11 languages)
and guidance_scale (CFG, default 2.0) parameters.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:02:06 +02:00
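
As an illustration of how dropdown values might be composed into an instruct string, here is a minimal sketch. The function name, the separator, and the example values are all assumptions; the model's validated vocabulary is not listed in this log.

```python
def compose_instruct(*parts: str, separator: str = ", ") -> str:
    """Join the non-empty dropdown selections into one instruct string."""
    return separator.join(p.strip() for p in parts if p and p.strip())


# Hypothetical selections for gender, age, pitch, and accent.
print(compose_instruct("female", "", "low pitch", "british accent"))
```

Empty selections are skipped so that unset dropdowns leave no stray separators. Commit 772f6654d4 notes that the Chinese vocabulary uses a full-width comma, which in this sketch would be passed as `separator="，"`.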
Ethanfel 8805665a22 Add seed parameter to OmniVoice Generate for consistent voice across chunks
OmniVoice chunks long text internally; each chunk is a separate diffusion
pass with different random noise, causing voice drift between paragraphs.
Setting the same seed before each generate() call anchors the RNG state
and keeps the voice consistent. seed=0 means random (default behaviour).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 18:53:58 +02:00
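
The reseeding idea in this commit can be shown with a toy sketch. It uses Python's standard `random` module as a stand-in for the model's RNG (the real node presumably seeds PyTorch instead). `fake_generate` and `generate_chunks` are hypothetical names.

```python
import random


def fake_generate(text: str) -> list[float]:
    """Stand-in for a model call: draws its 'noise' from the global RNG."""
    return [random.random() for _ in range(3)]


def generate_chunks(chunks: list[str], seed: int = 0) -> list[list[float]]:
    """Generate each chunk, reseeding first so every call starts from the same RNG state."""
    outputs = []
    for chunk in chunks:
        if seed != 0:  # seed=0 means "leave the RNG alone", as in the commit
            random.seed(seed)
        outputs.append(fake_generate(chunk))
    return outputs
```

With a non-zero seed, every chunk draws the same noise sequence, which is the mechanism the commit relies on to keep the voice consistent across paragraphs.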
Ethanfel 8d77dd6cd5 Remove torchcodec workaround; recommend Whisper node for ref_text
Users should connect a ComfyUI Whisper node to ref_text instead of
relying on omnivoice's internal ASR. Removes the error-catch workaround
and updates the tooltip accordingly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 17:49:25 +02:00
Ethanfel 30f46fc3ef Revert transformers cap; catch torchcodec ASR failure with clear message
install.py: restore transformers>=5.0.0 (capping it would break other nodes).
generator.py: catch the torchcodec RuntimeError that fires when ref_text is
blank and transformers 5.x auto-transcription requires missing FFmpeg libs.
Raises a human-readable error telling the user to fill in ref_text manually.
Also updates the ref_text tooltip to recommend providing it explicitly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 17:35:54 +02:00
Ethanfel 5dfaa0b300 Replace torchaudio.save with soundfile.write; add EPUB loader node
- nodes/generator.py: swap torchaudio.save for soundfile.write to avoid
  torchcodec/FFmpeg dependency crash in environments without FFmpeg shared libs
- nodes/epub_loader.py: new OmniVoiceEpubLoader node for loading EPUB chapters
- tests/test_epub_loader.py: 8 tests for the EPUB loader
- install.py: add beautifulsoup4 to runtime deps
- __init__.py, nodes/__init__.py: register OmniVoiceEpubLoader

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 17:24:18 +02:00
Ethanfel 5366f6992e feat: add tooltips with inline tag reference to generator node inputs
2026-04-05 15:35:21 +02:00
Ethanfel b273f8f2d7 refactor: remove redundant condition and rename shadowed waveform variable
2026-04-05 10:41:08 +02:00
Ethanfel 0ffd624471 fix: protect os.unlink in finally block from masking original exceptions
2026-04-05 10:38:38 +02:00
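
The pattern this fix refers to can be sketched as follows. `process_with_tmp` and its `work` callback are hypothetical names for illustration and are not the repository's helper.

```python
import os
import tempfile


def process_with_tmp(data: bytes, work):
    """Write data to a temp file, run work(path), and always clean up."""
    fd, path = tempfile.mkstemp(suffix=".wav")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        return work(path)
    finally:
        try:
            os.unlink(path)
        except OSError:
            # A failed cleanup must not replace an exception raised above.
            pass
```

Without the inner `try`, an `OSError` from `os.unlink` raised inside `finally` would propagate in place of whatever exception `work` raised, hiding the real cause of the failure.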
Ethanfel a2c542a2bc fix: move output waveform to CPU and cast sample_rate to int
2026-04-05 10:34:53 +02:00
Ethanfel 18fe6359cf fix: add input validation and cpu() guard in OmniVoiceGenerate
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 09:09:52 +02:00
Ethanfel 95712e5504 feat: add OmniVoiceGenerate node with voice cloning, design, and auto modes
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 09:07:20 +02:00
Ethanfel 0ed43a83ca feat: scaffold ComfyUI-Omnivoice node package
2026-04-05 08:43:17 +02:00