ComfyUI-Omnivoice

Author	SHA1	Message	Date
Ethanfel	b4c1cb2955	chore: bump version to 1.0.6 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-10 22:17:21 +02:00
Ethanfel	2e3c357e5a	fix: raise speed minimum from 0.1 to 0.3 to prevent VRAM explosion Speed values below 0.3 produce noise, and 0.1 generates 10x the normal audio length which can consume 20+ GB VRAM and freeze the system. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-10 22:12:04 +02:00
Ethanfel	aa986fd534	fix: set transformers>=5.3.0 as required by omnivoice Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-10 12:14:13 +02:00
Ethanfel	3fee610050	fix: bump transformers minimum to >=4.57.0 Confirmed working version. Previous >=4.40.0 was too permissive and older versions may lack APIs that omnivoice depends on. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-10 12:13:19 +02:00
Ethanfel	d4638aa785	chore: bump version to 1.0.5 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-10 12:08:44 +02:00
Ethanfel	e9c947b613	fix: strip title and heading tags from EPUB text output The chapter title was appearing multiple times in the text (from <title>, <h1>, and body). Now <title> and <h1>/<h2>/<h3> are removed from the body text since the title is already available via the chapter_title output. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-10 00:07:00 +02:00
Ethanfel	197bcc554e	chore: bump version to 1.0.4 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-09 23:32:03 +02:00
Ethanfel	9f2683cd54	fix: add torchcodec to requirements for torchaudio backend Newer torchaudio versions default to the TorchCodec backend for loading audio files. Without it installed, omnivoice fails with ImportError when loading reference audio. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-09 23:31:40 +02:00
Ethanfel	f8657aca80	fix: update tests for chapter_title output and guidance_scale param - epub loader tests: unpack 3 return values (added chapter_title) - generator tests: expect guidance_scale in model.generate() calls Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-09 19:06:38 +02:00
Ethanfel	0f2bcc4c0e	chore: bump version to 1.0.3 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-09 17:41:02 +02:00
Ethanfel	4eabac4c7e	feat: add chapter_title output to EPUB Loader for file naming Returns the title of the first selected chapter as a STRING so it can be wired directly into a Save Audio node's filename field. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-09 17:40:38 +02:00
Ethanfel	dd6d5061e4	chore: bump version to 1.0.2 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 12:26:26 +02:00
Ethanfel	147030e2af	fix: surface real import error when omnivoice fails to load The broad except ImportError was swallowing the actual failure reason (e.g. a missing transitive dep after --no-deps install). Now captures and re-raises the original exception in the error message so users can diagnose what is actually missing. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 12:23:59 +02:00
Ethanfel	aedbe2e7d9	fix: declare speaker_1..8 in INPUT_TYPES so ComfyUI validation accepts them Dynamic JS inputs that are not listed in INPUT_TYPES may be rejected by ComfyUI's prompt validator and not passed to the Python function. Declaring all 8 slots as optional fixes this while JS still controls which slots are visible on the node. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-06 09:41:56 +02:00
Ethanfel	26295e4db7	feat: auto-discover user presets from the presets folder Drop any audio file (wav/flac/mp3/ogg/m4a) into the presets cache dir and it will appear as "<name> (local)" in the Voice Preset dropdown on next ComfyUI restart. Add a same-stem .txt file for the transcript. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-06 09:37:17 +02:00
Ethanfel	d5f2632c48	feat: support [tag]: syntax in tagged_speakers mode Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-06 09:21:27 +02:00
Ethanfel	33b3d62d02	fix: tagged_speakers splits on single newlines, not just double newlines Each line starting with [Tag] now begins a new segment so users don't need blank lines between tagged speeches. Continuation lines (no tag) are joined to the previous tagged segment for multi-line speeches. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-06 09:16:54 +02:00
Ethanfel	95cf706b19	feat: add multi-speaker generation with JS-powered dynamic slots - Add OmniVoiceSpeaker node (label + ref_audio + ref_text → OMNIVOICE_SPEAKER) - Add OmniVoiceSpeakers node (roster with dynamic speaker_N inputs driven by num_speakers INT widget; slots expand/collapse via ComfyUI JS extension) - Add web/multi_speaker.js: ComfyUI extension that hooks onNodeCreated and onConfigure to sync speaker_N inputs in real time (max 8 speakers) - Extend OmniVoiceGenerate with optional speakers (OMNIVOICE_SPEAKERS) input; when connected it routes each paragraph to the assigned speaker and concatenates the results — supports alternate_paragraphs and tagged_speakers modes - Remove OmniVoiceMultiSpeakerGenerate (generation now lives in the existing Generate node) - Refactor generator.py: extract _write_tmp_wav helper, add _tensors_to_audio Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-06 09:08:23 +02:00
Ethanfel	3cbc04d12d	wf: update default workflow from ComfyUI export Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 20:50:52 +02:00
Ethanfel	340c0aa402	simplify: remove language param entirely — model detects from instruct string Chinese characters vs English words are self-identifying to the model. No need for a separate language signal on either node. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 20:44:27 +02:00
Ethanfel	2b13e55dc5	fix: pipe language out of Voice Design into Generate Voice Design now outputs (instruct, language) — wire language directly into Generate to avoid setting it in two places. Generate's language input is now a STRING (accepts the connection or manual 'auto'). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 20:40:35 +02:00
Ethanfel	86ec8cf3fb	bump: version 1.0.1 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 20:33:55 +02:00
Ethanfel	ae2255d9e4	fix: add missing omnivoice runtime deps for fresh installs pydub, tensorboardx, webdataset are omnivoice dependencies that won't be present on a clean ComfyUI install since we use --no-deps. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 20:33:19 +02:00
Ethanfel	d5a0ebeb9a	revert: restore working requirements.txt Pinning transformers==5.3.0 risks conflicting with existing ComfyUI venv. Back to permissive >=4.40.0 which worked in practice. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 20:29:04 +02:00
Ethanfel	0d43e5374f	fix: sync requirements.txt with omnivoice's actual --no-deps dependencies Pins transformers==5.3.0 (omnivoice requires exact version), restores pydub (omnivoice dep), adds tensorboardx and webdataset. Drops gradio (demo-only, not needed for inference). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 20:28:35 +02:00
Ethanfel	2b4b221e88	chore: remove unused pydub from requirements Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 20:27:34 +02:00
Ethanfel	772f6654d4	feat: add language selector for voice_design + Chinese instruct support - Generate: language dropdown (auto/English/Chinese), passed only in voice_design and auto_voice modes where it selects the instruct vocab - VoiceDesign: Chinese mode with dialect/age/pitch/gender dropdowns using the model's validated Chinese instruct vocabulary (full-width commas) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 20:22:25 +02:00
Ethanfel	e26bac3684	remove: language parameter from Generate (model auto-detects from text) Language is inferred from the text content — the parameter had no effect. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 20:21:27 +02:00
Ethanfel	194e0b0e09	fix: trim accent list to model-validated values only The model's _resolve_instruct() validates against a fixed vocabulary. Only 10 accents are supported — removed all unsupported additions. Updated tooltip to reflect actual constraints. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 20:17:59 +02:00
Ethanfel	d4bf7c825e	feat: pass instruct in voice_cloning mode for accent/style influence If instruct is set alongside ref_audio, it is now forwarded to model.generate() — allowing accent/style transfer on top of the cloned voice identity. Model may or may not honour both simultaneously. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 20:07:18 +02:00
Ethanfel	d2cb5c4249	feat: expand language and accent lists to full coverage Language: ~170 world languages with type-to-filter dropdown Accent: 50+ regional varieties grouped by area Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 20:04:12 +02:00
Ethanfel	c1558efad9	feat: add Voice Design node + language and guidance_scale to Generate OmniVoiceVoiceDesign: structured dropdowns for gender/age/pitch/accent that compose into an instruct string — wire to Generate's instruct input. OmniVoiceGenerate: new optional language dropdown (auto + 11 languages) and guidance_scale (CFG, default 2.0) parameters. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 20:02:06 +02:00
Ethanfel	97ed0f209f	fix: rename node id to comfyui-omnivoice-fel (comfyui-omnivoice is taken) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:59:17 +02:00
Ethanfel	f7d624799c	fix: revert PublisherId to lowercase ethanfel (matches registry) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:46:55 +02:00
Ethanfel	d5000dee11	fix: correct PublisherId casing to Ethanfel Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:40:52 +02:00
Ethanfel	bb1d83578c	ci: trigger publish on pyproject.toml push to master (not tags) Matches the pattern used by the working LoRA Optimizer publish workflow. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:37:34 +02:00
Ethanfel	138df3abc4	fix: align pyproject.toml format with working registry publish - license: use { text = "GPL-3.0-only" } (matches comfy-cli expectation) - add dependencies = [] Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> v1.0.0	2026-04-05 19:35:31 +02:00
Ethanfel	db884341e8	fix: move repository URL to [project.urls] for comfy-cli Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:34:30 +02:00
Ethanfel	dfad38a5c3	fix: correct pyproject.toml for registry publish - license: use SPDX identifier GPL-3.0-only instead of file reference - add Repository URL Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:32:52 +02:00
Ethanfel	f4697ae88f	wf: update default workflow from ComfyUI export Refreshed node IDs, positions and sizes from live session. Replaced SaveAudio with PreviewAudio, added ref_text widget entry, updated aux_id/ver properties. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:27:45 +02:00
Ethanfel	616c2a3e61	license: add GNU GPL v3.0 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:27:22 +02:00
Ethanfel	9b62c9bda8	fix: add seed control_after_generate value to default workflow ComfyUI appends a hidden "fixed"/"randomize" value after every INT named "seed". Without it the widget values were misaligned. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:21:13 +02:00
Ethanfel	46591553f9	feat: add torch.compile option to Model Loader Compiles the model graph on first generation (~30-60s warmup) then speeds up all subsequent generations in the session. Recommended for audiobook pipelines. Default off. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:16:39 +02:00
Ethanfel	2e1cba61fc	fix: adjust node sizes in default workflow for seed widget Generate node height 340→400 to fit all 6 widgets, Voice Preset height 80→100, SaveAudio position adjusted. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:12:31 +02:00
Ethanfel	c22cc7c296	rename: voice_cloning.json → omnivoice_voice_cloning.json Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:10:51 +02:00
Ethanfel	f7756e6240	ci: add pyproject.toml for ComfyUI registry Sets publisher ID, display name, and package metadata required for Comfy-Org/publish-node-action. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:10:20 +02:00
Ethanfel	8e2367f8f0	ci: add ComfyUI registry publish workflow Triggers on version tags (v*) using Comfy-Org/publish-node-action. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:09:40 +02:00
Ethanfel	14542a6b00	docs: update README with all nodes (presets, mix voices, EPUB loader) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:07:15 +02:00
Ethanfel	76118f57c3	fix: only catch ImportError in _resample torchaudio fallback Catching bare Exception was silently swallowing real resampling errors. Only ImportError should trigger the interpolate fallback. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:05:22 +02:00
Ethanfel	219c74d7ed	fix: three bugs in OmniVoiceMixVoices - _resample: squeeze batch dim before torchaudio.Resample (expected 2D) - weight scaling: each clip now trims to natural_length*weight samples, dropping the broken target_per_unit double-multiplication - empty trimmed guard: raise clear error when all weights are 0 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:04:54 +02:00


81 Commits
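Commit e9c947b613 removes <title> and <h1>/<h2>/<h3> from the EPUB body text so the chapter title is not read aloud several times. The repository's own implementation is not shown in this log; the following is a minimal regex-based sketch of that step, assuming each chapter arrives as an XHTML string and headings are not nested inside one another. The function name strip_title_and_headings is hypothetical.

```python
import re

# <title>...</title> with any attributes; DOTALL lets the text span lines.
_TITLE_RE = re.compile(r"<title\b[^>]*>.*?</title\s*>", re.IGNORECASE | re.DOTALL)
# An <h1>, <h2> or <h3> element; the back-reference \1 makes the closing
# tag match the same heading level as the opening tag.
_HEADING_RE = re.compile(r"<(h[1-3])\b[^>]*>.*?</\1\s*>", re.IGNORECASE | re.DOTALL)


def strip_title_and_headings(xhtml: str) -> str:
    """Remove <title> and <h1>/<h2>/<h3> elements (tags and text) from chapter XHTML."""
    without_title = _TITLE_RE.sub("", xhtml)
    return _HEADING_RE.sub("", without_title)
```

A real HTML parser is more robust against unusual markup; the regex version only keeps the idea visible.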
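Commit 147030e2af stops a broad except ImportError from hiding why omnivoice failed to load (for example, a missing transitive dependency after a --no-deps install). One way to express that pattern, as a sketch: include the original exception in the new message and chain it with raise ... from so the full traceback survives. import_backend is a hypothetical helper, not the node pack's code.

```python
import importlib
from types import ModuleType


def import_backend(name: str = "omnivoice") -> ModuleType:
    """Import a backend package, keeping the real failure reason visible."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        # Put the original exception type and text into the message instead of
        # a generic "not installed", and chain it so the traceback is preserved.
        raise ImportError(
            f"{name} failed to import: {type(exc).__name__}: {exc}"
        ) from exc
```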
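Commit aedbe2e7d9 declares speaker_1..8 as optional inputs so ComfyUI's prompt validator accepts whichever slots the JS extension makes visible. A sketch of that declaration using ComfyUI's usual INPUT_TYPES / RETURN_TYPES / FUNCTION node conventions follows. The class name, the num_speakers default and minimum, the CATEGORY string, and the build method are assumptions; only the 8-slot maximum and the OMNIVOICE_SPEAKER / OMNIVOICE_SPEAKERS type names come from the log.

```python
MAX_SPEAKERS = 8  # the multi-speaker JS extension caps the roster at 8


class SpeakerRosterSketch:
    """Hypothetical roster node declaring every speaker_N slot as optional."""

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                # Assumed widget settings; only the maximum comes from the log.
                "num_speakers": ("INT", {"default": 2, "min": 1, "max": MAX_SPEAKERS}),
            },
            # All slots are declared up front so the validator accepts them;
            # the frontend decides which ones are actually shown.
            "optional": {
                f"speaker_{i}": ("OMNIVOICE_SPEAKER",)
                for i in range(1, MAX_SPEAKERS + 1)
            },
        }

    RETURN_TYPES = ("OMNIVOICE_SPEAKERS",)
    FUNCTION = "build"
    CATEGORY = "OmniVoice"  # assumed category name

    def build(self, num_speakers, **speakers):
        # Collect the first num_speakers slots, skipping unconnected ones.
        roster = [speakers.get(f"speaker_{i}") for i in range(1, num_speakers + 1)]
        return ([s for s in roster if s is not None],)
```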
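Commit 26295e4db7 auto-discovers audio files (wav/flac/mp3/ogg/m4a) in the presets folder and lists each one as "<name> (local)", with an optional same-stem .txt file as the transcript. A minimal sketch of that scan, assuming a flat directory; discover_local_presets and its return shape are hypothetical.

```python
from pathlib import Path
from typing import Dict, Optional, Tuple

AUDIO_EXTS = {".wav", ".flac", ".mp3", ".ogg", ".m4a"}


def discover_local_presets(preset_dir: Path) -> Dict[str, Tuple[Path, Optional[str]]]:
    """Map '<name> (local)' labels to (audio_path, transcript or None)."""
    presets: Dict[str, Tuple[Path, Optional[str]]] = {}
    if not preset_dir.is_dir():
        return presets
    for path in sorted(preset_dir.iterdir()):
        # Only regular files with a supported audio extension become presets.
        if not path.is_file() or path.suffix.lower() not in AUDIO_EXTS:
            continue
        txt = path.with_suffix(".txt")  # same-stem transcript, if present
        transcript = txt.read_text(encoding="utf-8").strip() if txt.is_file() else None
        presets[f"{path.stem} (local)"] = (path, transcript)
    return presets
```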
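Commits 33b3d62d02 and d5f2632c48 define how tagged_speakers text is split: every line that starts with [Tag] (or [Tag]:) opens a new segment, blank lines are no longer required between speeches, and untagged lines continue the previous segment. A sketch of that parsing rule follows. split_tagged is a hypothetical name, and attaching text that appears before the first tag to a None speaker is this sketch's own choice, not something the commits specify.

```python
import re
from typing import List, Optional, Tuple

# A leading "[Tag]" or "[Tag]:" at the start of a line.
TAG_RE = re.compile(r"^\[([^\]]+)\]:?\s*(.*)$")


def split_tagged(text: str) -> List[Tuple[Optional[str], str]]:
    """Split text into (tag, speech) segments, one per tagged line."""
    segments: List[Tuple[Optional[str], str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines are optional separators
        m = TAG_RE.match(line)
        if m:
            # A tagged line always starts a new segment.
            segments.append((m.group(1).strip(), m.group(2).strip()))
        elif segments:
            # Untagged continuation line: join it to the previous segment.
            tag, speech = segments[-1]
            segments[-1] = (tag, f"{speech} {line}".strip())
        else:
            # Text before the first tag has no speaker (sketch's own choice).
            segments.append((None, line))
    return segments
```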
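Commit 219c74d7ed makes OmniVoiceMixVoices trim each clip to natural_length * weight samples and raise a clear error when every weight is 0; for instance, a 10-sample clip at weight 0.5 contributes 5 samples. The sketch below shows that arithmetic on plain Python lists standing in for audio tensors, and assumes the trimmed clips are then concatenated into one reference clip. mix_by_weight and its negative-weight check are hypothetical.

```python
from typing import List, Sequence


def mix_by_weight(clips: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Trim each clip to int(len(clip) * weight) samples and concatenate the results."""
    if len(clips) != len(weights):
        raise ValueError("need exactly one weight per clip")
    mixed: List[float] = []
    for clip, w in zip(clips, weights):
        if w < 0:
            raise ValueError("weights must be non-negative")
        # Single scaling by the clip's own length; slicing never exceeds
        # the clip, so weights above 1.0 simply keep the whole clip.
        n = int(len(clip) * w)
        mixed.extend(clip[:n])
    if not mixed:
        raise ValueError("all weights are 0: nothing left to mix")
    return mixed
```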