ComfyUI-Omnivoice

Author	SHA1	Message	Date
Ethanfel	2b13e55dc5	fix: pipe language out of Voice Design into Generate. Voice Design now outputs (instruct, language) — wire language directly into Generate to avoid setting it in two places. Generate's language input is now a STRING (accepts the connection or manual 'auto'). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 20:40:35 +02:00
Ethanfel	86ec8cf3fb	bump: version 1.0.1 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 20:33:55 +02:00
Ethanfel	ae2255d9e4	fix: add missing omnivoice runtime deps for fresh installs pydub, tensorboardx, webdataset are omnivoice dependencies that won't be present on a clean ComfyUI install since we use --no-deps. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 20:33:19 +02:00
Ethanfel	d5a0ebeb9a	revert: restore working requirements.txt Pinning transformers==5.3.0 risks conflicting with existing ComfyUI venv. Back to permissive >=4.40.0 which worked in practice. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 20:29:04 +02:00
Ethanfel	0d43e5374f	fix: sync requirements.txt with omnivoice's actual --no-deps dependencies Pins transformers==5.3.0 (omnivoice requires exact version), restores pydub (omnivoice dep), adds tensorboardx and webdataset. Drops gradio (demo-only, not needed for inference). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 20:28:35 +02:00
Ethanfel	2b4b221e88	chore: remove unused pydub from requirements Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 20:27:34 +02:00
Ethanfel	772f6654d4	feat: add language selector for voice_design + Chinese instruct support - Generate: language dropdown (auto/English/Chinese), passed only in voice_design and auto_voice modes where it selects the instruct vocab - VoiceDesign: Chinese mode with dialect/age/pitch/gender dropdowns using the model's validated Chinese instruct vocabulary (full-width commas, 全角逗号) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 20:22:25 +02:00
Ethanfel	e26bac3684	remove: language parameter from Generate (model auto-detects from text) Language is inferred from the text content — the parameter had no effect. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 20:21:27 +02:00
Ethanfel	194e0b0e09	fix: trim accent list to model-validated values only The model's _resolve_instruct() validates against a fixed vocabulary. Only 10 accents are supported — removed all unsupported additions. Updated tooltip to reflect actual constraints. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 20:17:59 +02:00
Ethanfel	d4bf7c825e	feat: pass instruct in voice_cloning mode for accent/style influence If instruct is set alongside ref_audio, it is now forwarded to model.generate() — allowing accent/style transfer on top of the cloned voice identity. Model may or may not honour both simultaneously. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 20:07:18 +02:00
Ethanfel	d2cb5c4249	feat: expand language and accent lists to full coverage Language: ~170 world languages with type-to-filter dropdown Accent: 50+ regional varieties grouped by area Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 20:04:12 +02:00
Ethanfel	c1558efad9	feat: add Voice Design node + language and guidance_scale to Generate OmniVoiceVoiceDesign: structured dropdowns for gender/age/pitch/accent that compose into an instruct string — wire to Generate's instruct input. OmniVoiceGenerate: new optional language dropdown (auto + 11 languages) and guidance_scale (CFG, default 2.0) parameters. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 20:02:06 +02:00
Ethanfel	97ed0f209f	fix: rename node id to comfyui-omnivoice-fel (comfyui-omnivoice is taken) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:59:17 +02:00
Ethanfel	f7d624799c	fix: revert PublisherId to lowercase ethanfel (matches registry) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:46:55 +02:00
Ethanfel	d5000dee11	fix: correct PublisherId casing to Ethanfel Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:40:52 +02:00
Ethanfel	bb1d83578c	ci: trigger publish on pyproject.toml push to master (not tags) Matches the pattern used by the working LoRA Optimizer publish workflow. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:37:34 +02:00
Ethanfel	138df3abc4	fix: align pyproject.toml format with working registry publish - license: use { text = "GPL-3.0-only" } (matches comfy-cli expectation) - add dependencies = [] Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> (tag: v1.0.0)	2026-04-05 19:35:31 +02:00
Ethanfel	db884341e8	fix: move repository URL to [project.urls] for comfy-cli Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:34:30 +02:00
Ethanfel	dfad38a5c3	fix: correct pyproject.toml for registry publish - license: use SPDX identifier GPL-3.0-only instead of file reference - add Repository URL Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:32:52 +02:00
Ethanfel	f4697ae88f	wf: update default workflow from ComfyUI export Refreshed node IDs, positions and sizes from live session. Replaced SaveAudio with PreviewAudio, added ref_text widget entry, updated aux_id/ver properties. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:27:45 +02:00
Ethanfel	616c2a3e61	license: add GNU GPL v3.0 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:27:22 +02:00
Ethanfel	9b62c9bda8	fix: add seed control_after_generate value to default workflow ComfyUI appends a hidden "fixed"/"randomize" value after every INT named "seed". Without it the widget values were misaligned. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:21:13 +02:00
Ethanfel	46591553f9	feat: add torch.compile option to Model Loader Compiles the model graph on first generation (~30-60s warmup) then speeds up all subsequent generations in the session. Recommended for audiobook pipelines. Default off. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:16:39 +02:00
Ethanfel	2e1cba61fc	fix: adjust node sizes in default workflow for seed widget Generate node height 340→400 to fit all 6 widgets, Voice Preset height 80→100, SaveAudio position adjusted. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:12:31 +02:00
Ethanfel	c22cc7c296	rename: voice_cloning.json → omnivoice_voice_cloning.json Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:10:51 +02:00
Ethanfel	f7756e6240	ci: add pyproject.toml for ComfyUI registry Sets publisher ID, display name, and package metadata required for Comfy-Org/publish-node-action. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:10:20 +02:00
Ethanfel	8e2367f8f0	ci: add ComfyUI registry publish workflow Triggers on version tags (v*) using Comfy-Org/publish-node-action. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:09:40 +02:00
Ethanfel	14542a6b00	docs: update README with all nodes (presets, mix voices, EPUB loader) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:07:15 +02:00
Ethanfel	76118f57c3	fix: only catch ImportError in _resample torchaudio fallback Catching bare Exception was silently swallowing real resampling errors. Only ImportError should trigger the interpolate fallback. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:05:22 +02:00
Ethanfel	219c74d7ed	fix: three bugs in OmniVoiceMixVoices - _resample: squeeze batch dim before torchaudio.Resample (expected 2D) - weight scaling: each clip now trims to natural_length*weight samples, dropping the broken target_per_unit double-multiplication - empty trimmed guard: raise clear error when all weights are 0 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:04:54 +02:00
Ethanfel	c7c7123068	feat: add OmniVoice Mix Voices node for blended speaker cloning Concatenates 2-3 reference audio clips (with per-voice duration weights) to create a blended speaker embedding. Merges transcripts for ref_text. Handles mismatched sample rates and mono conversion automatically. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:03:43 +02:00
Ethanfel	f8a3bebe9c	feat: add seed=42 to default workflow for voice consistency Sets a default seed so the voice stays consistent across all generated chunks when using the workflow as a starting point for audiobook pipelines. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 18:58:22 +02:00
Ethanfel	8805665a22	Add seed parameter to OmniVoice Generate for consistent voice across chunks OmniVoice chunks long text internally; each chunk is a separate diffusion pass with different random noise, causing voice drift between paragraphs. Setting the same seed before each generate() call anchors the RNG state and keeps the voice consistent. seed=0 means random (default behaviour). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 18:53:58 +02:00
Ethanfel	4c42322c6f	Expand voice presets to 8 voices (3 female, 5 male) All transcribed via whisper-medium. Sources: Chatterbox demo GCS bucket (ResembleAI) and F5-TTS repo (SWivid). Female: Shadowheart, American actress, Podcast host Male: Nature, Old Hollywood, Rick Sanchez, Stewie Griffin, Harvey Keitel, Conan O'Brien Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 18:51:43 +02:00
Ethanfel	c109e860a8	Add transcript for Shadowheart preset (transcribed via whisper-medium) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 18:39:14 +02:00
Ethanfel	75e74075f5	Restore Shadowheart preset; user will transcribe via Whisper node Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 18:33:52 +02:00
Ethanfel	8de201a4c9	Add OmniVoice Voice Preset node with two female voice samples Two built-in presets, auto-downloaded and cached to ComfyUI/models/omnivoice/presets/: - "Nature – female, warm" (F5-TTS basic_ref_en.wav, transcript included) - "Shadowheart – female, expressive" (Chatterbox demo, connect Whisper for transcript) Outputs ref_audio (AUDIO) and ref_text (STRING) — wire directly into OmniVoice Generate. Updated default workflow to use this node. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 18:19:29 +02:00
Ethanfel	d779526225	Preserve paragraph breaks in EPUB text extraction get_text(separator=' ') collapsed all paragraphs into one line. Now inserts \n\n at block-level element boundaries (p, h1-h6, div, li, br, tr) before extraction, then normalises whitespace. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 18:06:41 +02:00
Ethanfel	b52edcfd84	Remove local path option from model loader Models always download to ComfyUI/models/omnivoice/ via HuggingFace. Local path added unnecessary complexity; users who want a custom path can symlink into the models directory. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 18:02:55 +02:00
Ethanfel	cd0f7aff07	Add default voice cloning workflow Model Loader → Load Audio → OmniVoice Generate → Save Audio. Connect a Whisper node to ref_text for auto-transcription. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 18:01:18 +02:00
Ethanfel	8d77dd6cd5	Remove torchcodec workaround; recommend Whisper node for ref_text Users should connect a ComfyUI Whisper node to ref_text instead of relying on omnivoice's internal ASR. Removes the error-catch workaround and updates the tooltip accordingly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 17:49:25 +02:00
Ethanfel	a3fb88e559	Restore install.py for omnivoice --no-deps only requirements.txt cannot install omnivoice (it would pull in torch==2.8.* and break ComfyUI). install.py now does exactly one thing: install omnivoice --no-deps, skipped if already present. All other deps remain in requirements.txt for ComfyUI Manager to handle normally. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 17:45:24 +02:00
Ethanfel	dbb3207df1	Replace install.py with standard requirements.txt install.py was running arbitrary pip installs as part of node loading, which is dangerous in a shared venv. Standard approach: requirements.txt lists the safe deps (transformers, accelerate, soundfile, etc.); omnivoice itself must be installed once manually with --no-deps to avoid overwriting ComfyUI's torch. README documents this clearly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 17:44:52 +02:00
Ethanfel	e8e8943692	Remove transformers upper bound cap from install.py The cap was wrong — it would downgrade transformers in shared venvs and break other nodes. The torchcodec issue is handled in code now. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 17:36:06 +02:00
Ethanfel	30f46fc3ef	Revert transformers cap; catch torchcodec ASR failure with clear message install.py: restore transformers>=5.0.0 (capping it would break other nodes). generator.py: catch the torchcodec RuntimeError that fires when ref_text is blank and transformers 5.x auto-transcription requires missing FFmpeg libs. Raises a human-readable error telling the user to fill in ref_text manually. Also updates the ref_text tooltip to recommend providing it explicitly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 17:35:54 +02:00
Ethanfel	d6ff42dc7c	Cap transformers below 5.0 to avoid torchcodec ASR crash transformers 5.x unconditionally imports torchcodec in its ASR pipeline preprocess step, which crashes in environments without FFmpeg shared libs. 4.x does not have this dependency. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 17:32:50 +02:00
Ethanfel	5dfaa0b300	Replace torchaudio.save with soundfile.write; add EPUB loader node - nodes/generator.py: swap torchaudio.save for soundfile.write to avoid torchcodec/FFmpeg dependency crash in environments without FFmpeg shared libs - nodes/epub_loader.py: new OmniVoiceEpubLoader node for loading EPUB chapters - tests/test_epub_loader.py: 8 tests for the EPUB loader - install.py: add beautifulsoup4 to runtime deps - __init__.py, nodes/__init__.py: register OmniVoiceEpubLoader Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 17:24:18 +02:00
Ethanfel	5366f6992e	feat: add tooltips with inline tag reference to generator node inputs	2026-04-05 15:35:21 +02:00
Ethanfel	f647a24988	fix: add install.py to prevent omnivoice from overwriting ComfyUI's torch	2026-04-05 14:53:33 +02:00
Ethanfel	b273f8f2d7	refactor: remove redundant condition and rename shadowed waveform variable	2026-04-05 10:41:08 +02:00
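Commits 219c74d7ed and c7c7123068 describe how OmniVoice Mix Voices blends speakers. Each reference clip is trimmed to natural_length*weight samples, the trimmed clips are concatenated, and a clear error is raised when all weights are 0. The sketch below illustrates that rule in plain Python. It is not the node's code. It assumes mono clips already resampled to a shared rate and stored as lists of floats. The name `blend_references` is hypothetical, and so is the space-joined transcript, because the log says only that transcripts are "merged".

```python
from typing import List, Sequence, Tuple


def blend_references(
    clips: Sequence[List[float]],
    transcripts: Sequence[str],
    weights: Sequence[float],
) -> Tuple[List[float], str]:
    """Hypothetical sketch: weighted concatenation of 2-3 reference clips.

    Each clip keeps its first int(len(clip) * weight) samples, so weight 0.5
    keeps half of that clip. Slicing caps the result at the clip's length.
    """
    if not (len(clips) == len(transcripts) == len(weights)):
        raise ValueError("clips, transcripts and weights must have equal length")

    trimmed = [clip[: int(len(clip) * w)] for clip, w in zip(clips, weights)]
    if not any(trimmed):
        # Mirrors the guard described in 219c74d7ed: all-zero weights leave no audio.
        raise ValueError("All voice weights are 0; nothing to blend")

    audio: List[float] = []
    for part in trimmed:
        audio.extend(part)

    # Assumption: join the transcripts of clips that contributed audio.
    # A partially trimmed clip no longer matches its full transcript; the log
    # does not say how the node handles that.
    ref_text = " ".join(t.strip() for t, part in zip(transcripts, trimmed) if part)
    return audio, ref_text


audio, text = blend_references([[0.1] * 10, [0.2] * 10], ["Hello.", "World."], [1.0, 0.5])
# len(audio) == 15, text == "Hello. World."
```

Trimming by duration rather than mixing amplitudes keeps each source voice intact inside the combined reference. This matches the commit's description of concatenation.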

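Commit 8805665a22 explains the voice-drift fix. The RNG is re-seeded before each generate() call so that every chunk starts from the same noise, and seed=0 keeps the default random behaviour. The sketch below shows that rule under stated assumptions. The helper name `apply_seed` is hypothetical. It seeds Python's `random` and, when PyTorch is installed, calls `torch.manual_seed`. `random.random()` stands in for a real generation call. The log does not say which RNGs the node itself seeds.

```python
import random
from typing import Optional


def apply_seed(seed: int) -> Optional[int]:
    """Hypothetical helper: re-anchor RNG state before one generation pass.

    seed == 0 leaves RNG state untouched (random voice each call).
    Any other value resets Python's RNG and, if available, torch's RNG.
    """
    if seed == 0:
        return None
    random.seed(seed)
    try:
        import torch
    except ImportError:
        pass  # torch not installed: only Python's RNG is seeded
    else:
        torch.manual_seed(seed)
    return seed


# Re-seeding before each chunk reproduces the same random draws.
draws = []
for _chunk in ["First paragraph.", "Second paragraph."]:
    apply_seed(42)
    draws.append(random.random())  # stand-in for a generate(_chunk) call
```

Both entries in `draws` are identical. A real pipeline would see the same effect on the diffusion noise for each chunk, which is what keeps the voice consistent across paragraphs.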

61 Commits
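Commit d779526225 describes the EPUB paragraph fix. It inserts \n\n at block-level element boundaries (p, h1-h6, div, li, br, tr) before text extraction and then normalises whitespace. The node itself uses BeautifulSoup. The sketch below swaps in the standard library's `html.parser` so it runs without extra dependencies. The class and function names are hypothetical, and the exact normalisation rules are assumptions.

```python
import re
from html.parser import HTMLParser
from typing import List

# Block-level tags named in d779526225 (h1-h6 spelled out).
BLOCK_TAGS = {"p", "div", "li", "br", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class BlockTextExtractor(HTMLParser):
    """Collects text and emits a paragraph break at every block boundary."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append("\n\n")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append("\n\n")

    def handle_data(self, data):
        # Line wrapping inside the source HTML is not meaningful: fold it to spaces.
        self.parts.append(re.sub(r"\s+", " ", data))


def html_to_paragraphs(html: str) -> str:
    parser = BlockTextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    text = re.sub(r" {2,}", " ", text)      # squash repeated spaces
    text = re.sub(r" *\n *", "\n", text)     # trim spaces around breaks
    text = re.sub(r"\n{3,}", "\n\n", text)   # keep at most one blank line
    return text.strip()


html = "<h1>Chapter 1</h1>\n<p>First   line\nof text.</p><p>Second.<br/>After break.</p>"
print(html_to_paragraphs(html))
```

Self-closing tags such as `<br/>` reach `handle_starttag` and `handle_endtag` through `HTMLParser`'s default `handle_startendtag`. That is why a line break also becomes a paragraph boundary here, consistent with br being in the commit's list.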