ComfyUI-Omnivoice

Author	SHA1	Message	Date
Ethanfel	d5f2632c48	feat: support [tag]: syntax in tagged_speakers mode Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-06 09:21:27 +02:00
Ethanfel	33b3d62d02	fix: tagged_speakers splits on single newlines, not just double newlines Each line starting with [Tag] now begins a new segment so users don't need blank lines between tagged speeches. Continuation lines (no tag) are joined to the previous tagged segment for multi-line speeches. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-06 09:16:54 +02:00
Ethanfel	95cf706b19	feat: add multi-speaker generation with JS-powered dynamic slots - Add OmniVoiceSpeaker node (label + ref_audio + ref_text → OMNIVOICE_SPEAKER) - Add OmniVoiceSpeakers node (roster with dynamic speaker_N inputs driven by num_speakers INT widget; slots expand/collapse via ComfyUI JS extension) - Add web/multi_speaker.js: ComfyUI extension that hooks onNodeCreated and onConfigure to sync speaker_N inputs in real time (max 8 speakers) - Extend OmniVoiceGenerate with optional speakers (OMNIVOICE_SPEAKERS) input; when connected it routes each paragraph to the assigned speaker and concatenates the results — supports alternate_paragraphs and tagged_speakers modes - Remove OmniVoiceMultiSpeakerGenerate (generation now lives in the existing Generate node) - Refactor generator.py: extract _write_tmp_wav helper, add _tensors_to_audio Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-06 09:08:23 +02:00
Ethanfel	3cbc04d12d	wf: update default workflow from ComfyUI export Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 20:50:52 +02:00
Ethanfel	340c0aa402	simplify: remove language param entirely — model detects from instruct string Chinese characters vs English words are self-identifying to the model. No need for a separate language signal on either node. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 20:44:27 +02:00
Ethanfel	2b13e55dc5	fix: pipe language out of Voice Design into Generate Voice Design now outputs (instruct, language) — wire language directly into Generate to avoid setting it in two places. Generate's language input is now a STRING (accepts the connection or manual 'auto'). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 20:40:35 +02:00
Ethanfel	86ec8cf3fb	bump: version 1.0.1 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 20:33:55 +02:00
Ethanfel	ae2255d9e4	fix: add missing omnivoice runtime deps for fresh installs pydub, tensorboardx, webdataset are omnivoice dependencies that won't be present on a clean ComfyUI install since we use --no-deps. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 20:33:19 +02:00
Ethanfel	d5a0ebeb9a	revert: restore working requirements.txt Pinning transformers==5.3.0 risks conflicting with existing ComfyUI venv. Back to permissive >=4.40.0 which worked in practice. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 20:29:04 +02:00
Ethanfel	0d43e5374f	fix: sync requirements.txt with omnivoice's actual --no-deps dependencies Pins transformers==5.3.0 (omnivoice requires exact version), restores pydub (omnivoice dep), adds tensorboardx and webdataset. Drops gradio (demo-only, not needed for inference). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 20:28:35 +02:00
Ethanfel	2b4b221e88	chore: remove unused pydub from requirements Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 20:27:34 +02:00
Ethanfel	772f6654d4	feat: add language selector for voice_design + Chinese instruct support - Generate: language dropdown (auto/English/Chinese), passed only in voice_design and auto_voice modes where it selects the instruct vocab - VoiceDesign: Chinese mode with dialect/age/pitch/gender dropdowns using the model's validated Chinese instruct vocabulary (全角逗号) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 20:22:25 +02:00
Ethanfel	e26bac3684	remove: language parameter from Generate (model auto-detects from text) Language is inferred from the text content — the parameter had no effect. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 20:21:27 +02:00
Ethanfel	194e0b0e09	fix: trim accent list to model-validated values only The model's _resolve_instruct() validates against a fixed vocabulary. Only 10 accents are supported — removed all unsupported additions. Updated tooltip to reflect actual constraints. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 20:17:59 +02:00
Ethanfel	d4bf7c825e	feat: pass instruct in voice_cloning mode for accent/style influence If instruct is set alongside ref_audio, it is now forwarded to model.generate() — allowing accent/style transfer on top of the cloned voice identity. Model may or may not honour both simultaneously. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 20:07:18 +02:00
Ethanfel	d2cb5c4249	feat: expand language and accent lists to full coverage Language: ~170 world languages with type-to-filter dropdown Accent: 50+ regional varieties grouped by area Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 20:04:12 +02:00
Ethanfel	c1558efad9	feat: add Voice Design node + language and guidance_scale to Generate OmniVoiceVoiceDesign: structured dropdowns for gender/age/pitch/accent that compose into an instruct string — wire to Generate's instruct input. OmniVoiceGenerate: new optional language dropdown (auto + 11 languages) and guidance_scale (CFG, default 2.0) parameters. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 20:02:06 +02:00
Ethanfel	97ed0f209f	fix: rename node id to comfyui-omnivoice-fel (comfyui-omnivoice is taken) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:59:17 +02:00
Ethanfel	f7d624799c	fix: revert PublisherId to lowercase ethanfel (matches registry) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:46:55 +02:00
Ethanfel	d5000dee11	fix: correct PublisherId casing to Ethanfel Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:40:52 +02:00
Ethanfel	bb1d83578c	ci: trigger publish on pyproject.toml push to master (not tags) Matches the pattern used by the working LoRA Optimizer publish workflow. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:37:34 +02:00
Ethanfel	138df3abc4	fix: align pyproject.toml format with working registry publish - license: use { text = "GPL-3.0-only" } (matches comfy-cli expectation) - add dependencies = [] Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> v1.0.0	2026-04-05 19:35:31 +02:00
Ethanfel	db884341e8	fix: move repository URL to [project.urls] for comfy-cli Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:34:30 +02:00
Ethanfel	dfad38a5c3	fix: correct pyproject.toml for registry publish - license: use SPDX identifier GPL-3.0-only instead of file reference - add Repository URL Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:32:52 +02:00
Ethanfel	f4697ae88f	wf: update default workflow from ComfyUI export Refreshed node IDs, positions and sizes from live session. Replaced SaveAudio with PreviewAudio, added ref_text widget entry, updated aux_id/ver properties. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:27:45 +02:00
Ethanfel	616c2a3e61	license: add GNU GPL v3.0 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:27:22 +02:00
Ethanfel	9b62c9bda8	fix: add seed control_after_generate value to default workflow ComfyUI appends a hidden "fixed"/"randomize" value after every INT named "seed". Without it the widget values were misaligned. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:21:13 +02:00
Ethanfel	46591553f9	feat: add torch.compile option to Model Loader Compiles the model graph on first generation (~30-60s warmup) then speeds up all subsequent generations in the session. Recommended for audiobook pipelines. Default off. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:16:39 +02:00
Ethanfel	2e1cba61fc	fix: adjust node sizes in default workflow for seed widget Generate node height 340→400 to fit all 6 widgets, Voice Preset height 80→100, SaveAudio position adjusted. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:12:31 +02:00
Ethanfel	c22cc7c296	rename: voice_cloning.json → omnivoice_voice_cloning.json Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:10:51 +02:00
Ethanfel	f7756e6240	ci: add pyproject.toml for ComfyUI registry Sets publisher ID, display name, and package metadata required for Comfy-Org/publish-node-action. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:10:20 +02:00
Ethanfel	8e2367f8f0	ci: add ComfyUI registry publish workflow Triggers on version tags (v*) using Comfy-Org/publish-node-action. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:09:40 +02:00
Ethanfel	14542a6b00	docs: update README with all nodes (presets, mix voices, EPUB loader) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:07:15 +02:00
Ethanfel	76118f57c3	fix: only catch ImportError in _resample torchaudio fallback Catching bare Exception was silently swallowing real resampling errors. Only ImportError should trigger the interpolate fallback. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:05:22 +02:00
Ethanfel	219c74d7ed	fix: three bugs in OmniVoiceMixVoices - _resample: squeeze batch dim before torchaudio.Resample (expected 2D) - weight scaling: each clip now trims to natural_length*weight samples, dropping the broken target_per_unit double-multiplication - empty trimmed guard: raise clear error when all weights are 0 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:04:54 +02:00
Ethanfel	c7c7123068	feat: add OmniVoice Mix Voices node for blended speaker cloning Concatenates 2-3 reference audio clips (with per-voice duration weights) to create a blended speaker embedding. Merges transcripts for ref_text. Handles mismatched sample rates and mono conversion automatically. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 19:03:43 +02:00
Ethanfel	f8a3bebe9c	feat: add seed=42 to default workflow for voice consistency Sets a default seed so the voice stays consistent across all generated chunks when using the workflow as a starting point for audiobook pipelines. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 18:58:22 +02:00
Ethanfel	8805665a22	Add seed parameter to OmniVoice Generate for consistent voice across chunks OmniVoice chunks long text internally; each chunk is a separate diffusion pass with different random noise, causing voice drift between paragraphs. Setting the same seed before each generate() call anchors the RNG state and keeps the voice consistent. seed=0 means random (default behaviour). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 18:53:58 +02:00
Ethanfel	4c42322c6f	Expand voice presets to 8 voices (3 female, 5 male) All transcribed via whisper-medium. Sources: Chatterbox demo GCS bucket (ResembleAI) and F5-TTS repo (SWivid). Female: Shadowheart, American actress, Podcast host Male: Nature, Old Hollywood, Rick Sanchez, Stewie Griffin, Harvey Keitel, Conan O'Brien Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 18:51:43 +02:00
Ethanfel	c109e860a8	Add transcript for Shadowheart preset (transcribed via whisper-medium) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 18:39:14 +02:00
Ethanfel	75e74075f5	Restore Shadowheart preset; user will transcribe via Whisper node Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 18:33:52 +02:00
Ethanfel	8de201a4c9	Add OmniVoice Voice Preset node with two female voice samples Two built-in presets, auto-downloaded and cached to ComfyUI/models/omnivoice/presets/: - "Nature – female, warm" (F5-TTS basic_ref_en.wav, transcript included) - "Shadowheart – female, expressive" (Chatterbox demo, connect Whisper for transcript) Outputs ref_audio (AUDIO) and ref_text (STRING) — wire directly into OmniVoice Generate. Updated default workflow to use this node. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 18:19:29 +02:00
Ethanfel	d779526225	Preserve paragraph breaks in EPUB text extraction get_text(separator=' ') collapsed all paragraphs into one line. Now inserts \n\n at block-level element boundaries (p, h1-h6, div, li, br, tr) before extraction, then normalises whitespace. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 18:06:41 +02:00
Ethanfel	b52edcfd84	Remove local path option from model loader Models always download to ComfyUI/models/omnivoice/ via HuggingFace. Local path added unnecessary complexity; users who want a custom path can symlink into the models directory. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 18:02:55 +02:00
Ethanfel	cd0f7aff07	Add default voice cloning workflow Model Loader → Load Audio → OmniVoice Generate → Save Audio. Connect a Whisper node to ref_text for auto-transcription. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 18:01:18 +02:00
Ethanfel	8d77dd6cd5	Remove torchcodec workaround; recommend Whisper node for ref_text Users should connect a ComfyUI Whisper node to ref_text instead of relying on omnivoice's internal ASR. Removes the error-catch workaround and updates the tooltip accordingly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 17:49:25 +02:00
Ethanfel	a3fb88e559	Restore install.py for omnivoice --no-deps only requirements.txt cannot install omnivoice (it would pull in torch==2.8.* and break ComfyUI). install.py now does exactly one thing: install omnivoice --no-deps, skipped if already present. All other deps remain in requirements.txt for ComfyUI Manager to handle normally. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 17:45:24 +02:00
Ethanfel	dbb3207df1	Replace install.py with standard requirements.txt install.py was running arbitrary pip installs as part of node loading, which is dangerous in a shared venv. Standard approach: requirements.txt lists the safe deps (transformers, accelerate, soundfile, etc.); omnivoice itself must be installed once manually with --no-deps to avoid overwriting ComfyUI's torch. README documents this clearly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 17:44:52 +02:00
Ethanfel	e8e8943692	Remove transformers upper bound cap from install.py The cap was wrong — it would downgrade transformers in shared venvs and break other nodes. The torchcodec issue is handled in code now. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 17:36:06 +02:00
Ethanfel	30f46fc3ef	Revert transformers cap; catch torchcodec ASR failure with clear message install.py: restore transformers>=5.0.0 (capping it would break other nodes). generator.py: catch the torchcodec RuntimeError that fires when ref_text is blank and transformers 5.x auto-transcription requires missing FFmpeg libs. Raises a human-readable error telling the user to fill in ref_text manually. Also updates the ref_text tooltip to recommend providing it explicitly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 17:35:54 +02:00

66 Commits
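The tagged_speakers splitting rules from d5f2632c48 and 33b3d62d02 can be sketched as follows: a line opening with [Tag] or [Tag]: starts a new segment, blank lines are no longer required, and untagged lines are joined to the previous segment. This is a minimal sketch, not the node's code; the name split_tagged_segments and the handling of untagged text before the first tag are assumptions.

```python
import re

# A line that opens with [Tag] or [Tag]: starts a new segment (assumed pattern).
TAG_LINE = re.compile(r"^\s*\[([^\]]+)\]:?\s*(.*)$")


def split_tagged_segments(text):
    """Hypothetical helper: return (tag, speech) pairs; tag is None for leading untagged text."""
    segments = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines are optional separators
        match = TAG_LINE.match(line)
        if match:
            segments.append([match.group(1).strip(), match.group(2)])
        elif segments:
            # Continuation line: append to the previous segment for multi-line speeches.
            segments[-1][1] = (segments[-1][1] + " " + line).strip()
        else:
            segments.append([None, line])  # assumption: keep text before the first tag
    return [(tag, speech) for tag, speech in segments]
```

Each (tag, speech) pair would then be routed to the speaker whose label matches the tag.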
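For alternate_paragraphs mode (95cf706b19), one plausible routing rule is round-robin over the connected speakers. This sketch assumes paragraphs are separated by blank lines; assign_alternating is a hypothetical helper, not part of the node.

```python
import re


def assign_alternating(text, speakers):
    """Hypothetical helper: pair each blank-line-separated paragraph with a speaker in turn."""
    if not speakers:
        raise ValueError("at least one speaker is required")
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    # Paragraph i goes to speaker i mod N, so voices alternate A, B, A, B, ...
    return [(speakers[i % len(speakers)], p) for i, p in enumerate(paragraphs)]
```

The per-paragraph audio would then be concatenated in order, as the commit describes.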
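The corrected weight scaling in 219c74d7ed (each clip trimmed to natural_length * weight samples, with a clear error when every weight is 0) might look like this on plain mono sample lists. The real node works on resampled tensors; mix_reference_clips is a hypothetical name used only for illustration.

```python
def mix_reference_clips(clips, weights):
    """Hypothetical helper: concatenate mono clips, keeping int(len(clip) * weight) samples of each."""
    if len(clips) != len(weights):
        raise ValueError("expected one weight per clip")
    trimmed = []
    for clip, weight in zip(clips, weights):
        if weight < 0:
            raise ValueError("weights must be non-negative")
        keep = int(len(clip) * weight)  # natural length scaled once by the weight
        if keep > 0:
            trimmed.append(clip[:keep])  # slicing past the end keeps the whole clip
    if not trimmed:
        # Fail loudly instead of returning empty reference audio.
        raise ValueError("all voice weights are 0; nothing to mix")
    mixed = []
    for part in trimmed:
        mixed.extend(part)
    return mixed
```

Scaling each clip's own length once avoids the double multiplication the commit removed.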
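The seed fix in 8805665a22 amounts to re-seeding the RNG before every generate() call so each chunk starts from the same noise state, with seed=0 meaning no re-seeding. In this sketch Python's random.seed stands in for torch.manual_seed so it runs without torch, and one generate() call per chunk is assumed; generate_chunks and generate_fn are hypothetical names.

```python
import random


def generate_chunks(chunks, seed, generate_fn, seed_fn=random.seed):
    """Hypothetical helper: run generate_fn per chunk, re-seeding first when seed != 0."""
    outputs = []
    for chunk in chunks:
        if seed != 0:
            seed_fn(seed)  # identical RNG state before every chunk anchors the voice
        outputs.append(generate_fn(chunk))
    return outputs
```

With a fixed seed, every chunk draws the same starting noise, which is what keeps the voice from drifting between paragraphs.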
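d779526225 inserts \n\n at block-level element boundaries (p, h1-h6, div, li, br, tr) before extracting text, then normalises whitespace. The node appears to use BeautifulSoup's get_text; this sketch swaps in the standard library's html.parser so it runs without extra dependencies, and also skips head, script and style content (an assumption). html_to_paragraphs is a hypothetical name.

```python
import re
from html.parser import HTMLParser

BLOCK_TAGS = {"p", "h1", "h2", "h3", "h4", "h5", "h6", "div", "li", "br", "tr"}
SKIP_TAGS = {"head", "script", "style"}  # assumption: not part of the book text


class _BlockTextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()  # convert_charrefs=True decodes entities like &amp;
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n\n")  # paragraph break at block boundaries

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n\n")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_paragraphs(html):
    """Hypothetical helper: extract text, keeping one blank line between paragraphs."""
    parser = _BlockTextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    paragraphs = []
    for block in re.split(r"\n\s*\n", text):
        cleaned = re.sub(r"\s+", " ", block).strip()  # normalise whitespace inside a paragraph
        if cleaned:
            paragraphs.append(cleaned)
    return "\n\n".join(paragraphs)
```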
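a3fb88e559 reduces install.py to one job: install omnivoice with --no-deps, skipped when it is already present, so pip never replaces ComfyUI's torch. A sketch of that check follows, assuming the import name and pip package name are both omnivoice; the helper names are hypothetical, and install.py would call main() at module level.

```python
import importlib.util
import subprocess
import sys


def omnivoice_install_command(package="omnivoice"):
    """Hypothetical helper: return the pip command, or None if the package is importable."""
    if importlib.util.find_spec(package) is not None:
        return None
    # --no-deps stops pip from pulling in the package's pinned torch version.
    return [sys.executable, "-m", "pip", "install", "--no-deps", package]


def main():
    cmd = omnivoice_install_command()
    if cmd is None:
        print("omnivoice already installed, skipping")
        return
    subprocess.check_call(cmd)
```

All other dependencies stay in requirements.txt for ComfyUI Manager, as the commit describes.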