Files
Comfyui-VACE-Tools/README.md
Ethanfel a164fbeb87 Normalize input/output names across nodes
- SourcePrep output: source_clip → trimmed_clip (distinguish from input)
- MergeBack input: original_clip → source_clip (matches SourcePrep's
  input — same video wire from Load Video)
- MergeBack input: original_clip_2 → source_clip_2 (matches SourcePrep)
- Update README wiring diagram and tables

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 10:34:07 +01:00

391 lines
18 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# ComfyUI-VACE-Tools
ComfyUI custom nodes for WanVideo/VACE workflows — mask/control-frame generation and model saving.
## Installation
```bash
cd ComfyUI/custom_nodes/
git clone https://github.com/ethanfel/Comfyui-VACE-Tools.git
```
Restart ComfyUI. Nodes appear under the **VACE Tools** and **WanVideoWrapper** categories.
## Node: VACE Source Prep
Trims long source clips so they can be used with VACE Mask Generator. Place this node **before** the mask generator when your source clip has more frames than `target_frames`. It selects the relevant frames based on mode and outputs adjusted parameters to wire directly into the mask generator.
Irrelevant widgets are automatically hidden based on the selected mode.
### Inputs
| Input | Type | Default | Description |
|---|---|---|---|
| `source_clip` | IMAGE | — | Full source video frames (B, H, W, C tensor). |
| `mode` | ENUM | `End Extend` | Generation mode — must match the mask generator's mode. |
| `split_index` | INT | `0` | Split position in the full source video. Same meaning as the mask generator's split_index. |
| `input_left` | INT | `0` | Frames from the left side of the split point to keep (0 = all available). End: trailing context. Middle: frames before split. Edge/Join: start edge size. Bidirectional: trailing context. Replace: context before region. |
| `input_right` | INT | `0` | Frames from the right side of the split point to keep (0 = all available). Pre: leading reference. Middle: frames after split. Edge/Join: end edge size. Replace: context after region. |
| `edge_frames` | INT | `8` | Default edge size for Edge/Join modes (overridden by input_left/input_right if non-zero). Replace/Inpaint: number of frames to replace. |
| `source_clip_2` | IMAGE | *(optional)* | Second clip for Join Extend — join two separate clips instead of splitting one in half. |
| `inpaint_mask` | MASK | *(optional)* | Spatial inpaint mask — trimmed to match output frames for Video Inpaint mode. |
| `keyframe_positions` | STRING | *(optional)* | Keyframe positions pass-through for Keyframe mode. |
### Outputs
| Output | Type | Description |
|---|---|---|
| `trimmed_clip` | IMAGE | Trimmed frames — wire to mask generator's source_clip. |
| `mode` | ENUM | Selected mode — wire to mask generator's mode. |
| `split_index` | INT | Adjusted for the trimmed clip — wire to mask generator. |
| `edge_frames` | INT | Adjusted/passed through — wire to mask generator. |
| `inpaint_mask` | MASK | Trimmed to match output, or placeholder. |
| `keyframe_positions` | STRING | Pass-through. |
| `vace_pipe` | VACE_PIPE | Pipe carrying mode, trim bounds, and context frame counts — wire to VACE Merge Back. |
### Per-Mode Trimming
| Mode | input_left | input_right | Behavior |
|---|---|---|---|
| End Extend | Trailing context frames | — | Keeps last N frames |
| Pre Extend | — | Leading reference frames | Keeps first N frames |
| Middle Extend | Frames before split | Frames after split | Window around split_index |
| Edge Extend | Start edge size | End edge size | Overrides edge_frames; forced symmetric (min of both) |
| Join Extend | Edge from first half/clip | Edge from second half/clip | Edge context around midpoint (or between two clips if source_clip_2 connected); forced symmetric |
| Bidirectional | Trailing context frames | — | Keeps last N frames |
| Frame Interpolation | — | — | Pass-through (no trimming) |
| Replace/Inpaint | Context before region | Context after region | Window around replace region |
| Video Inpaint | — | — | Pass-through (no trimming) |
| Keyframe | — | — | Pass-through (no trimming) |
---
## Node: VACE Mask Generator
Builds mask and control_frames sequences for all VACE generation modes. Works standalone for short clips, or downstream of VACE Source Prep for long clips.
**Note:** For modes that use `target_frames` (End, Pre, Middle, Edge, Join, Bidirectional, Keyframe), `source_clip` must not have more frames than `target_frames`. If your source is longer, use VACE Source Prep upstream to trim it first.
### Inputs
| Input | Type | Default | Description |
|---|---|---|---|
| `source_clip` | IMAGE | — | Source video frames (B, H, W, C tensor). Must not exceed target_frames for modes that use it. |
| `mode` | ENUM | `End Extend` | Generation mode (see below). 10 modes available. |
| `target_frames` | INT | `81` | Total output frame count for mask and control_frames (110000). Used by Keyframe to set output length. Unused by Frame Interpolation, Replace/Inpaint, and Video Inpaint. |
| `split_index` | INT | `0` | Where to split the source. Meaning varies by mode. Unused by Edge/Join/Keyframe. Bidirectional: frames before clip (0 = even split). Frame Interpolation: new frames per gap. Replace/Inpaint: start index of replace region. |
| `edge_frames` | INT | `8` | Number of edge frames for Edge and Join modes. Replace/Inpaint: number of frames to replace. Unused by End/Pre/Middle/Bidirectional/Frame Interpolation/Video Inpaint/Keyframe. |
| `inpaint_mask` | MASK | *(optional)* | Spatial inpaint mask for Video Inpaint mode (B, H, W). White (1.0) = regenerate, Black (0.0) = keep. Single frame broadcasts to all source frames. |
| `keyframe_positions` | STRING | *(optional)* | Comma-separated frame indices for Keyframe mode (e.g. `0,20,50,80`). One position per source frame, sorted ascending, within [0, target_frames-1]. Leave empty for even auto-spread. |
### Outputs
| Output | Description |
|---|---|
| `mask` | Black/white frame sequence (`target_frames` long). Black = keep, White = generate. |
| `control_frames` | Source frames composited with grey (`#7f7f7f`) fill (`target_frames` long). Fed to VACE as visual reference. |
| `frames_to_generate` | INT — number of new frames the model needs to produce (the white/grey region). |
## Mode Reference
All diagrams show the `mask` and `control_frames` layout left-to-right (frame 0 → frame N).
---
### End Extend
Generate new frames **after** the source clip.
- **`split_index`** — optional trim: `0` keeps the full clip; a negative value (e.g. `-16`) drops that many frames from the end before extending.
- **`frames_to_generate`** = `target_frames source_frames`
```
mask: [ BLACK × source ][ WHITE × generated ]
control_frames: [ source clip ][ GREY × generated ]
```
---
### Pre Extend
Generate new frames **before** a reference portion of the source clip.
- **`split_index`** — how many frames from the start to keep as the reference tail (e.g. `24`).
- **`frames_to_generate`** = `target_frames split_index`
```
mask: [ WHITE × generated ][ BLACK × reference ]
control_frames: [ GREY × generated ][ reference frames ]
```
---
### Middle Extend
Generate new frames **between** two halves of the source clip, split at `split_index`.
- **`split_index`** — frame index where the source is split.
- **`frames_to_generate`** = `target_frames source_frames`
```
mask: [ BLACK × part_a ][ WHITE × generated ][ BLACK × part_b ]
control_frames: [ part_a ][ GREY × generated ][ part_b ]
```
---
### Edge Extend
Generate a transition **between the end and start** of a clip (useful for looping).
- **`edge_frames`** — number of frames taken from each edge.
- **`split_index`** — unused.
- **`frames_to_generate`** = `target_frames (2 × edge_frames)`
The end segment is placed first, then the generated gap, then the start segment — so the model learns to connect the clip's end back to its beginning.
```
mask: [ BLACK × end_seg ][ WHITE × generated ][ BLACK × start_seg ]
control_frames: [ end_seg ][ GREY × generated ][ start_seg ]
```
---
### Join Extend
Heal/blend **two halves** of a clip (or two separate clips) together. By default, the source is split in half; `edge_frames` from each side of the split form the context. If `source_clip_2` is connected (via VACE Source Prep), the two clips are joined directly instead.
- **`edge_frames`** — context frames taken from each side of the join point.
- **`split_index`** — unused.
- **`frames_to_generate`** = `target_frames (2 × edge_frames)`
```
Single clip: [ part_1 ][ part_2 | part_3 ][ part_4 ]
← edge → ← edge →
Two clips: clip_1: [...| part_2 ] clip_2: [ part_3 |...]
← edge → ← edge →
mask: [ BLACK × part_2 ][ WHITE × generated ][ BLACK × part_3 ]
control_frames: [ part_2 ][ GREY × generated ][ part_3 ]
```
---
### Bidirectional Extend
Generate new frames **both before and after** the source clip.
- **`split_index`** — number of generated frames to place before the clip. `0` = even split (half before, half after).
- **`target_frames`** — total output frame count.
- **`frames_to_generate`** = `target_frames source_frames`
```
mask: [ WHITE × pre ][ BLACK × source ][ WHITE × post ]
control_frames: [ GREY × pre ][ source clip ][ GREY × post ]
```
---
### Frame Interpolation
Insert generated frames **between each consecutive pair** of source frames.
- **`split_index`** — number of new frames to insert per gap (min 1). `target_frames` is unused.
- **`frames_to_generate`** = `(source_frames 1) × split_index`
- **Total output** = `source_frames + frames_to_generate`
```
mask: [ B ][ W×step ][ B ][ W×step ][ B ] ...
control_frames: [ f0][ GREY ][ f1][ GREY ][ f2] ...
```
---
### Replace/Inpaint
Regenerate a range of frames **in-place** within the source clip.
- **`split_index`** — start index of the region to replace (clamped to source length).
- **`edge_frames`** — number of frames to replace (clamped to remaining frames after start).
- **`frames_to_generate`** = `edge_frames` (after clamping). `target_frames` is unused.
- **Total output** = `source_frames` (same length — in-place replacement).
```
mask: [ BLACK × before ][ WHITE × replace ][ BLACK × after ]
control_frames: [ before frames ][ GREY × replace ][ after frames ]
```
---
### Video Inpaint
Regenerate **spatial regions** within frames using a per-pixel mask. Unlike other modes that work at the frame level (entire frames kept or generated), Video Inpaint operates at the pixel level — masked regions are regenerated while the rest of each frame is preserved.
- **`inpaint_mask`** *(required)* — a `MASK` (B, H, W) where white (1.0) marks regions to regenerate and black (0.0) marks regions to keep. A single-frame mask is automatically broadcast to all source frames; a multi-frame mask must have the same frame count as `source_clip`.
- **`target_frames`**, **`split_index`**, **`edge_frames`** — unused.
- **`frames_to_generate`** = `source_frames` (all frames are partially regenerated).
- **Total output** = `source_frames` (same length — in-place spatial replacement).
Compositing formula per pixel:
```
control_frames = source × (1 mask) + grey × mask
```
```
mask: [ per-pixel mask broadcast to (B, H, W, 3) ]
control_frames: [ source pixels where mask=0, grey where mask=1 ]
```
---
### Keyframe
Place keyframe images at specific positions within a `target_frames`-length output, and generate everything between them.
- **`source_clip`** — a small batch of keyframe images (e.g. 4 frames).
- **`target_frames`** — total output frame count.
- **`keyframe_positions`** *(optional)* — comma-separated frame indices (e.g. `"0,20,50,80"`). Must have one value per source frame, sorted ascending, no duplicates, all within [0, target_frames-1]. Leave empty for **auto-spread** (first keyframe at frame 0, last at `target_frames-1`, others evenly distributed).
- **`split_index`**, **`edge_frames`** — unused.
- **`frames_to_generate`** = `target_frames source_frames`
- **Total output** = `target_frames`
```
Example: 4 keyframes, target_frames=81, positions auto-spread to 0,27,53,80
mask: [ B ][ W×26 ][ B ][ W×25 ][ B ][ W×26 ][ B ]
control_frames: [ k0][ GREY ][ k1][ GREY ][ k2][ GREY ][ k3]
```
---
## Node: VACE Merge Back
Splices VACE sampler output back into the original full-length video. Connect the original (untrimmed) clip, the VACE sampler output, and the `vace_pipe` from VACE Source Prep. The pipe carries mode, trim bounds, and context frame counts for automatic blending.
Irrelevant widgets are automatically hidden based on the selected blend method.
### Inputs
| Input | Type | Default | Description |
|---|---|---|---|
| `source_clip` | IMAGE | — | Full original video (before any trimming). Same source as VACE Source Prep's source_clip. |
| `vace_output` | IMAGE | — | VACE sampler output. |
| `vace_pipe` | VACE_PIPE | — | Pipe from VACE Source Prep carrying mode, trim bounds, and context counts. |
| `blend_method` | ENUM | `optical_flow` | `none` (hard cut), `alpha` (linear crossfade), or `optical_flow` (motion-compensated). |
| `of_preset` | ENUM | `balanced` | Optical flow quality: `fast`, `balanced`, `quality`, `max`. |
| `source_clip_2` | IMAGE | *(optional)* | Second original clip for Join Extend with two separate clips. |
### Outputs
| Output | Type | Description |
|---|---|---|
| `merged_clip` | IMAGE | Full reconstructed video. |
### Behavior
**Pass-through modes** (Edge Extend, Frame Interpolation, Keyframe, Video Inpaint): returns `vace_output` as-is — the VACE output IS the final result for these modes.
**Splice modes** (End, Pre, Middle, Join, Bidirectional, Replace): reconstructs `source_clip[:trim_start] + vace_output + source_clip[trim_end:]`, then blends across the full context zones at each seam. For two-clip Join Extend, the tail comes from `source_clip_2` instead.
Context frame counts (`left_ctx`, `right_ctx`) are carried in the `vace_pipe` and determined automatically by VACE Source Prep based on the mode and input_left/input_right settings. Blending uses a smooth alpha ramp across the entire context zone. Optical flow blending warps both frames along the motion field before blending, reducing ghosting on moving subjects.
### Example: Middle Extend
```
Original: 274 frames (0273)
Prep: split_index=137, input_left=16, input_right=16
→ vace_pipe: trim_start=121, trim_end=153, left_ctx=16, right_ctx=16
Mask Gen: target_frames=81
→ mask = [BLACK×16] [WHITE×49] [BLACK×16]
VACE out: 81 frames (from sampler)
Merge: result = original[0:121] + vace[0:81] + original[153:274]
→ 121 + 81 + 121 = 323 frames
Left blend: vace[0..15] ↔ original[121..136] (full 16-frame context zone)
Right blend: vace[65..80] ↔ original[137..152] (full 16-frame context zone)
```
### Wiring Diagram
```
[Load Video]
├─ source_clip ──→ [VACESourcePrep] ─┬─ trimmed_clip ──→ [MaskGen] ─→ [Sampler]
│ ├─ mode ───────────→ [MaskGen] │
│ └─ vace_pipe ──────────────────┐ │
│ │ │
└─ source_clip ───────────────────────────────────────→ [VACEMergeBack] ←┘
vace_output
```
---
## Node: WanVideo Save Merged Model
Saves a WanVideo diffusion model (with merged LoRAs) as a `.safetensors` file. Found under the **WanVideoWrapper** category.
### Inputs
| Input | Type | Default | Description |
|---|---|---|---|
| `model` | WANVIDEOMODEL | — | WanVideo model with merged LoRA from the WanVideo Model Loader. |
| `filename_prefix` | STRING | `merged_wanvideo` | Filename prefix for the saved file. A numeric suffix is appended to avoid overwriting. |
| `save_dtype` | ENUM | `same` | Cast weights before saving: `same`, `bf16`, `fp16`, or `fp32`. Set explicitly if the model was loaded in fp8. |
| `custom_path` | STRING | *(optional)* | Absolute path to save directory. Leave empty to save in `ComfyUI/models/diffusion_models/`. |
### Behavior
- Extracts the diffusion model state dict and saves it in safetensors format.
- Records source model name and merged LoRA details (names + strengths) in file metadata for traceability.
- Clones all tensors before saving to handle shared/aliased weights safely.
- Automatically avoids overwriting existing files by appending `_1`, `_2`, etc.
---
## Node: Save Latent (Absolute Path)
Saves a LATENT to an absolute file path as `.latent` (safetensors format). Found under the **latent** category.
### Inputs
| Input | Type | Default | Description |
|---|---|---|---|
| `samples` | LATENT | — | Latent samples to save. |
| `path` | STRING | `/path/to/latent.latent` | Absolute file path. `.latent` extension is appended if missing. |
| `overwrite` | BOOLEAN | `False` | If false, appends `_1`, `_2`, etc. to avoid overwriting. |
### Outputs
| Output | Description |
|---|---|
| `LATENT` | Pass-through of the input samples (for chaining). |
### Behavior
- Saves all tensor data via safetensors, with device info and non-tensor metadata stored in the file header.
- Creates parent directories automatically.
---
## Node: Load Latent (Absolute Path)
Loads a LATENT from an absolute file path. Found under the **latent** category.
### Inputs
| Input | Type | Default | Description |
|---|---|---|---|
| `path` | STRING | `/path/to/latent.latent` | Absolute path to a `.latent` file previously saved by Save Latent. |
### Outputs
| Output | Description |
|---|---|
| `LATENT` | Restored latent samples with original devices and non-tensor data. |
## Dependencies
- **PyTorch** and **safetensors** — bundled with ComfyUI.
- **OpenCV** (`cv2`) — optional, for optical flow blending in VACE Merge Back. Falls back to alpha blending if unavailable.