ComfyUI-VACE-Tools

A single ComfyUI node that replaces ~149 manually wired nodes for generating VACE mask and control-frame sequences.

Installation

cd ComfyUI/custom_nodes/
git clone https://github.com/your-user/ComfyUI-VACE-Tools.git

Restart ComfyUI. The node appears under the VACE Tools category.

Node: VACE Mask Generator

Inputs

| Input | Type | Default | Description |
| --- | --- | --- | --- |
| source_clip | IMAGE | | Source video frames (B, H, W, C tensor) |
| mode | ENUM | End Extend | Generation mode (see below). 9 modes available. |
| target_frames | INT | 81 | Total output frame count for mask and control_frames (1–10000). Unused by Frame Interpolation, Replace/Inpaint, and Video Inpaint. |
| split_index | INT | 0 | Where to split the source. Meaning varies by mode. Unused by Edge/Join. Bidirectional: frames before clip (0 = even split). Frame Interpolation: new frames per gap. Replace/Inpaint: start index of replace region. |
| edge_frames | INT | 8 | Number of edge frames for Edge and Join modes. Replace/Inpaint: number of frames to replace. Unused by End/Pre/Middle/Bidirectional/Frame Interpolation/Video Inpaint. |
| inpaint_mask | MASK | (optional) | Spatial inpaint mask for Video Inpaint mode (B, H, W). White (1.0) = regenerate, Black (0.0) = keep. Single frame broadcasts to all source frames. |

Outputs

| Output | Description |
| --- | --- |
| mask | Black/white frame sequence (target_frames long). Black = keep, White = generate. |
| control_frames | Source frames composited with grey (#7f7f7f) fill (target_frames long). Fed to VACE as visual reference. |
| segment_1–segment_4 | Clip segments whose contents depend on the mode (see below). Unused segments are 1-frame black placeholders. |
| frames_to_generate | INT: number of new frames the model needs to produce (the white/grey region). |

Mode Reference

All diagrams show the mask and control_frames layout left-to-right (frame 0 → frame N).


End Extend

Generate new frames after the source clip.

  • split_index — optional trim: 0 keeps the full clip; a negative value (e.g. -16) drops that many frames from the end before extending.
  • frames_to_generate = target_frames - source_frames
mask:           [ BLACK × source ][ WHITE × generated ]
control_frames: [ source clip    ][ GREY  × generated ]
| Segment | Content |
| --- | --- |
| segment_1 | Source frames (trimmed if split_index ≠ 0) |
| segment_2–4 | Placeholder |
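
The End Extend arithmetic can be sketched in plain Python. This is only an illustration, not the node's actual implementation: end_extend_layout is a hypothetical helper, frames are represented by their source indices, and it assumes that only a negative split_index trims the clip and that target_frames is at least the kept frame count.

```python
def end_extend_layout(source_frames, target_frames, split_index=0):
    """Hypothetical helper: per-frame layout for End Extend."""
    # Assumption: only a negative split_index trims; 0 keeps the full clip.
    kept = source_frames + split_index if split_index < 0 else source_frames
    frames_to_generate = target_frames - kept
    # 0 = black (keep), 1 = white (generate)
    mask = [0] * kept + [1] * frames_to_generate
    # Source frame index, or None where the grey fill goes
    control = list(range(kept)) + [None] * frames_to_generate
    return mask, control, frames_to_generate


mask, control, n_new = end_extend_layout(source_frames=49, target_frames=81, split_index=-16)
print(len(mask), n_new)  # 81 48
```

With a 49-frame clip trimmed by 16, 33 source frames are kept and 48 are left for the model to generate.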

Pre Extend

Generate new frames before a reference portion of the source clip.

  • split_index — how many frames from the start to keep as the reference tail (e.g. 24).
  • frames_to_generate = target_frames - split_index
mask:           [ WHITE × generated ][ BLACK × reference ]
control_frames: [ GREY  × generated ][ reference frames  ]
| Segment | Content |
| --- | --- |
| segment_1 | Remaining frames after the reference (source[split_index:]) |
| segment_2–4 | Placeholder |
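
A minimal sketch of the Pre Extend layout, under the assumption that split_index does not exceed the source length. The helper name pre_extend_layout is hypothetical and frames are stood in for by their source indices; the node itself works on image tensors.

```python
def pre_extend_layout(source_frames, target_frames, split_index):
    """Hypothetical helper: per-frame layout for Pre Extend."""
    reference = list(range(split_index))                 # source[:split_index]
    segment_1 = list(range(split_index, source_frames))  # source[split_index:]
    frames_to_generate = target_frames - split_index
    # Generated frames come first, the reference frames close the sequence.
    mask = [1] * frames_to_generate + [0] * len(reference)
    control = [None] * frames_to_generate + reference    # None = grey fill
    return mask, control, segment_1, frames_to_generate


mask, control, segment_1, n_new = pre_extend_layout(49, 81, 24)
print(n_new, len(segment_1))  # 57 25
```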

Middle Extend

Generate new frames between two halves of the source clip, split at split_index.

  • split_index — frame index where the source is split.
  • frames_to_generate = target_frames - source_frames
mask:           [ BLACK × part_a ][ WHITE × generated ][ BLACK × part_b ]
control_frames: [ part_a         ][ GREY  × generated ][ part_b         ]
| Segment | Content |
| --- | --- |
| segment_1 | Part A: source[:split_index] |
| segment_2 | Part B: source[split_index:] |
| segment_3–4 | Placeholder |
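
The Middle Extend split can be illustrated the same way. The sketch below uses a hypothetical middle_extend_layout helper over frame indices and assumes 0 ≤ split_index ≤ source_frames ≤ target_frames; it is not taken from the node's source.

```python
def middle_extend_layout(source_frames, target_frames, split_index):
    """Hypothetical helper: per-frame layout for Middle Extend."""
    part_a = list(range(split_index))                 # segment_1
    part_b = list(range(split_index, source_frames))  # segment_2
    frames_to_generate = target_frames - source_frames
    mask = [0] * len(part_a) + [1] * frames_to_generate + [0] * len(part_b)
    control = part_a + [None] * frames_to_generate + part_b  # None = grey fill
    return mask, control, frames_to_generate


mask, control, n_new = middle_extend_layout(40, 81, 16)
print(n_new, mask.index(1), control[-1])  # 41 16 39
```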

Edge Extend

Generate a transition between the end and start of a clip (useful for looping).

  • edge_frames — number of frames taken from each edge.
  • split_index — unused.
  • frames_to_generate = target_frames - (2 × edge_frames)

The end segment is placed first, then the generated gap, then the start segment — so the model learns to connect the clip's end back to its beginning.

mask:           [ BLACK × end_seg ][ WHITE × generated ][ BLACK × start_seg ]
control_frames: [ end_seg         ][ GREY  × generated ][ start_seg         ]
| Segment | Content |
| --- | --- |
| segment_1 | Start edge: source[:edge_frames] |
| segment_2 | Middle remainder: source[edge_frames:-edge_frames] |
| segment_3 | End edge: source[-edge_frames:] |
| segment_4 | Placeholder |
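
To make the reversed ordering concrete, here is a small sketch using Python list slicing on frame indices. edge_extend_layout is a hypothetical name, and the sketch assumes edge_frames ≥ 1 and 2 × edge_frames ≤ source_frames.

```python
def edge_extend_layout(source_frames, target_frames, edge_frames):
    """Hypothetical helper: per-frame layout for Edge Extend."""
    source = list(range(source_frames))
    start_seg = source[:edge_frames]              # segment_1
    middle = source[edge_frames:-edge_frames]     # segment_2
    end_seg = source[-edge_frames:]               # segment_3
    frames_to_generate = target_frames - 2 * edge_frames
    # The end edge leads and the start edge closes the sequence.
    mask = [0] * len(end_seg) + [1] * frames_to_generate + [0] * len(start_seg)
    control = end_seg + [None] * frames_to_generate + start_seg
    return mask, control, (start_seg, middle, end_seg), frames_to_generate


mask, control, segments, n_new = edge_extend_layout(40, 81, 8)
print(n_new, control[0], control[-1])  # 65 32 7
```

The control sequence starts on source frame 32 (the first frame of the end edge) and finishes on source frame 7 (the last frame of the start edge).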

Join Extend

Heal/blend two halves of a clip together. The source is split in half; edge_frames from each side of the split form the context.

  • edge_frames — context frames taken from each side of the midpoint.
  • split_index — unused.
  • frames_to_generate = target_frames - (2 × edge_frames)
source layout:  [ part_1 ][ part_2 | part_3 ][ part_4 ]
                           ← edge →  ← edge →

mask:           [ BLACK × part_2 ][ WHITE × generated ][ BLACK × part_3 ]
control_frames: [ part_2         ][ GREY  × generated ][ part_3         ]
| Segment | Content |
| --- | --- |
| segment_1 | Part 1: first half minus its trailing edge |
| segment_2 | Part 2: trailing edge of first half |
| segment_3 | Part 3: leading edge of second half |
| segment_4 | Part 4: second half minus its leading edge |
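
The four-way split can be sketched with plain slicing. Note the assumptions: the midpoint is taken as source_frames // 2 (the README does not say how odd lengths are rounded), edge_frames fits inside each half, and join_extend_layout is a hypothetical helper rather than the node's code.

```python
def join_extend_layout(source_frames, target_frames, edge_frames):
    """Hypothetical helper: per-frame layout for Join Extend."""
    source = list(range(source_frames))
    half = source_frames // 2                  # assumed midpoint rounding
    part_1 = source[:half - edge_frames]       # segment_1
    part_2 = source[half - edge_frames:half]   # segment_2
    part_3 = source[half:half + edge_frames]   # segment_3
    part_4 = source[half + edge_frames:]       # segment_4
    frames_to_generate = target_frames - 2 * edge_frames
    mask = [0] * len(part_2) + [1] * frames_to_generate + [0] * len(part_3)
    control = part_2 + [None] * frames_to_generate + part_3
    return mask, control, (part_1, part_2, part_3, part_4), frames_to_generate


mask, control, parts, n_new = join_extend_layout(40, 81, 8)
print([len(p) for p in parts], n_new)  # [12, 8, 8, 12] 65
```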

Bidirectional Extend

Generate new frames both before and after the source clip.

  • split_index — number of generated frames to place before the clip. 0 = even split (half before, half after).
  • target_frames — total output frame count.
  • frames_to_generate = target_frames - source_frames
mask:           [ WHITE × pre ][ BLACK × source ][ WHITE × post ]
control_frames: [ GREY  × pre ][ source clip    ][ GREY  × post ]
| Segment | Content |
| --- | --- |
| segment_1 | Full source clip |
| segment_2–4 | Placeholder |
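
A short sketch of how the generated frames might be divided between the two sides. bidirectional_layout is a hypothetical helper, and rounding the even split down on the leading side is an assumption; the README does not specify which side receives the extra frame when the count is odd.

```python
def bidirectional_layout(source_frames, target_frames, split_index=0):
    """Hypothetical helper: per-frame layout for Bidirectional Extend."""
    frames_to_generate = target_frames - source_frames
    # Assumption: 0 means an even split, rounded down before the clip.
    pre = frames_to_generate // 2 if split_index == 0 else split_index
    post = frames_to_generate - pre
    mask = [1] * pre + [0] * source_frames + [1] * post
    control = [None] * pre + list(range(source_frames)) + [None] * post
    return mask, control, pre, post


print(bidirectional_layout(41, 81)[2:])                  # (20, 20)
print(bidirectional_layout(41, 81, split_index=10)[2:])  # (10, 30)
```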

Frame Interpolation

Insert generated frames between each consecutive pair of source frames.

  • split_index — number of new frames to insert per gap (min 1). target_frames is unused.
  • frames_to_generate = (source_frames - 1) × split_index
  • Total output = source_frames + frames_to_generate
mask:           [ B ][ W×step ][ B ][ W×step ][ B ] ...
control_frames: [ f0][ GREY   ][ f1][ GREY   ][ f2] ...
| Segment | Content |
| --- | --- |
| segment_1 | Full source clip |
| segment_2–4 | Placeholder |
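
The interleaving pattern is easiest to see on a tiny clip. The sketch below uses a hypothetical interpolation_layout helper and reads "min 1" as a clamp on split_index, which is an assumption about how the node treats smaller values.

```python
def interpolation_layout(source_frames, split_index):
    """Hypothetical helper: per-frame layout for Frame Interpolation."""
    step = max(1, split_index)  # assumed clamp to the documented minimum
    mask, control = [], []
    for i in range(source_frames):
        mask.append(0)          # keep the source frame
        control.append(i)
        if i < source_frames - 1:
            mask.extend([1] * step)        # generated frames in the gap
            control.extend([None] * step)  # None = grey fill
    frames_to_generate = (source_frames - 1) * step
    return mask, control, frames_to_generate


mask, control, n_new = interpolation_layout(3, 2)
print(mask)              # [0, 1, 1, 0, 1, 1, 0]
print(n_new, len(mask))  # 4 7
```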

Replace/Inpaint

Regenerate a range of frames in-place within the source clip.

  • split_index — start index of the region to replace (clamped to source length).
  • edge_frames — number of frames to replace (clamped to remaining frames after start).
  • frames_to_generate = edge_frames (after clamping). target_frames is unused.
  • Total output = source_frames (same length — in-place replacement).
mask:           [ BLACK × before ][ WHITE × replace ][ BLACK × after ]
control_frames: [ before frames  ][ GREY  × replace ][ after frames  ]
| Segment | Content |
| --- | --- |
| segment_1 | Before: source[:start] |
| segment_2 | Original replaced frames: source[start:start+length] |
| segment_3 | After: source[start+length:] |
| segment_4 | Placeholder |
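
The clamping described above can be sketched as follows. replace_layout is a hypothetical helper; the lower bound of 0 on both values is an assumption, since the README only states the upper clamps.

```python
def replace_layout(source_frames, split_index, edge_frames):
    """Hypothetical helper: per-frame layout for Replace/Inpaint."""
    start = max(0, min(split_index, source_frames))
    length = max(0, min(edge_frames, source_frames - start))
    source = list(range(source_frames))
    before = source[:start]                  # segment_1
    replaced = source[start:start + length]  # segment_2
    after = source[start + length:]          # segment_3
    mask = [0] * len(before) + [1] * length + [0] * len(after)
    control = before + [None] * length + after  # None = grey fill
    return mask, control, (before, replaced, after), length


mask, control, segments, n_new = replace_layout(40, 30, 16)
print(n_new, len(mask))  # 10 40
```

Asking for 16 frames starting at index 30 of a 40-frame clip leaves only 10 frames to replace, so the length is clamped and the output stays 40 frames long.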

Video Inpaint

Regenerate spatial regions within frames using a per-pixel mask. Unlike other modes that work at the frame level (entire frames kept or generated), Video Inpaint operates at the pixel level — masked regions are regenerated while the rest of each frame is preserved.

  • inpaint_mask (required) — a MASK (B, H, W) where white (1.0) marks regions to regenerate and black (0.0) marks regions to keep. A single-frame mask is automatically broadcast to all source frames; a multi-frame mask must have the same frame count as source_clip.
  • target_frames, split_index, edge_frames — unused.
  • frames_to_generate = source_frames (all frames are partially regenerated).
  • Total output = source_frames (same length — in-place spatial replacement).

Compositing formula per pixel:

control_frames = source × (1 - mask) + grey × mask
mask:           [ per-pixel mask broadcast to (B, H, W, 3)        ]
control_frames: [ source pixels where mask=0, grey where mask=1   ]
| Segment | Content |
| --- | --- |
| segment_1 | Full source clip |
| segment_2–4 | Placeholder |
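
The compositing formula and the mask broadcast rule can be checked on scalar values without any tensor library. Everything named here is illustrative: composite_pixel and broadcast_mask are hypothetical helpers, the grey value assumes float images in the 0 to 1 range, and raising an error on a frame-count mismatch is an assumption about how such input would be rejected.

```python
GREY = 0x7F / 255  # #7f7f7f as a float, assuming images in [0, 1]


def composite_pixel(source, mask, grey=GREY):
    """control = source * (1 - mask) + grey * mask, for one channel value."""
    return source * (1.0 - mask) + grey * mask


def broadcast_mask(mask_frames, source_frame_count):
    """Repeat a single-frame mask for every source frame."""
    if len(mask_frames) == 1:
        return mask_frames * source_frame_count
    if len(mask_frames) != source_frame_count:
        # Assumed behaviour: reject mismatched multi-frame masks.
        raise ValueError("multi-frame mask must match the source frame count")
    return mask_frames


print(composite_pixel(0.2, 0.0))          # 0.2
print(composite_pixel(0.2, 1.0) == GREY)  # True
print(len(broadcast_mask([[[1.0]]], 5)))  # 5
```

A mask value between 0 and 1 blends the source pixel toward grey in proportion, which is what the formula gives for soft-edged masks.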

Dependencies

None beyond PyTorch, which is bundled with ComfyUI.
