
Ethanfel 87ec7b3938 Add Video Inpaint mode for per-pixel spatial mask regeneration

New 9th mode that works at the pixel level rather than the frame level.
Accepts an optional MASK input (B,H,W) to mark spatial regions for
regeneration, with single-frame broadcast, spatial dimension validation,
and contiguous output tensors.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-19 13:37:04 +01:00


ComfyUI-VACE-Tools

A single ComfyUI node that replaces ~149 manually wired nodes for generating VACE mask and control-frame sequences.

Installation

cd ComfyUI/custom_nodes/
git clone https://github.com/your-user/ComfyUI-VACE-Tools.git

Restart ComfyUI. The node appears under the VACE Tools category.

Node: VACE Mask Generator

Inputs

Input	Type	Default	Description
`source_clip`	IMAGE	—	Source video frames (B, H, W, C tensor)
`mode`	ENUM	`End Extend`	Generation mode (see below). 9 modes available.
`target_frames`	INT	`81`	Total output frame count for mask and control_frames (1–10000). Unused by Frame Interpolation, Replace/Inpaint, and Video Inpaint.
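`split_index`	INT	`0`	Where to split the source; the meaning varies by mode. End Extend: optional trim (a negative value drops frames from the end). Pre Extend: number of leading frames kept as reference. Middle Extend: frame index of the split. Bidirectional: generated frames before the clip (0 = even split). Frame Interpolation: new frames per gap. Replace/Inpaint: start index of the replace region. Unused by Edge, Join, and Video Inpaint.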
`edge_frames`	INT	`8`	Number of edge frames for Edge and Join modes. Replace/Inpaint: number of frames to replace. Unused by End/Pre/Middle/Bidirectional/Frame Interpolation/Video Inpaint.
`inpaint_mask`	MASK	(optional)	Spatial inpaint mask (B, H, W). Used only by Video Inpaint mode, where it is required. White (1.0) = regenerate, Black (0.0) = keep. A single-frame mask is broadcast to all source frames.

Outputs

Output	Description
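`mask`	Black/white frame sequence. Black = keep, White = generate. `target_frames` long in the Extend modes; Frame Interpolation, Replace/Inpaint, and Video Inpaint determine their own output length (see Mode Reference).
`control_frames`	Source frames composited with grey (`#7f7f7f`) fill, the same length as `mask`. Fed to VACE as visual reference.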
`segment_1`–`segment_4`	Clip segments whose contents depend on the mode (see below). Unused segments are 1-frame black placeholders.
`frames_to_generate`	INT — number of new frames the model needs to produce (the white/grey region).

Mode Reference

All diagrams show the mask and control_frames layout left-to-right (frame 0 → frame N).

End Extend

Generate new frames after the source clip.

split_index — optional trim: 0 keeps the full clip; a negative value (e.g. -16) drops that many frames from the end before extending.
frames_to_generate = target_frames − source_frames (source_frames counted after any trim)

mask:           [ BLACK × source ][ WHITE × generated ]
control_frames: [ source clip    ][ GREY  × generated ]

Segment	Content
`segment_1`	Source frames (trimmed if `split_index ≠ 0`)
`segment_2`–`4`	Placeholder
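
The End Extend layout can be sketched in plain Python with labels standing in for image tensors. This is only an illustration: the function name `end_extend_layout`, the label strings, and the clamping behaviour are assumptions, and the node's actual code in nodes.py may handle edge cases differently.

```python
def end_extend_layout(source_frames, target_frames, split_index=0):
    """Return (mask, control_frames, frames_to_generate) as label lists."""
    # A negative split_index drops that many frames from the end of the source.
    # Assumption: zero or positive values leave the clip untrimmed.
    kept = max(source_frames + split_index, 0) if split_index < 0 else source_frames
    # Assumption: clamp at zero when the source already fills target_frames.
    generated = max(target_frames - kept, 0)
    mask = ["BLACK"] * kept + ["WHITE"] * generated
    control = [f"src[{i}]" for i in range(kept)] + ["GREY"] * generated
    return mask, control, generated


mask, control, n = end_extend_layout(source_frames=5, target_frames=8, split_index=-2)
print(n)        # 5
print(control)  # ['src[0]', 'src[1]', 'src[2]', 'GREY', 'GREY', 'GREY', 'GREY', 'GREY']
```

With a 5-frame source trimmed by 2, three frames are kept and five are generated to reach the 8-frame target.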

Pre Extend

Generate new frames before a reference portion of the source clip.

split_index — how many frames from the start to keep as the reference tail (e.g. 24).
frames_to_generate = target_frames − split_index

mask:           [ WHITE × generated ][ BLACK × reference ]
control_frames: [ GREY  × generated ][ reference frames  ]

Segment	Content
`segment_1`	Remaining frames after the reference (source[split_index:])
`segment_2`–`4`	Placeholder
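
A minimal sketch of the Pre Extend arithmetic, again using labels rather than tensors. The name `pre_extend_layout` and the clamping of `split_index` to the clip length are assumptions made for the example, not confirmed behaviour of the node.

```python
def pre_extend_layout(source_frames, target_frames, split_index):
    """Return (mask, control_frames, segment_1, frames_to_generate)."""
    # Assumption: clamp the reference length to the clip length.
    ref = min(max(split_index, 0), source_frames)
    # Assumption: never report a negative number of generated frames.
    generated = max(target_frames - ref, 0)
    mask = ["WHITE"] * generated + ["BLACK"] * ref
    control = ["GREY"] * generated + [f"src[{i}]" for i in range(ref)]
    # segment_1 holds whatever follows the reference: source[split_index:]
    segment_1 = [f"src[{i}]" for i in range(ref, source_frames)]
    return mask, control, segment_1, generated


mask, control, segment_1, n = pre_extend_layout(
    source_frames=30, target_frames=81, split_index=24
)
print(n, len(mask), len(segment_1))  # 57 81 6
```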

Middle Extend

Generate new frames between two halves of the source clip, split at split_index.

split_index — frame index where the source is split.
frames_to_generate = target_frames − source_frames

mask:           [ BLACK × part_a ][ WHITE × generated ][ BLACK × part_b ]
control_frames: [ part_a         ][ GREY  × generated ][ part_b         ]

Segment	Content
`segment_1`	Part A — source[:split_index]
`segment_2`	Part B — source[split_index:]
`segment_3`–`4`	Placeholder
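
The split-and-insert layout above might look like this in plain Python, with integers standing in for source frames. `middle_extend_layout` is a hypothetical helper, and clamping `split_index` to the clip length is an assumption.

```python
def middle_extend_layout(source_frames, target_frames, split_index):
    """Return (mask, control_frames, frames_to_generate)."""
    # Assumption: keep the split point inside the clip.
    split = min(max(split_index, 0), source_frames)
    part_a = list(range(split))                 # source[:split_index]
    part_b = list(range(split, source_frames))  # source[split_index:]
    generated = max(target_frames - source_frames, 0)
    mask = ["BLACK"] * len(part_a) + ["WHITE"] * generated + ["BLACK"] * len(part_b)
    control = part_a + ["GREY"] * generated + part_b
    return mask, control, generated


mask, control, n = middle_extend_layout(source_frames=6, target_frames=10, split_index=2)
print(n)        # 4
print(control)  # [0, 1, 'GREY', 'GREY', 'GREY', 'GREY', 2, 3, 4, 5]
```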

Edge Extend

Generate a transition between the end and start of a clip (useful for looping).

edge_frames — number of frames taken from each edge.
split_index — unused.
frames_to_generate = target_frames − (2 × edge_frames)

The end segment is placed first, then the generated gap, then the start segment, so the generated frames connect the clip's end back to its beginning.

mask:           [ BLACK × end_seg ][ WHITE × generated ][ BLACK × start_seg ]
control_frames: [ end_seg         ][ GREY  × generated ][ start_seg         ]

Segment	Content
`segment_1`	Start edge: source[:edge_frames]
`segment_2`	Middle remainder: source[edge_frames:-edge_frames]
`segment_3`	End edge: source[-edge_frames:]
`segment_4`	Placeholder
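
As an illustration of the end-first ordering, the sketch below builds the Edge Extend layout from integer frame indices. The function name and the validation rule (at least one edge frame, and two edges must fit in the source) are assumptions for the example.

```python
def edge_extend_layout(source_frames, target_frames, edge_frames):
    """Return (mask, control_frames, (start, middle, end), frames_to_generate)."""
    # Assumption: 1 <= edge_frames and both edges fit inside the source.
    if not (1 <= edge_frames and 2 * edge_frames <= source_frames):
        raise ValueError("edge_frames must fit twice inside the source clip")
    src = list(range(source_frames))
    start_seg = src[:edge_frames]
    middle = src[edge_frames:-edge_frames]
    end_seg = src[-edge_frames:]
    generated = max(target_frames - 2 * edge_frames, 0)
    mask = ["BLACK"] * edge_frames + ["WHITE"] * generated + ["BLACK"] * edge_frames
    # The end of the clip comes first so the gap leads back into the start.
    control = end_seg + ["GREY"] * generated + start_seg
    return mask, control, (start_seg, middle, end_seg), generated


mask, control, segments, n = edge_extend_layout(
    source_frames=10, target_frames=7, edge_frames=2
)
print(control)      # [8, 9, 'GREY', 'GREY', 'GREY', 0, 1]
print(segments[1])  # [2, 3, 4, 5, 6, 7]
```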

Join Extend

Heal/blend two halves of a clip together. The source is split in half; edge_frames from each side of the split form the context.

edge_frames — context frames taken from each side of the midpoint.
split_index — unused.
frames_to_generate = target_frames − (2 × edge_frames)

source layout:  [ part_1 ][ part_2 | part_3 ][ part_4 ]
                           ← edge →  ← edge →

mask:           [ BLACK × part_2 ][ WHITE × generated ][ BLACK × part_3 ]
control_frames: [ part_2         ][ GREY  × generated ][ part_3         ]

Segment	Content
`segment_1`	Part 1 — first half minus its trailing edge
`segment_2`	Part 2 — trailing edge of first half
`segment_3`	Part 3 — leading edge of second half
`segment_4`	Part 4 — second half minus its leading edge
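
The four-part split can be sketched as follows. How the node treats an odd-length source is not stated here, so the integer division used for the midpoint is an assumption, as are the name `join_extend_layout` and the validation rule.

```python
def join_extend_layout(source_frames, target_frames, edge_frames):
    """Return (control_frames, (part_1, part_2, part_3, part_4), frames_to_generate)."""
    # Assumption: for odd lengths the extra frame lands in the second half.
    half = source_frames // 2
    if not (1 <= edge_frames <= half):
        raise ValueError("edge_frames must fit inside each half of the source")
    src = list(range(source_frames))
    part_1 = src[:half - edge_frames]       # first half minus its trailing edge
    part_2 = src[half - edge_frames:half]   # trailing edge of first half
    part_3 = src[half:half + edge_frames]   # leading edge of second half
    part_4 = src[half + edge_frames:]       # second half minus its leading edge
    generated = max(target_frames - 2 * edge_frames, 0)
    control = part_2 + ["GREY"] * generated + part_3
    return control, (part_1, part_2, part_3, part_4), generated


control, parts, n = join_extend_layout(source_frames=12, target_frames=9, edge_frames=2)
print(control)             # [4, 5, 'GREY', 'GREY', 'GREY', 'GREY', 'GREY', 6, 7]
print(parts[0], parts[3])  # [0, 1, 2, 3] [8, 9, 10, 11]
```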

Bidirectional Extend

Generate new frames both before and after the source clip.

split_index — number of generated frames to place before the clip. 0 = even split (half before, half after).
target_frames — total output frame count.
frames_to_generate = target_frames − source_frames

mask:           [ WHITE × pre ][ BLACK × source ][ WHITE × post ]
control_frames: [ GREY  × pre ][ source clip    ][ GREY  × post ]

Segment	Content
`segment_1`	Full source clip
`segment_2`–`4`	Placeholder
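
A short sketch of how the generated frames might be divided between the two sides. The rounding used for an odd total under the even split and the clamping of `split_index` are assumptions; `bidirectional_layout` is a hypothetical name.

```python
def bidirectional_layout(source_frames, target_frames, split_index=0):
    """Return (mask, pre, post) where pre/post are generated frame counts."""
    generated = max(target_frames - source_frames, 0)
    if split_index <= 0:
        # Even split. Assumption: an odd total puts the extra frame after the clip.
        pre = generated // 2
    else:
        # Assumption: never place more frames before the clip than are generated.
        pre = min(split_index, generated)
    post = generated - pre
    mask = ["WHITE"] * pre + ["BLACK"] * source_frames + ["WHITE"] * post
    return mask, pre, post


mask, pre, post = bidirectional_layout(source_frames=5, target_frames=12)
print(pre, post)  # 3 4
```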

Frame Interpolation

Insert generated frames between each consecutive pair of source frames.

split_index — number of new frames to insert per gap (min 1). target_frames is unused.
frames_to_generate = (source_frames − 1) × split_index
Total output = source_frames + frames_to_generate

mask:           [ B ][ W×split_index ][ B ][ W×split_index ][ B ] ...
control_frames: [ f0][ GREY          ][ f1][ GREY          ][ f2] ...

Segment	Content
`segment_1`	Full source clip
`segment_2`–`4`	Placeholder
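
The interleaving pattern can be illustrated with a small loop. This sketch uses one-letter labels for the mask and is not taken from nodes.py; the function name `interpolation_layout` is hypothetical.

```python
def interpolation_layout(source_frames, split_index):
    """Return (mask, control_frames, frames_to_generate) for Frame Interpolation."""
    step = max(split_index, 1)  # at least one new frame per gap
    mask, control = [], []
    for i in range(source_frames):
        mask.append("B")
        control.append(f"f{i}")
        if i < source_frames - 1:  # no gap after the last source frame
            mask.extend(["W"] * step)
            control.extend(["GREY"] * step)
    generated = max(source_frames - 1, 0) * step
    return mask, control, generated


mask, control, n = interpolation_layout(source_frames=3, split_index=2)
print("".join(mask))  # BWWBWWB
print(n, len(mask))   # 4 7
```

Three source frames have two gaps, so two new frames per gap gives four generated frames and seven in total.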

Replace/Inpaint

Regenerate a range of frames in-place within the source clip.

split_index — start index of the region to replace (clamped to source length).
edge_frames — number of frames to replace (clamped to remaining frames after start).
frames_to_generate = edge_frames (after clamping). target_frames is unused.
Total output = source_frames (same length — in-place replacement).

mask:           [ BLACK × before ][ WHITE × replace ][ BLACK × after ]
control_frames: [ before frames  ][ GREY  × replace ][ after frames  ]

Segment	Content
`segment_1`	Before — source[:start]
`segment_2`	Original replaced frames — source[start:start+length]
`segment_3`	After — source[start+length:]
`segment_4`	Placeholder
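
The two clamping steps described above can be sketched like this, with integers standing in for frames. `replace_layout` is a hypothetical helper; clamping negative inputs to zero is an assumption.

```python
def replace_layout(source_frames, split_index, edge_frames):
    """Return (mask, (before, replaced, after), frames_to_generate)."""
    # Clamp the start index to the source length.
    start = min(max(split_index, 0), source_frames)
    # Clamp the replace length to the frames remaining after start.
    length = min(max(edge_frames, 0), source_frames - start)
    after = source_frames - start - length
    mask = ["BLACK"] * start + ["WHITE"] * length + ["BLACK"] * after
    src = list(range(source_frames))
    segments = (src[:start], src[start:start + length], src[start + length:])
    return mask, segments, length


mask, segments, n = replace_layout(source_frames=10, split_index=7, edge_frames=8)
print(n)            # 3
print(segments[1])  # [7, 8, 9]
```

Asking for 8 frames starting at index 7 of a 10-frame clip is clamped to the 3 frames that remain, and the output stays 10 frames long.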

Video Inpaint

Regenerate spatial regions within frames using a per-pixel mask. Unlike other modes that work at the frame level (entire frames kept or generated), Video Inpaint operates at the pixel level — masked regions are regenerated while the rest of each frame is preserved.

inpaint_mask (required) — a MASK (B, H, W) where white (1.0) marks regions to regenerate and black (0.0) marks regions to keep. A single-frame mask is automatically broadcast to all source frames; a multi-frame mask must have the same frame count as source_clip.
target_frames, split_index, edge_frames — unused.
frames_to_generate = source_frames (all frames are partially regenerated).
Total output = source_frames (same length — in-place spatial replacement).

Compositing formula per pixel:

control_frames = source × (1 − mask) + grey × mask

mask:           [ per-pixel mask broadcast to (B, H, W, 3)        ]
control_frames: [ source pixels where mask=0, grey where mask=1   ]

Segment	Content
`segment_1`	Full source clip
`segment_2`–`4`	Placeholder
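
To make the compositing formula and the single-frame broadcast concrete, here is a dependency-free sketch that uses nested lists in place of tensors. The helper names `broadcast_mask` and `composite` are hypothetical, a single channel stands in for RGB, and the real node presumably does the same arithmetic with PyTorch tensor operations.

```python
GREY = 0x7F / 255.0  # #7f7f7f expressed as a 0-1 float


def broadcast_mask(mask, frame_count):
    """Repeat a single-frame mask for every source frame."""
    if len(mask) == 1:
        return mask * frame_count
    if len(mask) != frame_count:
        raise ValueError("mask must have 1 frame or one frame per source frame")
    return mask


def composite(source, mask):
    """Apply source * (1 - mask) + grey * mask to every pixel.

    source and mask are nested lists indexed [frame][row][column].
    """
    mask = broadcast_mask(mask, len(source))
    return [
        [
            [s * (1.0 - m) + GREY * m for s, m in zip(src_row, mask_row)]
            for src_row, mask_row in zip(src_frame, mask_frame)
        ]
        for src_frame, mask_frame in zip(source, mask)
    ]


source = [[[0.2, 0.8]], [[1.0, 0.0]]]  # 2 frames, 1 row, 2 columns
mask = [[[0.0, 1.0]]]                  # 1 frame: keep left pixel, regenerate right
out = composite(source, mask)
print(out[0][0][0], out[1][0][0])  # 0.2 1.0
print(out[0][0][1] == GREY)        # True
```

Where the mask is 0 the source value passes through unchanged, and where it is 1 the pixel becomes grey in every frame because the single mask frame is reused.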

Dependencies

None beyond PyTorch, which is bundled with ComfyUI.
