ComfyUI-Dataset-Gates/docs/plans/2026-06-21-grid-image-pool-design.md
Ethanfel 4f0b18134e Add Image Pool (Grid) node design doc
Approved design for an input-side ComfyUI node holding a curated pool of
images with per-image masks and labels, selectable for output. Captures
IO, managed-pool-folder storage, MaskEditor reuse, server routes, edge
cases, and a 3-phase build plan.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 12:41:57 +02:00

# Image Pool (Grid) — Design
Date: 2026-06-21

Status: Approved (brainstorming complete, ready for implementation plan)
## 1. Purpose & scope
An **input-side** ComfyUI node that holds a curated *pool* of images, each with its
own remembered mask and an editable label. The user picks an active image (or drives
it by index) and the node outputs that image + mask (+ index/count/label).
The node does **not** inpaint. It feeds downstream edit nodes (Klein / Flux Kontext).
It exists to remove two recurring annoyances in the dataset/inpaint workflow:
1. **Rewiring** — instead of swapping `LoadImage` nodes, keep one node holding many
images and click the active one. Downstream wiring never changes.
2. **Re-masking** — each image remembers its own mask, so switching back to an image
never means redrawing the mask.

Node name: **`Image Pool (Grid)`**.
### Non-goals (YAGNI for v1)
- Bulk folder load
- Batch-output mode (output the whole pool at once)
- Drag-to-reorder (deferred to phase 3)
- Cross-machine workflow portability (pool is disk-backed, local)
## 2. Node IO
| dir | name | type | notes |
|---|---|---|---|
| out | `IMAGE` | IMAGE | selected slot's image |
| out | `MASK` | MASK | selected slot's mask; all-zeros if none drawn (= nothing masked) |
| out | `index` | INT | the slot actually used |
| out | `count` | INT | total images in pool |
| out | `label` | STRING | selected slot's label ("" if unset) |
| widget | `index` | INT | `-1` = use grid-clicked active slot; `>=0` = force that slot (clamped to `0..count-1`) |
| hidden | `pool_id` | STRING | stable UUID, generated on node create, saved in workflow |

Selection rule: if `index` widget `== -1`, use `manifest.active`; else use
`clamp(index, 0, count-1)`. The actually-used index is echoed on the `index` output.
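
The selection rule above can be sketched as a small pure function. This is a minimal illustration, not the node's actual code: the name `resolve_slot` is hypothetical, returning `None` for an empty pool is an assumption (the design only says the node emits a fallback image with `count=0`), and clamping `manifest.active` as well is an added guard against a stale value after a delete.

```python
from typing import Optional


def resolve_slot(index_widget: int, active: int, count: int) -> Optional[int]:
    """Return the slot index to use, or None when the pool is empty."""
    if count <= 0:
        return None  # empty pool: the caller emits the 1x1 fallback
    # -1 means "follow the grid-clicked slot"; any other value is a forced index.
    wanted = active if index_widget == -1 else index_widget
    # Clamp into 0..count-1. Clamping the active slot too is an assumption,
    # in case manifest.active went stale after a delete.
    return max(0, min(wanted, count - 1))
```

For example, `resolve_slot(7, 0, 5)` yields `4`, which is the value the node would echo on the `index` output.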
## 3. Storage — managed pool folder
```
input/grid_pool/<pool_id>/
manifest.json
img_0001.png img_0001.mask.png
img_0002.png ...
```
`manifest.json`:
```json
{
"active": 0,
"slots": [
{ "image": "img_0001.png", "mask": "img_0001.mask.png", "label": "", "added": 1718960000 }
]
}
```
- Workflow JSON stores only `pool_id` + the `index` widget → stays tiny.
- Pool survives restart (lives in ComfyUI's `input/`).
- `mask` may be `null`/absent until a mask is drawn.
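
As a rough sketch of how the layout above maps to paths, the helpers below resolve a pool directory and a slot's files. The function names are hypothetical. Parsing `pool_id` as a UUID before building the path is an added safety check that the design does not mention; it assumes the id is stored in canonical lowercase form.

```python
import uuid
from pathlib import Path
from typing import Optional, Tuple


def pool_dir(input_root: Path, pool_id: str) -> Path:
    """Map a pool_id to <input_root>/grid_pool/<pool_id>/.

    The UUID parse is an assumption, not part of the design: it rejects
    values such as "../x" that would escape the pool root.
    """
    canonical = str(uuid.UUID(pool_id))  # raises ValueError if malformed
    return input_root / "grid_pool" / canonical


def slot_paths(directory: Path, slot: dict) -> Tuple[Path, Optional[Path]]:
    """Return (image_path, mask_path); mask_path is None until a mask is drawn."""
    mask_name = slot.get("mask")  # covers both null and an absent key
    mask_path = directory / mask_name if mask_name else None
    return directory / slot["image"], mask_path
```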
## 4. Components & data flow
### Python node (`__init__.py`)
- On execute: resolve pool dir from `pool_id`, read `manifest.json`, choose slot,
load image + mask → tensors, return `(IMAGE, MASK, index, count, label)`.
- Empty pool → return a 1x1 black image + zero mask + `count=0` (never crash the graph).
- `IS_CHANGED` returns a hash of `(pool_id, chosen_index, image_mtime, mask_mtime)` so
that editing a mask or replacing an image forces re-execution (otherwise ComfyUI
caches the output and the new mask is never seen).
- Mask convention: load as single-channel float; if no mask file, emit zeros matching
image H×W.
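
One way the `IS_CHANGED` fingerprint described above could be computed is sketched here. The helper names are hypothetical, wiring the result into the node's `IS_CHANGED` classmethod is left out, and treating a missing file as mtime `0` is an assumption made so that a slot without a mask still hashes cleanly.

```python
import hashlib
import os
from typing import Optional


def _mtime_ns(path: Optional[str]) -> int:
    """Modification time in nanoseconds, or 0 when the file is absent."""
    if not path:
        return 0
    try:
        return os.stat(path).st_mtime_ns
    except OSError:
        return 0  # e.g. no mask drawn yet, or the image was deleted


def change_fingerprint(pool_id: str, chosen_index: int,
                       image_path: Optional[str],
                       mask_path: Optional[str]) -> str:
    """Hash (pool_id, chosen_index, image_mtime, mask_mtime) to a hex digest."""
    parts = (pool_id, chosen_index, _mtime_ns(image_path), _mtime_ns(mask_path))
    return hashlib.sha256(repr(parts).encode("utf-8")).hexdigest()
```

The tuple hashed here is exactly the one listed above. It does not include `label` or `count`, so if stale values on those outputs turn out to matter, adding them to the tuple would be a natural extension.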
### JS extension (`web/`)
A resizable **in-node DOM widget** rendering the thumbnail grid (scrollable so a big
pool doesn't blow up the node). Responsibilities:
- **Ingest**: paste (Ctrl+V on node), drag-drop files, upload button → POST to
`/grid_pool/add`; server copies into the pool, appends a slot, returns manifest.
- **Select**: click thumbnail → `/grid_pool/active`, highlight active.
- **Mask**: brush button / double-click → push the slot image into `ComfyApp.clipspace`
(`imgs`/`images`/`selectedIndex`), set `ComfyApp.clipspace_return_node = node`, call
`openMaskEditor()`. The editor saves the alpha mask via `/upload/mask`; on return the
node's `pasteFromClipspace()` fires → we extract the alpha and POST it to
`/grid_pool/set_mask` to write the slot's `.mask.png`.
- **Label**: inline-editable caption under each thumbnail → `/grid_pool/label`.
- **Delete**: ✕ on thumbnail → `/grid_pool/remove`.
- Badges: active border, "has-mask" dot, slot index.
### Server routes (aiohttp, registered from Python)
Under `/grid_pool/*`: `add`, `remove`, `active`, `set_mask`, `label`, `list`.
Every route except the read-only `list` mutates `manifest.json` atomically; all of
them return the current manifest.
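
A common way to get an atomic manifest update is to write a temporary file in the same directory and rename it over the target. The sketch below assumes that approach; the design does not say which mechanism is used, and `write_manifest_atomic` is a hypothetical name.

```python
import json
import os
import tempfile
from pathlib import Path


def write_manifest_atomic(pool_dir: Path, manifest: dict) -> None:
    """Write manifest.json so readers never observe a half-written file."""
    pool_dir.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the same directory so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=pool_dir, prefix="manifest.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(manifest, fh, indent=2)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp_name, pool_dir / "manifest.json")  # atomic rename
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

This protects against torn reads only. Two routes doing read-modify-write at the same time could still lose an update, so a per-pool lock around each mutation would likely be needed as well.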
## 5. UI approach
Chosen: **in-node grid** (thumbnails in the node body; resize node to see more,
scroll for large pools). Rejected alternative: modal "manage pool" gallery — better for
huge pools but more clicks and more UI code; revisit only if pools get large.
## 6. Edge cases & error handling
- Empty pool → 1x1 black image + zero mask + `count=0`.
- `index >= count` → clamp; echo the clamped value on `index` output.
- Missing/corrupt manifest → rebuild from files on disk.
- Cloning a node copies `pool_id` → both nodes share one pool. Provide right-click
**"Detach pool (new id)"** to split. v1 may just document the behavior.
- MaskEditor integration verified against the installed frontend: `openMaskEditor`,
`copyToClipspace`/`pasteFromClipspace`, `clipspace_return_node`, and `/upload/mask`
all exist. This is the standard "Open in MaskEditor" pattern used by many nodes.
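
For the missing or corrupt manifest case listed above, a rebuild could scan the pool folder for the `img_NNNN.png` / `img_NNNN.mask.png` pairs from section 3. This is a sketch under assumptions: `rebuild_manifest` is a hypothetical name, labels are reset to `""` because they are stored only in the manifest, `active` falls back to `0`, and the file mtime stands in for the lost `added` timestamp.

```python
from pathlib import Path


def rebuild_manifest(pool_dir: Path) -> dict:
    """Reconstruct a manifest from the image files found on disk."""
    slots = []
    for image in sorted(pool_dir.glob("img_*.png")):
        if image.name.endswith(".mask.png"):
            continue  # masks are attached to their image below
        mask = image.with_name(image.stem + ".mask.png")
        slots.append({
            "image": image.name,
            "mask": mask.name if mask.exists() else None,
            "label": "",  # labels cannot be recovered from files alone
            "added": int(image.stat().st_mtime),  # assumption: mtime as add time
        })
    return {"active": 0, "slots": slots}
```

Slot order after a rebuild follows the file names, so any manual ordering kept in the old manifest would be lost.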
## 7. Phasing & testing
- **Phase 1** — storage + manifest + Python node + server routes + grid display +
select + delete + labels (no masking). E2E: add images, pick one, it outputs
image/mask(zeros)/index/count/label.
- **Phase 2** — MaskEditor integration + per-slot mask persistence + `IS_CHANGED`.
- **Phase 3** — polish: drag-reorder, "detach pool", badges.

Testing:
- pytest (Python): manifest read/write, atomic mutation, slot selection rule, tensor
shapes/dtypes, zero-mask fallback, `IS_CHANGED` hashing, manifest rebuild.
- Manual checklist (JS/MaskEditor): ingest paths, select, mask round-trip, label edit,
delete, persistence across restart.