docs: document mask inputs and normalize toggle in README

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 10:43:42 +02:00
parent f28759f1e3
commit b519b042e2
1 changed files with 11 additions and 0 deletions
@@ -35,11 +35,21 @@ Extracts CLIP visual features and text-guided sync features from a video. Result
 | `fps` | Source fps — ignored if `video_info` is connected |
 | `duration` | Override clip duration in seconds. `0` = infer from video length |
 | `cache_dir` | Directory for cached `.npz` files. Empty = system temp dir |
+| `mask` | *(optional)* Segmentation mask `[T,H,W]`, float in [0,1]. Either static (1 frame, reused for every frame) or per-frame |
+| `mask_strength` | Background suppression strength. `1.0` = full neutral fill, `0.0` = no effect |
+| `mask_clip` | Apply the mask to the CLIP input frames (384px path). Disable to let CLIP see the full scene |
+| `mask_sync` | Apply the mask to the TextSynchformer sync input frames (224px path) |

 **Outputs:** `features` (SELVA_FEATURES), `fps` (FLOAT), `prompt` (STRING)

 Connect `prompt` output to the Sampler's `prompt` input to avoid entering it twice.

+#### Masking
+
+Connect a segmentation mask (SAM2, Grounding DINO + SAM, or any ComfyUI mask node) to isolate a specific object's motion before encoding. Background pixels are filled with a neutral value (0.5) rather than zeroed. Mid-grey keeps them in-distribution for CLIP, and it maps to exactly 0 after the sync path's `[-1,1]` normalization. Together these minimize the influence of background motion on the generated audio.
+
+Use `mask_sync=true, mask_clip=false` to focus the sync features on the target object while CLIP still sees the full scene for broader context. Changing any mask parameter invalidates the feature cache, so features are re-extracted rather than reused.
+
 ---

 ### SelVA Sampler
@@ -56,6 +66,7 @@ Generates audio from video features. Runs the rectified flow ODE with classifier
 | `steps` | Sampling steps (default: 25) |
 | `cfg_strength` | Classifier-free guidance scale (default: 4.5) |
 | `seed` | RNG seed |
+| `normalize` | Peak-normalize output to [-1, 1] (default: true) |

 **Output:** `AUDIO`
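The neutral background fill described in the Masking section can be sketched as follows. This is a minimal illustration, not the node's actual implementation. It rests on these assumptions:

- Frames are NumPy floats in [0, 1] laid out `[T, H, W, C]` (ComfyUI's IMAGE layout).
- A mask value of 1 marks the object to keep.
- `mask_strength` blends linearly between the original pixels and the 0.5 fill.
- The sync path's `[-1,1]` normalization is `x * 2 - 1`.

`apply_neutral_fill` and `to_sync_range` are hypothetical names.

```python
import numpy as np

NEUTRAL = 0.5  # mid-grey fill value described in the README


def apply_neutral_fill(frames: np.ndarray, mask: np.ndarray, strength: float = 1.0) -> np.ndarray:
    """Hypothetical sketch: blend background pixels toward mid-grey.

    frames:   [T, H, W, C] floats in [0, 1] (assumed ComfyUI IMAGE layout)
    mask:     [T, H, W] or [1, H, W] floats in [0, 1], 1 = object to keep (assumed)
    strength: 1.0 = full neutral fill, 0.0 = no effect (linear blend is an assumption)
    """
    if mask.shape[0] == 1:
        # Static mask: reuse the single mask frame for every video frame.
        mask = np.broadcast_to(mask, (frames.shape[0],) + mask.shape[1:])
    if mask.shape != frames.shape[:3]:
        raise ValueError(f"mask shape {mask.shape} does not match frames {frames.shape[:3]}")
    # Per-pixel weight of the neutral fill, with a trailing axis to broadcast over channels.
    bg = strength * (1.0 - mask)[..., None]
    return frames * (1.0 - bg) + NEUTRAL * bg


def to_sync_range(frames: np.ndarray) -> np.ndarray:
    """Map [0, 1] pixels to [-1, 1] (assumed form of the sync normalization)."""
    return frames * 2.0 - 1.0


frames = np.random.default_rng(0).random((4, 8, 8, 3))
mask = np.zeros((1, 8, 8))
mask[0, 2:6, 2:6] = 1.0  # static square "object" in the middle of the frame

filled = apply_neutral_fill(frames, mask)
print(filled[0, 0, 0])                 # background pixel: [0.5 0.5 0.5]
print(to_sync_range(filled)[0, 0, 0])  # same pixel after sync normalization: [0. 0. 0.]
```

With full strength, a background pixel becomes exactly 0.5, which `x * 2 - 1` sends to exactly 0. Zero-valued inputs then contribute nothing to the sync encoder's first linear layer beyond its bias, whereas black (0.0) pixels would map to -1.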
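The Sampler's `normalize` option peak-normalizes the output to [-1, 1]. The sketch below shows the usual form of that technique on a NumPy waveform. The helper name `peak_normalize` and the silence guard (`eps`) are hypothetical; they are not taken from the SelVA code.

```python
import numpy as np


def peak_normalize(audio: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Hypothetical helper: scale audio so its largest absolute sample is 1.0.

    Near-silent input (peak below eps) is returned unchanged to avoid dividing by ~0.
    """
    peak = float(np.max(np.abs(audio)))
    if peak < eps:
        return audio
    return audio / peak


wave = np.array([0.1, -0.4, 0.2])
print(peak_normalize(wave))  # [ 0.25 -1.    0.5 ]
```

This version divides by a single global peak, so the relative level between channels is preserved. Normalizing each channel separately would change the stereo balance.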