Add standalone CLI for memory-efficient video upscaling

Standalone inference script that works outside ComfyUI: just activate the same Python venv. Streams output frames to ffmpeg so peak RAM stays bounded regardless of video length. Supports video files, image sequences, and single images. Audio is automatically preserved from input videos.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 23:46:58 +01:00
parent 8794f8ddec
commit 6cf314baf4
2 changed files with 594 additions and 0 deletions
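
For readers unfamiliar with the streaming approach named in the commit message, here is a minimal sketch of piping raw frames into an ffmpeg subprocess one at a time. It is not the code from `inference.py`, which is not shown here. The function names, the `fake_frames` generator, and the choice of libx264 are all assumptions made for illustration. The only point is that each frame is written and then dropped, so memory use does not grow with video length.

```python
import io
import subprocess
from typing import BinaryIO, Iterable, Iterator, List


def ffmpeg_rawvideo_cmd(width: int, height: int, fps: float, out_path: str) -> List[str]:
    """Build an ffmpeg command that reads raw RGB24 frames from stdin."""
    return [
        "ffmpeg", "-y",
        "-f", "rawvideo",          # input is headerless raw video
        "-pix_fmt", "rgb24",       # 3 bytes per pixel
        "-s", f"{width}x{height}",
        "-r", str(fps),
        "-i", "-",                 # read frames from stdin
        "-c:v", "libx264",         # assumed encoder; needs even dimensions with yuv420p
        "-pix_fmt", "yuv420p",
        out_path,
    ]


def write_frames(frames: Iterable[bytes], sink: BinaryIO, frame_size: int) -> int:
    """Write frames to sink one by one; only the current frame is held in memory."""
    count = 0
    for frame in frames:
        if len(frame) != frame_size:
            raise ValueError(f"expected {frame_size} bytes, got {len(frame)}")
        sink.write(frame)
        count += 1
    return count


def fake_frames(n: int, width: int, height: int) -> Iterator[bytes]:
    """Hypothetical stand-in for a generator that yields upscaled RGB24 frames."""
    for i in range(n):
        yield bytes([i % 256]) * (width * height * 3)


def encode_video(frames: Iterable[bytes], width: int, height: int,
                 fps: float, out_path: str) -> int:
    """Stream frames into ffmpeg's stdin and return the number of frames written."""
    proc = subprocess.Popen(
        ffmpeg_rawvideo_cmd(width, height, fps, out_path),
        stdin=subprocess.PIPE,
    )
    assert proc.stdin is not None
    try:
        written = write_frames(frames, proc.stdin, width * height * 3)
    finally:
        proc.stdin.close()  # signals end of input so ffmpeg can finalize the file
        proc.wait()
    if proc.returncode != 0:
        raise RuntimeError(f"ffmpeg exited with code {proc.returncode}")
    return written


if __name__ == "__main__":
    # Dry run against an in-memory sink, so no ffmpeg binary is needed.
    sink = io.BytesIO()
    written = write_frames(fake_frames(3, 4, 2), sink, frame_size=4 * 2 * 3)
    print(written, len(sink.getvalue()))
```

With ffmpeg on the PATH, `encode_video(fake_frames(48, 64, 64), 64, 64, 24, "out.mp4")` would write a two-second test clip. Because `fake_frames` is a generator, no more than one frame exists in Python memory at any time.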
--- a/README.md
+++ b/README.md
@@ -9,7 +9,9 @@ ComfyUI custom nodes for [STAR (Spatial-Temporal Augmentation with Text-to-Video
 - **Auto-download**: all models (UNet checkpoint, OpenCLIP text encoder, temporal VAE) download automatically on first use
 - **VRAM offloading**: three modes to fit GPUs from 12GB to 40GB+
 - **Long video support**: sliding-window chunking with 50% overlap
+- **Segment-based processing**: bound peak RAM for long videos
 - **Color correction**: AdaIN and wavelet-based post-processing
+- **Standalone CLI**: run from the terminal without ComfyUI for long videos

 ## Installation

@@ -83,6 +85,31 @@ Models are stored in `ComfyUI/models/star/` and auto-downloaded on first use:

 The OpenCLIP text encoder and SVD temporal VAE are downloaded automatically by their respective libraries on first load.

+## Standalone CLI
+
+For long videos where ComfyUI's RAM usage becomes a bottleneck, use the standalone script directly. It streams output frames to ffmpeg so peak memory stays bounded regardless of video length.
+
+```bash
+# Activate your ComfyUI Python environment, then:
+python inference.py input.mp4 -o output.mp4
+
+# With model offloading for lower VRAM
+python inference.py input.mp4 -o output.mp4 --offload model --segment-size 8
+
+# Image sequence input/output
+python inference.py frames_in/ -o frames_out/
+
+# Image sequence to video
+python inference.py frames_in/ -o output.mp4 --fps 24
+
+# Single image
+python inference.py photo.png -o photo_4x.png
+```
+
+Audio is automatically copied from the input video. Use `--no-audio` to disable.
+
+Run `python inference.py --help` for all options.
+
 ## Credits

 - [STAR](https://github.com/NJU-PCALab/STAR) by Rui Xie, Yinhong Liu et al. (Nanjing University) — ICCV 2025