Ethanfel 6cf314baf4 Add standalone CLI for memory-efficient video upscaling

Standalone inference script that works outside ComfyUI — just activate
the same Python venv. Streams output frames to ffmpeg so peak RAM stays
bounded regardless of video length. Supports video files, image
sequences, and single images. Audio is automatically preserved from
input videos.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-14 23:46:58 +01:00

ComfyUI-STAR

ComfyUI custom nodes for STAR (Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution) — a diffusion-based video upscaling model (ICCV 2025).

Features

Diffusion-based 4x video super-resolution with temporal coherence
Two model variants: light_deg.pt (light degradation) and heavy_deg.pt (heavy degradation)
Auto-download: all models (UNet checkpoint, OpenCLIP text encoder, temporal VAE) download automatically on first use
VRAM offloading: three modes to fit GPUs from 12 GB to 40 GB+
Long video support: sliding-window chunking with 50% overlap
Segment-based processing: bound peak RAM for long videos
Color correction: AdaIN and wavelet-based post-processing
Standalone CLI: run from the terminal without ComfyUI for long videos

Installation

ComfyUI Manager

Search for ComfyUI-STAR in ComfyUI Manager and install.

Manual

cd ComfyUI/custom_nodes
git clone --recursive https://github.com/ethanfel/Comfyui-STAR.git
cd Comfyui-STAR
pip install -r requirements.txt

The --recursive flag also clones the STAR submodule. If you cloned without it, run git submodule update --init inside the repository afterwards.

Example Workflow

A ready-to-use workflow is included in example_workflows/star_basic_workflow.json. Drag and drop it onto the ComfyUI canvas to load it, then select your input image or connect a video loader.

Nodes

STAR Model Loader

Loads the STAR model components (UNet+ControlNet, OpenCLIP text encoder, temporal VAE).

Input	Description
model_name	`light_deg.pt` for mildly degraded video, `heavy_deg.pt` for heavily degraded video. Auto-downloaded from HuggingFace on first use.
precision	`fp16` (recommended), `bf16`, or `fp32`.
offload	`disabled` (~39 GB VRAM), `model` (~16 GB; swaps components to CPU when idle), `aggressive` (~12 GB; model offload plus single-frame VAE decode).
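
The approximate VRAM figures for the offload input suggest a simple rule of thumb: use the fastest mode that fits. The sketch below encodes that rule. `pick_offload_mode` is a hypothetical helper, not part of this node pack, and the thresholds are only the rough numbers quoted in this README; real usage also depends on resolution, chunk length, and precision.

```python
from typing import Optional

# Approximate VRAM needs quoted in this README, in GB, fastest mode first.
OFFLOAD_VRAM_GB = [("disabled", 39.0), ("model", 16.0), ("aggressive", 12.0)]


def pick_offload_mode(available_gb: float) -> Optional[str]:
    """Return the fastest offload mode whose approximate VRAM need fits."""
    for mode, needed_gb in OFFLOAD_VRAM_GB:
        if available_gb >= needed_gb:
            return mode
    # Below roughly 12 GB, none of the documented modes is expected to fit.
    return None
```

For example, a 24 GB card would land on `model` under these assumptions.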

STAR Video Super-Resolution

Runs the STAR diffusion pipeline on an image batch.

Input	Description
star_model	Connect from STAR Model Loader.
images	Input video frames (IMAGE batch).
upscale	Upscale factor (1–8, default 4).
steps	Denoising steps (1–100, default 15). Ignored in `fast` mode.
guide_scale	Classifier-free guidance scale (1–20, default 7.5).
prompt	Text prompt. Leave empty for STAR's built-in quality prompt.
solver_mode	`fast` (optimized 15-step schedule) or `normal` (uniform schedule).
max_chunk_len	Max frames per chunk (4–128, default 32). Lower = less VRAM for long videos.
seed	Random seed for reproducibility.
color_fix	`adain` (match color stats), `wavelet` (preserve low-frequency color), or `none`.
segment_size	Process video in segments of this many frames to reduce RAM usage (0–256, default 0). 0 = process all at once. Recommended: 16–32 for long videos. Segments overlap by 25% with linear crossfade blending.
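
To make the segment_size behaviour concrete, here is a minimal sketch of how a frame range could be split into segments with a 25% overlap, and of linear crossfade weights for the overlapping frames. It is an illustration, not the node's actual code: the function names are hypothetical, and the rounding of the overlap (`segment_size // 4`) and the exact weight endpoints are assumptions.

```python
from typing import List, Tuple


def segment_ranges(num_frames: int, segment_size: int) -> List[Tuple[int, int]]:
    """Split [0, num_frames) into segments that overlap by about 25%."""
    if segment_size <= 0 or segment_size >= num_frames:
        return [(0, num_frames)]  # 0 means "process all at once"
    overlap = segment_size // 4          # assumed rounding of the 25% overlap
    stride = segment_size - overlap      # always >= 1 for segment_size >= 1
    ranges = []
    start = 0
    while True:
        end = min(start + segment_size, num_frames)
        ranges.append((start, end))
        if end == num_frames:
            break
        start += stride
    return ranges


def crossfade_weights(overlap: int) -> List[float]:
    """Linear weights of the incoming segment across the overlapping frames."""
    return [(i + 1) / (overlap + 1) for i in range(overlap)]


def blend(prev_value: float, new_value: float, weight: float) -> float:
    """Crossfade one value: weight 0 keeps the old segment, 1 the new one."""
    return (1.0 - weight) * prev_value + weight * new_value
```

With 40 frames and a segment size of 16, this sketch produces three segments that share 4 frames each, so only about 16 frames need to be held at full resolution at a time.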

VRAM Requirements

Offload Mode	Approximate VRAM	Notes
disabled	~39 GB	Fastest — everything on GPU
model	~16 GB	Components swap to CPU between stages
aggressive	~12 GB	Model offload + frame-by-frame VAE decode

Reducing max_chunk_len further lowers VRAM usage for long videos at the cost of slightly more processing time.
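
The time cost comes from the number of windows the model has to process. The sketch below counts sliding windows with a 50% overlap for a given max_chunk_len. `chunk_windows` is a hypothetical helper, and the handling of the final window (aligned to the end of the clip) is an assumption; the windowing inside STAR may differ in detail.

```python
from typing import List, Tuple


def chunk_windows(num_frames: int, max_chunk_len: int = 32) -> List[Tuple[int, int]]:
    """Sliding windows of max_chunk_len frames that overlap by 50%."""
    if num_frames <= max_chunk_len:
        return [(0, num_frames)]
    stride = max(1, max_chunk_len // 2)
    windows = []
    start = 0
    while start + max_chunk_len < num_frames:
        windows.append((start, start + max_chunk_len))
        start += stride
    # Assumed: the last window is aligned with the end of the clip.
    windows.append((num_frames - max_chunk_len, num_frames))
    return windows
```

Under these assumptions a 64-frame clip needs 3 windows at max_chunk_len 32 but 7 windows at max_chunk_len 16, which is where the extra processing time comes from.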

Model Weights

Models are stored in ComfyUI/models/star/ and auto-downloaded on first use:

Model	Use Case	Source
`light_deg.pt`	Low-res video from the web, mild compression	HuggingFace
`heavy_deg.pt`	Heavily compressed/degraded video	HuggingFace

The OpenCLIP text encoder and SVD temporal VAE are downloaded automatically by their respective libraries on first load.
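
To see whether the first run will trigger a download, you can check which checkpoints already exist. This is a small sketch, assuming the ComfyUI/models/star/ layout described above; `find_star_checkpoints` is a hypothetical helper, and it does not cover the OpenCLIP or VAE weights, which their libraries cache elsewhere.

```python
from pathlib import Path
from typing import Dict

STAR_CHECKPOINTS = ("light_deg.pt", "heavy_deg.pt")


def find_star_checkpoints(comfyui_root: str) -> Dict[str, bool]:
    """Report which STAR checkpoints already exist under models/star/."""
    model_dir = Path(comfyui_root) / "models" / "star"
    return {name: (model_dir / name).is_file() for name in STAR_CHECKPOINTS}
```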

Standalone CLI

For long videos where ComfyUI's RAM usage becomes a bottleneck, run the standalone script (inference.py) directly. It streams output frames to ffmpeg, so peak memory stays bounded regardless of video length.

# Activate your ComfyUI Python environment, then:
python inference.py input.mp4 -o output.mp4

# With model offloading for lower VRAM
python inference.py input.mp4 -o output.mp4 --offload model --segment-size 8

# Image sequence input/output
python inference.py frames_in/ -o frames_out/

# Image sequence to video
python inference.py frames_in/ -o output.mp4 --fps 24

# Single image
python inference.py photo.png -o photo_4x.png

Audio is automatically copied from the input video. Use --no-audio to disable.
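
For readers curious what copying audio involves, the sketch below builds an ffmpeg command that takes the video stream from an upscaled file and the audio stream from the original input without re-encoding. It is an illustration only and may not match what inference.py does internally; `build_mux_cmd` and its parameters are hypothetical.

```python
from typing import List


def build_mux_cmd(upscaled: str, source: str, out_path: str,
                  keep_audio: bool = True) -> List[str]:
    """Build an ffmpeg command that copies video from `upscaled` and audio from `source`."""
    cmd = ["ffmpeg", "-y", "-i", upscaled]
    if keep_audio:
        # The trailing '?' makes the audio map optional, so inputs
        # without an audio stream do not cause an error.
        cmd += ["-i", source, "-map", "0:v:0", "-map", "1:a?"]
    cmd += ["-c", "copy", out_path]  # stream copy, no re-encoding
    return cmd
```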

Run python inference.py --help for all options.
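
The bounded-memory claim rests on writing each frame to ffmpeg as soon as it is ready instead of collecting the whole video in RAM. The sketch below shows that general pattern with a raw RGB pipe. It is not the code in inference.py: the function names, the choice of libx264, and the pixel formats are assumptions made for illustration.

```python
import subprocess
from typing import Iterable, List


def build_ffmpeg_cmd(width: int, height: int, fps: float, out_path: str) -> List[str]:
    """ffmpeg command that reads raw RGB24 frames from stdin and encodes a video."""
    return [
        "ffmpeg", "-y",
        "-f", "rawvideo", "-pix_fmt", "rgb24",
        "-s", f"{width}x{height}", "-r", str(fps),
        "-i", "-",                                  # read frames from stdin
        "-c:v", "libx264", "-pix_fmt", "yuv420p",   # assumed encoder settings
        out_path,
    ]


def stream_frames(frames: Iterable[bytes], cmd: List[str]) -> int:
    """Write frames one at a time so only the current frame is held in memory."""
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE)
    assert proc.stdin is not None
    for frame in frames:          # each frame: width * height * 3 bytes
        proc.stdin.write(frame)
    proc.stdin.close()
    return proc.wait()            # ffmpeg's exit code
```

Passing a generator as `frames` keeps memory flat, because each upscaled frame can be discarded once it has been written to the pipe.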

Credits

STAR by Rui Xie, Yinhong Liu et al. (Nanjing University) — ICCV 2025
Based on I2VGen-XL and VEnhancer

License

This wrapper is MIT licensed. The STAR model weights follow their respective licenses (MIT for I2VGen-XL-based models).
