ComfyUI-STAR
ComfyUI custom nodes for STAR (Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution) — a diffusion-based video upscaling model (ICCV 2025).
Features
- Diffusion-based 4x video super-resolution with temporal coherence
- Two model variants: `light_deg.pt` (light degradation) and `heavy_deg.pt` (heavy degradation)
- Auto-download: all models (UNet checkpoint, OpenCLIP text encoder, temporal VAE) download automatically on first use
- VRAM offloading: three modes to fit GPUs from 12GB to 40GB+
- Long video support: sliding-window chunking with 50% overlap
- Segment-based processing: bounds peak RAM usage for long videos
- Color correction: AdaIN and wavelet-based post-processing
- Standalone CLI: run from the terminal without ComfyUI for long videos
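The AdaIN color correction option can be illustrated with a short NumPy sketch. This is not the node's own code. It assumes frames are `H x W x 3` float arrays in `[0, 1]` and that the low-resolution input frame serves as the color reference. The sketch rescales each channel of the upscaled frame so that its mean and standard deviation match the reference. The wavelet option is not shown.

```python
import numpy as np


def adain_color_fix(output: np.ndarray, reference: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Match the per-channel mean/std of `output` to those of `reference`.

    Illustrative sketch (not ComfyUI-STAR's implementation): both arrays are
    assumed to be HxWxC floats in [0, 1]. Their spatial sizes may differ,
    because only per-channel statistics of the reference are used.
    """
    out_mean = output.mean(axis=(0, 1), keepdims=True)
    out_std = output.std(axis=(0, 1), keepdims=True)
    ref_mean = reference.mean(axis=(0, 1), keepdims=True)
    ref_std = reference.std(axis=(0, 1), keepdims=True)
    # Remove the output's own color statistics, then apply the reference's.
    normalized = (output - out_mean) / (out_std + eps)
    return np.clip(normalized * ref_std + ref_mean, 0.0, 1.0)


# Usage: a 4x-upscaled frame whose colors drifted darker than its source.
rng = np.random.default_rng(0)
low_res = rng.uniform(0.5, 0.7, size=(16, 16, 3))
upscaled = rng.uniform(0.2, 0.4, size=(64, 64, 3))
fixed = adain_color_fix(upscaled, low_res)
```

The spatial detail produced by the upscaler is preserved. Only the global color statistics of each channel are pulled back toward the input.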
Installation
ComfyUI Manager
Search for ComfyUI-STAR in ComfyUI Manager and install.
Manual
```bash
cd ComfyUI/custom_nodes
git clone --recursive https://github.com/ethanfel/Comfyui-STAR.git
cd Comfyui-STAR
pip install -r requirements.txt
```
The `--recursive` flag clones the STAR submodule. If you forgot it, run `git submodule update --init` afterwards.
Example Workflow
A ready-to-use workflow is included in example_workflows/star_basic_workflow.json. Drag and drop it onto the ComfyUI canvas to load it, then select your input image or connect a video loader.
Nodes
STAR Model Loader
Loads the STAR model components (UNet+ControlNet, OpenCLIP text encoder, temporal VAE).
| Input | Description |
|---|---|
| model_name | light_deg.pt for mildly degraded video, heavy_deg.pt for heavily degraded video. Auto-downloaded from HuggingFace on first use. |
| precision | fp16 (recommended), bf16, or fp32. |
| offload | disabled (~39GB VRAM), model (~16GB — swaps components to CPU when idle), aggressive (~12GB — model offload + single-frame VAE decode). |
STAR Video Super-Resolution
Runs the STAR diffusion pipeline on an image batch.
| Input | Description |
|---|---|
| star_model | Connect from STAR Model Loader. |
| images | Input video frames (IMAGE batch). |
| upscale | Upscale factor (1–8, default 4). |
| steps | Denoising steps (1–100, default 15). Ignored in fast mode. |
| guide_scale | Classifier-free guidance scale (1–20, default 7.5). |
| prompt | Text prompt. Leave empty for STAR's built-in quality prompt. |
| solver_mode | fast (optimized 15-step schedule) or normal (uniform schedule). |
| max_chunk_len | Max frames per chunk (4–128, default 32). Lower = less VRAM for long videos. |
| seed | Random seed for reproducibility. |
| color_fix | adain (match color stats), wavelet (preserve low-frequency color), or none. |
| segment_size | Process video in segments of this many frames to reduce RAM usage (0–256, default 0). 0 = process all at once. Recommended: 16–32 for long videos. Segments overlap by 25% with linear crossfade blending. |
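The `segment_size` behavior (25% overlap, linear crossfade) can be sketched as two steps: split the frame range, then blend the processed segments back together. This is a hypothetical illustration, not the node's code. The rounding of the overlap and the exact ramp shape are assumptions. In each shared region, the earlier segment fades out while the later one fades in, and the two weights sum to 1.

```python
import numpy as np


def segment_ranges(num_frames: int, segment_size: int, overlap_ratio: float = 0.25) -> list:
    """(start, end) ranges of at most `segment_size` frames; neighbors share ~25%.

    Hypothetical helper; ComfyUI-STAR may round the overlap differently.
    """
    if segment_size <= 0 or segment_size >= num_frames:
        return [(0, num_frames)]  # 0 means "process the whole clip at once"
    overlap = int(segment_size * overlap_ratio)
    step = segment_size - overlap  # >= 1 as long as overlap_ratio < 1
    ranges, start = [], 0
    while True:
        end = min(start + segment_size, num_frames)
        ranges.append((start, end))
        if end == num_frames:
            return ranges
        start += step


def stitch_segments(segments: list, ranges: list, num_frames: int) -> np.ndarray:
    """Blend processed segments, each shaped (T, H, W, C), with linear crossfades."""
    acc = np.zeros((num_frames, *segments[0].shape[1:]), dtype=np.float64)
    total_weight = np.zeros(num_frames, dtype=np.float64)
    for i, (seg, (start, end)) in enumerate(zip(segments, ranges)):
        w = np.ones(end - start)
        if i > 0:  # fade in over the frames shared with the previous segment
            n = ranges[i - 1][1] - start
            w[:n] *= np.linspace(0.0, 1.0, n + 2)[1:-1]
        if i + 1 < len(ranges):  # fade out over the frames shared with the next one
            n = end - ranges[i + 1][0]
            w[len(w) - n:] *= np.linspace(1.0, 0.0, n + 2)[1:-1]
        acc[start:end] += seg * w[:, None, None, None]
        total_weight[start:end] += w
    # Paired ramps sum to 1 in each overlap; the division normalizes every frame.
    return acc / total_weight[:, None, None, None]


# Usage: 20 frames in segments of 8 (2-frame overlaps).
frames = np.arange(20, dtype=np.float64).reshape(20, 1, 1, 1)
ranges = segment_ranges(20, 8)  # [(0, 8), (6, 14), (12, 20)]
stitched = stitch_segments([frames[s:e] for s, e in ranges], ranges, 20)
```

Because each segment only needs its own frames in memory, peak RAM scales with `segment_size` rather than clip length. The crossfade hides seams where neighboring segments disagree slightly.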
VRAM Requirements
| Offload Mode | Approximate VRAM | Notes |
|---|---|---|
| disabled | ~39 GB | Fastest — everything on GPU |
| model | ~16 GB | Components swap to CPU between stages |
| aggressive | ~12 GB | Model offload + frame-by-frame VAE decode |
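If you want a rule of thumb for choosing `offload`, the table can be turned into a small lookup. This is a hypothetical helper. The node has no automatic mode, and the thresholds are simply the approximate figures listed above. The `headroom_gb` parameter is an added assumption for GPUs that are shared with other processes.

```python
# Hypothetical helper, not part of ComfyUI-STAR.
# Approximate VRAM figures (GB) from the table above, fastest mode first.
OFFLOAD_MODES = [("disabled", 39.0), ("model", 16.0), ("aggressive", 12.0)]


def pick_offload_mode(vram_gb: float, headroom_gb: float = 0.0) -> str:
    """Return the fastest offload mode whose approximate VRAM need fits."""
    for mode, needed_gb in OFFLOAD_MODES:
        if vram_gb >= needed_gb + headroom_gb:
            return mode
    raise ValueError(f"{vram_gb:.0f} GB is below the ~12 GB listed for 'aggressive'")


print(pick_offload_mode(24))  # model
```

To use a measured value, PyTorch's `torch.cuda.mem_get_info()` returns `(free, total)` in bytes for the current device. Divide by `1024**3` to get GiB.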
Reducing max_chunk_len further lowers VRAM usage for long videos at the cost of slightly more processing time.
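The following sketch shows how `max_chunk_len` bounds the work in each pass. It generates sliding windows with the 50% overlap listed under Features. It is an illustration under assumptions, not the node's code. The real pipeline may place the windows differently, and it may combine overlapping predictions (for example, by averaging) in its own way. Each window holds at most `max_chunk_len` frames, so a smaller value keeps less on the GPU at once, at the cost of more windows per clip.

```python
def chunk_windows(num_frames: int, max_chunk_len: int) -> list:
    """(start, end) windows of max_chunk_len frames that overlap by ~50%.

    Illustrative sketch; not ComfyUI-STAR's chunking code.
    """
    if num_frames <= max_chunk_len:
        return [(0, num_frames)]
    stride = max(1, max_chunk_len // 2)  # 50% overlap between neighbors
    windows, start = [], 0
    while start + max_chunk_len < num_frames:
        windows.append((start, start + max_chunk_len))
        start += stride
    # Align the last window to the clip end so it is also full length.
    windows.append((num_frames - max_chunk_len, num_frames))
    return windows


# Usage: a 64-frame clip with the default max_chunk_len of 32.
print(chunk_windows(64, 32))  # [(0, 32), (16, 48), (32, 64)]
```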
Model Weights
Models are stored in ComfyUI/models/star/ and auto-downloaded on first use:
| Model | Use Case | Source |
|---|---|---|
| `light_deg.pt` | Low-res video from the web, mild compression | HuggingFace |
| `heavy_deg.pt` | Heavily compressed/degraded video | HuggingFace |
The OpenCLIP text encoder and SVD temporal VAE are downloaded automatically by their respective libraries on first load.
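The following is a minimal sketch of the "stored in `ComfyUI/models/star/`, downloaded on first use" pattern for the STAR checkpoints. The function name and the `download` callback are hypothetical stand-ins. The actual HuggingFace download step in the node is not reproduced here.

```python
from pathlib import Path
from typing import Callable

STAR_MODEL_NAMES = ("light_deg.pt", "heavy_deg.pt")


def ensure_star_weights(comfy_root: str, model_name: str,
                        download: Callable[[Path], None]) -> Path:
    """Return the checkpoint path under models/star/, fetching it only if missing.

    Hypothetical helper: `download` stands in for the node's own download step.
    """
    if model_name not in STAR_MODEL_NAMES:
        raise ValueError(f"unknown STAR checkpoint: {model_name}")
    path = Path(comfy_root) / "models" / "star" / model_name
    if not path.is_file():
        path.parent.mkdir(parents=True, exist_ok=True)
        download(path)  # called once; later calls find the cached file
    return path
```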
Standalone CLI
For long videos where ComfyUI's RAM usage becomes a bottleneck, use the standalone script directly. It streams output frames to ffmpeg so peak memory stays bounded regardless of video length.
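The streaming pattern can be sketched in Python with `subprocess`. Each frame is converted to 8-bit RGB and written to ffmpeg's stdin as soon as it is produced, so finished frames never accumulate in memory. This reflects the general technique, not a copy of `inference.py`. The function names, the `libx264` encoder, and the pixel formats are illustrative choices.

```python
import subprocess
from typing import Iterable

import numpy as np


def ffmpeg_encode_cmd(width: int, height: int, fps: float, out_path: str) -> list:
    """ffmpeg arguments that read raw RGB frames from stdin and encode H.264."""
    return [
        "ffmpeg", "-y",
        "-f", "rawvideo", "-pix_fmt", "rgb24",
        "-s", f"{width}x{height}", "-r", str(fps),
        "-i", "-",  # "-" = read the input stream from stdin
        "-c:v", "libx264", "-pix_fmt", "yuv420p",  # yuv420p needs even width/height
        out_path,
    ]


def stream_to_ffmpeg(frames: Iterable[np.ndarray], width: int, height: int,
                     fps: float, out_path: str) -> None:
    """Write HxWx3 float frames in [0, 1]; this function holds one frame at a time."""
    proc = subprocess.Popen(ffmpeg_encode_cmd(width, height, fps, out_path),
                            stdin=subprocess.PIPE)
    try:
        for frame in frames:
            if frame.shape != (height, width, 3):
                raise ValueError(f"unexpected frame shape {frame.shape}")
            rgb = (np.clip(frame, 0.0, 1.0) * 255.0 + 0.5).astype(np.uint8)
            proc.stdin.write(rgb.tobytes())
    finally:
        proc.stdin.close()  # EOF tells ffmpeg to finalize the file
        proc.wait()
    if proc.returncode != 0:
        raise RuntimeError(f"ffmpeg exited with code {proc.returncode}")
```

Passing a generator as `frames` (for example, one that yields each upscaled frame as soon as it is ready) keeps peak memory roughly constant regardless of clip length.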
```bash
# Activate your ComfyUI Python environment, then:
python inference.py input.mp4 -o output.mp4

# With model offloading for lower VRAM
python inference.py input.mp4 -o output.mp4 --offload model --segment-size 8

# Image sequence input/output
python inference.py frames_in/ -o frames_out/

# Image sequence to video
python inference.py frames_in/ -o output.mp4 --fps 24

# Single image
python inference.py photo.png -o photo_4x.png
```
Audio is automatically copied from the input video. Use `--no-audio` to disable this.
Run `python inference.py --help` for all options.
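Copying audio from the input is a standard ffmpeg remux. The following is a minimal sketch that assumes the upscaled video was first written without audio; the real script may handle this differently, for example by passing the source as a second input during encoding. The command takes the video stream from the new file and any audio streams from the original without re-encoding. `mux_audio_cmd` is a hypothetical helper. If the source audio codec is not allowed in the output container, replace `-c:a copy` with an encoder such as `-c:a aac`.

```python
def mux_audio_cmd(video_only: str, audio_source: str, out_path: str) -> list:
    """Hypothetical helper: ffmpeg arguments that add the source's audio to the upscaled video."""
    return [
        "ffmpeg", "-y",
        "-i", video_only,     # input 0: upscaled, silent video
        "-i", audio_source,   # input 1: original video with audio
        "-map", "0:v:0",      # video stream from input 0
        "-map", "1:a?",       # audio from input 1; "?" skips it if there is none
        "-c:v", "copy", "-c:a", "copy",  # stream copy, no re-encoding
        "-shortest",
        out_path,
    ]


# Usage (requires ffmpeg on PATH):
# import subprocess
# subprocess.run(mux_audio_cmd("upscaled.mp4", "input.mp4", "output.mp4"), check=True)
```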
Credits
- STAR by Rui Xie, Yinhong Liu et al. (Nanjing University) — ICCV 2025
- Based on I2VGen-XL and VEnhancer
License
This wrapper is MIT licensed. The STAR model weights follow their respective licenses (MIT for I2VGen-XL-based models).