# ComfyUI-STAR
ComfyUI custom nodes for STAR (Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution) — a diffusion-based video upscaling model (ICCV 2025).
## Features
- Diffusion-based 4x video super-resolution with temporal coherence
- Two model variants: `light_deg.pt` (light degradation) and `heavy_deg.pt` (heavy degradation)
- Auto-download: all models (UNet checkpoint, OpenCLIP text encoder, temporal VAE) download automatically on first use
- VRAM offloading: three modes to fit GPUs from 12GB to 40GB+
- Long video support: sliding-window chunking with 50% overlap
- Segment-based processing: bound peak RAM for long videos
- Color correction: AdaIN and wavelet-based post-processing
- Standalone CLI: run from the terminal without ComfyUI for long videos
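The sliding-window chunking listed above can be sketched in a few lines. This is an illustrative helper under the stated 50%-overlap assumption, not the node's actual implementation; `chunk_indices` is a hypothetical name:

```python
def chunk_indices(num_frames, chunk_len):
    """Yield (start, end) frame windows that advance by half a chunk,
    so consecutive windows overlap by 50% (illustrative sketch)."""
    stride = max(1, chunk_len // 2)
    start = 0
    while start < num_frames:
        end = min(start + chunk_len, num_frames)
        yield (start, end)
        if end == num_frames:
            break
        start += stride
```

For example, a 10-frame clip with `chunk_len=4` is covered by windows `(0, 4)`, `(2, 6)`, `(4, 8)`, `(6, 10)`; the overlapped frames are where temporal blending keeps chunk boundaries coherent.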
## Installation

### ComfyUI Manager

Search for ComfyUI-STAR in ComfyUI Manager and install.

### Manual
```bash
cd ComfyUI/custom_nodes
git clone --recursive https://github.com/ethanfel/Comfyui-STAR.git
cd Comfyui-STAR
pip install -r requirements.txt
```
The `--recursive` flag clones the STAR submodule. If you forgot it, run `git submodule update --init` afterwards.
## Example Workflow

A ready-to-use workflow is included in `example_workflows/star_basic_workflow.json`. Drag and drop it onto the ComfyUI canvas to load it, then select your input image or connect a video loader.
## Nodes

### STAR Model Loader
Loads the STAR model components (UNet+ControlNet, OpenCLIP text encoder, temporal VAE).
| Input | Description |
|---|---|
| `model_name` | `light_deg.pt` for mildly degraded video, `heavy_deg.pt` for heavily degraded video. Auto-downloaded from HuggingFace on first use. |
| `precision` | `fp16` (recommended), `bf16`, or `fp32`. |
| `offload` | `disabled` (~39 GB VRAM), `model` (~16 GB — swaps components to CPU when idle), `aggressive` (~12 GB — model offload + single-frame VAE decode). |
### STAR Video Super-Resolution
Runs the STAR diffusion pipeline on an image batch.
| Input | Description |
|---|---|
| `star_model` | Connect from STAR Model Loader. |
| `images` | Input video frames (IMAGE batch). |
| `upscale` | Upscale factor (1–8, default 4). |
| `steps` | Denoising steps (1–100, default 15). Ignored in fast mode. |
| `guide_scale` | Classifier-free guidance scale (1–20, default 7.5). |
| `prompt` | Text prompt. Leave empty for STAR's built-in quality prompt. |
| `solver_mode` | `fast` (optimized 15-step schedule) or `normal` (uniform schedule). |
| `max_chunk_len` | Max frames per chunk (4–128, default 32). Lower = less VRAM for long videos. |
| `seed` | Random seed for reproducibility. |
| `color_fix` | `adain` (match color stats), `wavelet` (preserve low-frequency color), or `none`. |
| `segment_size` | Process the video in segments of this many frames to reduce RAM usage (0–256, default 0; 0 = process all at once). Recommended: 16–32 for long videos. Segments overlap by 25% with linear crossfade blending. |
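The linear crossfade used to blend overlapping segment frames can be sketched as follows. This is a minimal illustration of the blending idea (hypothetical `crossfade` helper operating on NHWC float arrays), not the node's actual code:

```python
import numpy as np

def crossfade(prev_tail, next_head):
    """Blend the overlapping frames of two segments with linear weights:
    the first frame is taken entirely from the previous segment, the last
    entirely from the next, fading smoothly in between (illustrative sketch).
    Both inputs are (frames, H, W, C) arrays of equal shape."""
    n = prev_tail.shape[0]
    w = np.linspace(0.0, 1.0, n, dtype=np.float32)[:, None, None, None]
    return prev_tail * (1.0 - w) + next_head * w
```

With a 4-frame overlap, the weights on the next segment are `[0, 1/3, 2/3, 1]`, so adjacent segments agree exactly at both ends of the overlap and transition without a visible seam.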
## VRAM Requirements
| Offload Mode | Approximate VRAM | Notes |
|---|---|---|
| `disabled` | ~39 GB | Fastest — everything on GPU |
| `model` | ~16 GB | Components swap to CPU between stages |
| `aggressive` | ~12 GB | Model offload + frame-by-frame VAE decode |
Reducing `max_chunk_len` further lowers VRAM usage for long videos at the cost of slightly more processing time.
## Model Weights

Models are stored in `ComfyUI/models/star/` and auto-downloaded on first use:
| Model | Use Case | Source |
|---|---|---|
| `light_deg.pt` | Low-res video from the web, mild compression | HuggingFace |
| `heavy_deg.pt` | Heavily compressed/degraded video | HuggingFace |
The OpenCLIP text encoder and SVD temporal VAE are downloaded automatically by their respective libraries on first load.
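The download-on-first-use pattern described above amounts to checking the local models directory before fetching. A minimal sketch, with a hypothetical `ensure_model` helper and an injected download callback (the node's real logic and download source may differ):

```python
import os

def ensure_model(name, models_dir="ComfyUI/models/star", download=None):
    """Return the local checkpoint path, invoking the download callback
    only when the file is missing (illustrative sketch, not the node's code)."""
    os.makedirs(models_dir, exist_ok=True)
    path = os.path.join(models_dir, name)
    if not os.path.isfile(path) and download is not None:
        # In practice this could be huggingface_hub.hf_hub_download.
        download(path)
    return path
```

Because the check runs before every load, a checkpoint already present on disk is never re-downloaded.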
## Standalone CLI
For long videos where ComfyUI's RAM usage becomes a bottleneck, use the standalone script directly. It streams output frames to ffmpeg so peak memory stays bounded regardless of video length.
```bash
# Activate your ComfyUI Python environment, then:
python inference.py input.mp4 -o output.mp4

# With model offloading for lower VRAM
python inference.py input.mp4 -o output.mp4 --offload model --segment-size 8

# Image sequence input/output
python inference.py frames_in/ -o frames_out/

# Image sequence to video
python inference.py frames_in/ -o output.mp4 --fps 24

# Single image
python inference.py photo.png -o photo_4x.png
```
Audio is automatically copied from the input video. Use `--no-audio` to disable.

Run `python inference.py --help` for all options.
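The bounded-memory streaming mentioned above works by piping raw frames into an ffmpeg encoder as they are produced, instead of collecting them in RAM. A sketch of the idea, with a hypothetical `build_ffmpeg_cmd` helper (the script's actual ffmpeg flags may differ):

```python
def build_ffmpeg_cmd(path, width, height, fps=24):
    """Command line for an ffmpeg process that reads raw RGB24 frames on
    stdin and encodes them to H.264 (illustrative sketch of the streaming
    approach, not the script's actual invocation)."""
    return [
        "ffmpeg", "-y",
        "-f", "rawvideo", "-pix_fmt", "rgb24",  # raw frames arrive on stdin
        "-s", f"{width}x{height}", "-r", str(fps),
        "-i", "-",
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        path,
    ]
```

In use, something like `proc = subprocess.Popen(build_ffmpeg_cmd("out.mp4", w, h), stdin=subprocess.PIPE)` followed by `proc.stdin.write(frame.tobytes())` per upscaled frame keeps peak memory at roughly one segment's worth of frames, regardless of video length.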
## Credits
- STAR by Rui Xie, Yinhong Liu et al. (Nanjing University) — ICCV 2025
- Based on I2VGen-XL and VEnhancer
## License
This wrapper is MIT licensed. The STAR model weights follow their respective licenses (MIT for I2VGen-XL-based models).