# ComfyUI-STAR
ComfyUI custom nodes for [STAR (Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution)](https://github.com/NJU-PCALab/STAR) — a diffusion-based video upscaling model (ICCV 2025).
## Features
- **Diffusion-based 4x video super-resolution** with temporal coherence
- **Two model variants**: `light_deg.pt` (light degradation) and `heavy_deg.pt` (heavy degradation)
- **Auto-download**: all models (UNet checkpoint, OpenCLIP text encoder, temporal VAE) download automatically on first use
- **VRAM offloading**: three modes to fit GPUs from 12GB to 40GB+
- **Long video support**: sliding-window chunking with 50% overlap
- **Segment-based processing**: bound peak RAM for long videos
- **Color correction**: AdaIN and wavelet-based post-processing
- **Standalone CLI**: process long videos from the terminal without launching ComfyUI
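
The sliding-window chunking with 50% overlap mentioned above can be sketched as follows. This is an illustrative reconstruction of the windowing scheme, not the nodes' actual implementation; the function name `chunk_indices` is hypothetical.

```python
def chunk_indices(num_frames: int, chunk_len: int):
    """Yield (start, end) frame ranges for sliding-window chunks
    that overlap by 50% and cover every frame."""
    if num_frames <= chunk_len:
        yield 0, num_frames
        return
    stride = max(1, chunk_len // 2)  # 50% overlap between consecutive windows
    start = 0
    while start + chunk_len < num_frames:
        yield start, start + chunk_len
        start += stride
    # Final window is flushed to the end so no trailing frames are dropped.
    yield num_frames - chunk_len, num_frames
```

For a 10-frame clip with `chunk_len=4` this yields `(0, 4), (2, 6), (4, 8), (6, 10)` — each window shares half its frames with its neighbor, which is what lets the model keep temporal coherence across chunk boundaries.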
## Installation
### ComfyUI Manager
Search for `ComfyUI-STAR` in ComfyUI Manager and install.
### Manual
```bash
cd ComfyUI/custom_nodes
git clone --recursive https://github.com/ethanfel/Comfyui-STAR.git
cd Comfyui-STAR
pip install -r requirements.txt
```
> The `--recursive` flag clones the STAR submodule. If you forgot it, run `git submodule update --init` afterwards.
## Example Workflow
A ready-to-use workflow is included in [`example_workflows/star_basic_workflow.json`](example_workflows/star_basic_workflow.json). Drag and drop it onto the ComfyUI canvas to load it, then select your input image or connect a video loader.
## Nodes
### STAR Model Loader
Loads the STAR model components (UNet+ControlNet, OpenCLIP text encoder, temporal VAE).
| Input | Description |
|-------|-------------|
| **model_name** | `light_deg.pt` for mildly degraded video, `heavy_deg.pt` for heavily degraded video. Auto-downloaded from HuggingFace on first use. |
| **precision** | `fp16` (recommended), `bf16`, or `fp32`. |
| **offload** | `disabled` (~39GB VRAM), `model` (~16GB — swaps components to CPU when idle), `aggressive` (~12GB — model offload + single-frame VAE decode). |
### STAR Video Super-Resolution
Runs the STAR diffusion pipeline on an image batch.
| Input | Description |
|-------|-------------|
| **star_model** | Connect from STAR Model Loader. |
| **images** | Input video frames (IMAGE batch). |
| **upscale** | Upscale factor (1-8, default 4). |
| **steps** | Denoising steps (1-100, default 15). Ignored in `fast` mode. |
| **guide_scale** | Classifier-free guidance scale (1-20, default 7.5). |
| **prompt** | Text prompt. Leave empty for STAR's built-in quality prompt. |
| **solver_mode** | `fast` (optimized 15-step schedule) or `normal` (uniform schedule). |
| **max_chunk_len** | Max frames per chunk (4-128, default 32). Lower = less VRAM for long videos. |
| **seed** | Random seed for reproducibility. |
| **color_fix** | `adain` (match color stats), `wavelet` (preserve low-frequency color), or `none`. |
| **segment_size** | Process video in segments of this many frames to reduce RAM usage (0-256, default 0). 0 = process all at once. Recommended: 16-32 for long videos. Segments overlap by 25% with linear crossfade blending. |
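
The linear crossfade blending used where segments overlap can be sketched in a few lines. This is a simplified illustration of the idea, assuming plain scalar "frames"; the real code operates on image tensors, and the helper names are hypothetical.

```python
def crossfade_weights(overlap: int):
    """Linear ramp across the overlap region: the incoming segment's
    weight rises from near 0 to near 1, frame by frame."""
    return [(i + 1) / (overlap + 1) for i in range(overlap)]

def blend_segments(prev_tail, next_head):
    """Blend the overlapping frames of two consecutive segments.
    Each output frame is a weighted mix: (1 - w) * previous + w * next."""
    assert len(prev_tail) == len(next_head)
    weights = crossfade_weights(len(prev_tail))
    return [(1 - w) * a + w * b
            for w, a, b in zip(weights, prev_tail, next_head)]
```

Because the weights sum to 1 per frame and ramp smoothly, any brightness or color mismatch between independently processed segments fades in gradually instead of appearing as a hard seam.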
## VRAM Requirements
| Offload Mode | Approximate VRAM | Notes |
|---|---|---|
| disabled | ~39 GB | Fastest — everything on GPU |
| model | ~16 GB | Components swap to CPU between stages |
| aggressive | ~12 GB | Model offload + frame-by-frame VAE decode |
Reducing `max_chunk_len` further lowers VRAM usage for long videos at the cost of slightly more processing time.
## Model Weights
Models are stored in `ComfyUI/models/star/` and auto-downloaded on first use:
| Model | Use Case | Source |
|-------|----------|--------|
| `light_deg.pt` | Low-res video from the web, mild compression | [HuggingFace](https://huggingface.co/SherryX/STAR/resolve/main/I2VGen-XL-based/light_deg.pt) |
| `heavy_deg.pt` | Heavily compressed/degraded video | [HuggingFace](https://huggingface.co/SherryX/STAR/resolve/main/I2VGen-XL-based/heavy_deg.pt) |
The OpenCLIP text encoder and SVD temporal VAE are downloaded automatically by their respective libraries on first load.
## Standalone CLI
For long videos where ComfyUI's RAM usage becomes a bottleneck, use the standalone script directly. It streams output frames to ffmpeg so peak memory stays bounded regardless of video length.
```bash
# Activate your ComfyUI Python environment, then:
python inference.py input.mp4 -o output.mp4
# With model offloading for lower VRAM
python inference.py input.mp4 -o output.mp4 --offload model --segment-size 8
# Image sequence input/output
python inference.py frames_in/ -o frames_out/
# Image sequence to video
python inference.py frames_in/ -o output.mp4 --fps 24
# Single image
python inference.py photo.png -o photo_4x.png
```
Audio is automatically copied from the input video. Use `--no-audio` to disable.
Run `python inference.py --help` for all options.
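
The bounded-memory behavior comes from piping raw frames into ffmpeg's stdin instead of accumulating the whole output in RAM. A minimal sketch of that pattern is below — the flag choices mirror common ffmpeg usage for raw RGB input, not necessarily the script's exact arguments, and both helper names are hypothetical.

```python
import subprocess

def ffmpeg_cmd(width, height, fps, out_path, audio_src=None):
    """Build an ffmpeg command that reads raw RGB24 frames from stdin."""
    cmd = ["ffmpeg", "-y",
           "-f", "rawvideo", "-pix_fmt", "rgb24",
           "-s", f"{width}x{height}", "-r", str(fps),
           "-i", "-"]                      # frames arrive on stdin
    if audio_src:
        # Copy the audio track from the original input, if it has one.
        cmd += ["-i", audio_src, "-map", "0:v", "-map", "1:a?", "-c:a", "copy"]
    cmd += ["-c:v", "libx264", "-pix_fmt", "yuv420p", out_path]
    return cmd

def stream_frames(frames, width, height, fps, out_path):
    """Write frames to ffmpeg one at a time; only the frame currently
    being encoded needs to be resident in memory."""
    proc = subprocess.Popen(ffmpeg_cmd(width, height, fps, out_path),
                            stdin=subprocess.PIPE)
    for frame in frames:  # each frame: bytes of length width * height * 3
        proc.stdin.write(frame)
    proc.stdin.close()
    proc.wait()
```

Since `frames` can be a generator that yields each upscaled frame as soon as the model produces it, peak RAM stays flat no matter how long the video is.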
## Credits
- [STAR](https://github.com/NJU-PCALab/STAR) by Rui Xie, Yinhong Liu et al. (Nanjing University) — ICCV 2025
- Based on [I2VGen-XL](https://github.com/ali-vilab/VGen) and [VEnhancer](https://github.com/Vchitect/VEnhancer)
## License
This wrapper is MIT licensed. The STAR model weights follow their respective licenses (MIT for I2VGen-XL-based models).