The original STAR code runs vae_encode() before entering the amp.autocast()
block. Ours ran it inside the block, so the encoder saw autocast-downcast
tensors and could produce different latent representations.
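A minimal sketch of the corrected ordering. The autocast() and vae_encode() below are stand-ins that only record call order (not the node's real API); the real code uses torch.amp.autocast and the temporal VAE:

```python
from contextlib import contextmanager

calls = []  # records execution order so the fix is observable

@contextmanager
def autocast():
    # stand-in for torch.amp.autocast(); the real context downcasts ops
    calls.append("autocast_enter")
    try:
        yield
    finally:
        calls.append("autocast_exit")

def vae_encode(frames):
    # stand-in for the temporal VAE encoder; must see full-precision tensors
    calls.append("vae_encode")
    return frames

def upscale(frames):
    latents = vae_encode(frames)   # encode BEFORE entering autocast
    with autocast():
        calls.append("denoise")    # diffusion loop runs under mixed precision
    return latents
```

After the fix, vae_encode is recorded ahead of autocast_enter, matching the original STAR ordering.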
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The denoise parameter maps directly to total_noise_levels (denoise * 1000).
The default of 0.9 matches the original STAR inference script.
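As a sketch, the mapping is just the following (function name hypothetical; only the denoise * 1000 relation is from the commit):

```python
def denoise_to_noise_levels(denoise: float) -> int:
    """Map the node's denoise value onto STAR's total_noise_levels."""
    # round() guards against float artifacts like 0.29 * 1000 == 289.999...
    return int(round(denoise * 1000))

denoise_to_noise_levels(0.9)  # → 900, the original script's value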
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The original STAR inference uses total_noise_levels=900, which preserves the
input structure during SDEdit. We had 1000, which starts from near-pure noise
and destroys the input. Also, always append the quality prompt to the user's
text instead of using it only as a fallback.
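A sketch of the prompt change. QUALITY_PROMPT is a placeholder for the node's built-in quality string (the actual text is not reproduced here), and build_prompt is an illustrative name:

```python
QUALITY_PROMPT = "<built-in quality prompt>"  # placeholder, not the real string

def build_prompt(user_text: str) -> str:
    # Before: quality prompt used only when user_text was empty (fallback).
    # After: always append it to whatever the user typed.
    user_text = user_text.strip()
    if not user_text:
        return QUALITY_PROMPT
    return f"{user_text}, {QUALITY_PROMPT}"
```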
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Process videos in overlapping segments (25% overlap with linear crossfade
blending) so peak memory is bounded by one segment rather than the full
video. Add a segment_size parameter to the Super-Resolution node (default 0
= process all at once; 16-32 recommended for long videos). Also update the
README clone URL to point at the GitHub mirror.
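The segmentation above can be sketched as follows. Function names are illustrative, and overlap = segment_size // 4 is an assumption consistent with the stated 25%:

```python
def plan_segments(num_frames: int, segment_size: int):
    """Return (start, end) frame ranges with ~25% overlap between neighbors."""
    if segment_size == 0 or num_frames <= segment_size:
        return [(0, num_frames)]          # default 0: process the whole video
    overlap = segment_size // 4           # 25% overlap
    step = segment_size - overlap
    return [(s, min(s + segment_size, num_frames))
            for s in range(0, num_frames - overlap, step)]

def crossfade_weights(overlap: int):
    """Linear ramp: weight given to the NEW segment on each overlapping
    frame; the previous segment gets the complementary 1 - w."""
    return [(i + 1) / (overlap + 1) for i in range(overlap)]
```

With segment_size=16 a 64-frame video is split into five 16-frame segments whose neighbors share 4 frames, and those 4 frames are blended with the linear ramp.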
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Two-node package wrapping the STAR (ICCV 2025) diffusion-based video
upscaling pipeline:
- STAR Model Loader: loads UNet+ControlNet, OpenCLIP text encoder, and
temporal VAE with auto-download from HuggingFace
- STAR Video Super-Resolution: runs the full diffusion pipeline with
configurable upscale factor, guidance, solver mode, chunking, and
color correction
Includes three VRAM offload modes (disabled/model/aggressive) to
support GPUs from 12GB to 40GB+.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>