Comfyui-STAR

Author	SHA1	Message	Date
Ethanfel	91e5bd8222	Clean up debug logging and fix precision setting for autocast. Remove all [STAR DEBUG] print statements added during the quality investigation. Fix autocast to actually use the selected precision dtype (fp16/bf16) instead of always defaulting to fp16. fp32 now properly disables autocast for full-precision inference. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 03:13:33 +01:00
Ethanfel	45e57f58a0	Print model load status to detect missing/unexpected weight keys. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 02:34:26 +01:00
Ethanfel	162c27d243	Add space between user prompt and quality prompt suffix. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 02:32:20 +01:00
Ethanfel	741c02b88c	Add diagnostic debug logging to pipeline stages. Prints tensor stats (shape, dtype, min, max, mean, std) at each stage to help diagnose quality issues. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 02:25:10 +01:00
Ethanfel	e2025c6ca0	Move VAE encode outside autocast to match original STAR pipeline. The original STAR code runs vae_encode() before the amp.autocast() block. Our code had it inside, which changes how the encoder processes tensors and can produce different latent representations. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 02:14:25 +01:00
Ethanfel	0537d9d8a5	Expose denoise parameter (0.1–1.0) in node UI. Maps directly to total_noise_levels (denoise * 1000). Default 0.9 matches the original STAR inference script. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 02:06:57 +01:00
Ethanfel	8a440761d1	Fix noise level (900, not 1000) and prompt concatenation to match original STAR. The original STAR inference uses total_noise_levels=900, preserving input structure during SDEdit. We had 1000, which starts from near-pure noise and destroys the input. Also always append the quality prompt to the user text instead of using it only as a fallback. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 02:03:34 +01:00
Ethanfel	2bf8db4f07	Use fp32 accumulation in SDPA and math attention to match xformers precision. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 01:47:10 +01:00
Ethanfel	0508868978	Revert SDPA to 3D tensors: 4D unsqueeze caused quality degradation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 01:37:10 +01:00
Ethanfel	f03c4853f1	Revert model loading to original HF-based paths. Reverts text encoder and VAE loading back to using HuggingFace preset names / repo IDs (downloading to the library cache) while keeping the attention dispatcher improvements (4D SDPA, math backend). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 01:20:37 +01:00
Ethanfel	4c6c38f05a	Fix attention dispatcher: use 4D tensors for SDPA, add math backend. SDPA with 3D xformers-BMK tensors cannot use Flash Attention and falls back to efficient_attention/math kernels that miscompute on Ada Lovelace GPUs (e.g. RTX 6000 Pro), producing brownish line artifacts. Unsqueeze to 4D (1, B*H, N, D) so Flash Attention is eligible. Also add a naive "math" backend (chunked bmm) as a guaranteed-correct diagnostic baseline. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 01:05:51 +01:00
Ethanfel	f991f5cb02	Load text encoder and VAE from ComfyUI model folders. Download OpenCLIP ViT-H-14 to models/text_encoders/ and the SVD temporal VAE to models/vae/svd-temporal-vae/ instead of hidden library caches, so they're visible, reusable, and shared with other nodes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 00:37:14 +01:00
Ethanfel	e272e1a57d	Fix open_clip batch_first compatibility via auto-applied patch. Newer open_clip creates nn.MultiheadAttention with batch_first=True, but STAR's embedder unconditionally permutes to [seq, batch, embed]. This causes a RuntimeError in the text encoder (attn_mask shape mismatch). The patch detects batch_first at runtime and only permutes when needed. Patches in patches/ are auto-applied to the STAR submodule on startup and are skipped gracefully if already applied. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 00:25:26 +01:00
Ethanfel	82d7f4997a	Add configurable attention backend with SageAttention variant support. Replace the auto-detect xformers shim with a runtime dispatcher that always intercepts xformers.ops.memory_efficient_attention. A new dropdown on STARModelLoader (and the --attention CLI arg) lets users explicitly select sdpa (default), xformers, sageattn, or specific SageAttention kernels (fp16 triton/cuda, fp8 cuda). Only backends that successfully import appear as options. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 00:12:26 +01:00
Ethanfel	cf74b587ec	Add SageAttention as preferred attention backend when available. Attention fallback chain: SageAttention (2-5x faster, INT8 quantized) > xformers > PyTorch native SDPA. SageAttention is optional; install it with `pip install sageattention` for a speed boost. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 00:00:55 +01:00
Ethanfel	5de26d8ead	Add xformers compatibility shim using PyTorch native SDPA. Avoids requiring an xformers installation by shimming xformers.ops.memory_efficient_attention with torch.nn.functional.scaled_dot_product_attention when xformers is not available. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-14 23:58:55 +01:00
Ethanfel	5786ab6be7	Auto-initialize STAR submodule if missing on first load. Detects when the STAR submodule directory is empty (cloned without --recursive) and runs git submodule update --init automatically. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-14 23:48:43 +01:00
Ethanfel	6cf314baf4	Add standalone CLI for memory-efficient video upscaling. Standalone inference script that works outside ComfyUI; just activate the same Python venv. Streams output frames to ffmpeg so peak RAM stays bounded regardless of video length. Supports video files, image sequences, and single images. Audio is automatically preserved from input videos. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-14 23:46:58 +01:00
Ethanfel	8794f8ddec	Add example workflow and link in README. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-14 23:32:35 +01:00
Ethanfel	f7021e95f4	Add segment-based processing for long videos to reduce RAM usage. Process videos in overlapping segments (25% overlap with linear crossfade blending) so peak memory is bounded by one segment rather than the full video. New segment_size parameter on the Super-Resolution node (default 0 = all at once, recommended 16-32 for long videos). Also update README clone URL to GitHub mirror. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-14 23:28:01 +01:00
Ethanfel	5f9287cfac	Initial release: ComfyUI nodes for STAR video super-resolution. Two-node package wrapping the STAR (ICCV 2025) diffusion-based video upscaling pipeline: (1) STAR Model Loader, which loads UNet+ControlNet, the OpenCLIP text encoder, and the temporal VAE with auto-download from HuggingFace; (2) STAR Video Super-Resolution, which runs the full diffusion pipeline with configurable upscale factor, guidance, solver mode, chunking, and color correction. Includes three VRAM offload modes (disabled/model/aggressive) to support GPUs from 12GB to 40GB+. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-14 23:20:27 +01:00

21 Commits
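
Commits 0537d9d8a5, 8a440761d1 and 162c27d243 describe two small mappings: the denoise value becomes total_noise_levels = denoise * 1000 (so the default 0.9 gives 900, matching the original STAR script), and the quality prompt is always appended to the user text with a separating space. A minimal sketch of both, assuming hypothetical function names and a placeholder suffix (the real quality prompt text is not reproduced here):

```python
QUALITY_SUFFIX = "<quality prompt suffix>"  # hypothetical placeholder, not STAR's actual text


def total_noise_levels(denoise: float) -> int:
    """Map the UI denoise value (0.1 to 1.0) onto the 1000-step noise schedule."""
    if not 0.1 <= denoise <= 1.0:
        raise ValueError("denoise must be between 0.1 and 1.0")
    # round() rather than int() so a product like 899.9999... is not truncated to 899
    return round(denoise * 1000)


def build_prompt(user_text: str, quality_suffix: str = QUALITY_SUFFIX) -> str:
    """Always append the quality suffix, separated from the user text by one space."""
    user_text = user_text.strip()
    return f"{user_text} {quality_suffix}" if user_text else quality_suffix
```

Lower denoise values start sampling from a smaller noise level, so more of the input structure survives; that is the reasoning 8a440761d1 gives for preferring 900 over 1000.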
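
Commits 5de26d8ead, cf74b587ec and 82d7f4997a route xformers.ops.memory_efficient_attention through a runtime dispatcher and only offer backends that import successfully. The pattern can be sketched in plain Python; the registry contents, the class name and its methods are hypothetical, and the real dispatcher's tensor-layout handling and kernel options are not shown.

```python
import importlib
from typing import Callable, Dict, List, Optional

# Hypothetical mapping: dropdown name -> module that must import for it to be offered.
BACKEND_MODULES: Dict[str, str] = {
    "sdpa": "torch",
    "xformers": "xformers",
    "sageattn": "sageattention",
}


def available_backends(modules: Dict[str, str] = BACKEND_MODULES) -> List[str]:
    """Names whose module imports cleanly, in registry order."""
    names = []
    for name, module in modules.items():
        try:
            importlib.import_module(module)
        except Exception:  # ImportError, or a broken native build failing at import time
            continue
        names.append(name)
    return names


class AttentionDispatcher:
    """Stable callable that forwards to whichever registered backend is selected."""

    def __init__(self) -> None:
        self._backends: Dict[str, Callable] = {}
        self._active: Optional[str] = None

    def register(self, name: str, fn: Callable) -> None:
        self._backends[name] = fn
        if self._active is None:  # the first registered backend becomes the default
            self._active = name

    def select(self, name: str) -> None:
        if name not in self._backends:
            raise ValueError(f"backend {name!r} unavailable; options: {sorted(self._backends)}")
        self._active = name

    def __call__(self, *args, **kwargs):
        if self._active is None:
            raise RuntimeError("no attention backend registered")
        return self._backends[self._active](*args, **kwargs)
```

In a setup like the one these commits describe, a single dispatcher instance would stand in for xformers.ops.memory_efficient_attention, so STAR's call sites stay unchanged while the dropdown only changes which function `select` points at.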
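
Commit 5786ab6be7 detects an empty STAR submodule directory (what a clone without --recursive leaves behind) and runs git submodule update --init. A minimal sketch of that check, with hypothetical function names; the git call is the standard submodule command, and whether the node scopes it to one path as done here is an assumption.

```python
import os
import subprocess


def submodule_is_empty(path: str) -> bool:
    """True if the submodule directory is missing or contains no entries."""
    return not os.path.isdir(path) or not os.listdir(path)


def ensure_submodule(repo_root: str, rel_path: str) -> bool:
    """Initialize the submodule at rel_path if needed; return True if git was run."""
    if not submodule_is_empty(os.path.join(repo_root, rel_path)):
        return False
    subprocess.run(["git", "submodule", "update", "--init", "--", rel_path],
                   cwd=repo_root, check=True)
    return True
```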
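
Commit 6cf314baf4 streams output frames to ffmpeg so peak RAM does not grow with video length, and carries the input's audio over. One common way to do this is to pipe raw RGB frames into ffmpeg's stdin. The sketch below assumes rgb24 frames delivered as bytes, libx264 output and hypothetical helper names; the flags are standard ffmpeg options, but whether the CLI uses these exact settings is an assumption, and copying audio assumes the output container accepts the source codec.

```python
import subprocess
from typing import Iterable, List, Optional


def build_ffmpeg_cmd(out_path: str, width: int, height: int, fps: float,
                     audio_source: Optional[str] = None) -> List[str]:
    """ffmpeg command that reads raw rgb24 frames from stdin and encodes them."""
    cmd = ["ffmpeg", "-y",
           "-f", "rawvideo", "-pix_fmt", "rgb24",
           "-s", f"{width}x{height}", "-r", str(fps),
           "-i", "-"]  # input 0: frames piped on stdin
    if audio_source:
        # input 1: the original video; "1:a?" maps its audio only if it has any
        cmd += ["-i", audio_source, "-map", "0:v:0", "-map", "1:a?",
                "-c:a", "copy", "-shortest"]
    cmd += ["-c:v", "libx264", "-pix_fmt", "yuv420p", out_path]
    return cmd


def stream_frames(frames: Iterable[bytes], cmd: List[str]) -> None:
    """Write frames one at a time so the writer never buffers the whole video."""
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE)
    try:
        for frame in frames:
            proc.stdin.write(frame)
    finally:
        proc.stdin.close()
        proc.wait()
    if proc.returncode != 0:
        raise RuntimeError(f"ffmpeg exited with code {proc.returncode}")
```

A caller would build the command once from the upscaled resolution, then pass a generator that yields each frame as height * width * 3 bytes.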
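
Commit f7021e95f4 processes long videos in overlapping segments with 25% overlap and a linear crossfade. The following is a minimal sketch of that idea, not the repository's code: `plan_segments` and `stitch_segments` are hypothetical names, frames can be anything supporting `*` and `+` (floats here, tensors in practice), and the exact shape of the crossfade ramp is an assumption.

```python
from typing import List, Sequence, Tuple


def plan_segments(num_frames: int, segment_size: int,
                  overlap_ratio: float = 0.25) -> List[Tuple[int, int]]:
    """Split [0, num_frames) into overlapping (start, end) ranges, end exclusive.

    segment_size <= 0 means "process everything at once", like the default of 0.
    """
    if segment_size <= 0 or segment_size >= num_frames:
        return [(0, num_frames)]
    # At least one shared frame, but keep the stride >= 1 so the loop terminates.
    overlap = min(max(1, int(segment_size * overlap_ratio)), segment_size - 1)
    stride = segment_size - overlap
    ranges = []
    start = 0
    while True:
        end = min(start + segment_size, num_frames)
        ranges.append((start, end))
        if end == num_frames:
            return ranges
        start += stride


def stitch_segments(ranges: Sequence[Tuple[int, int]], outputs: Sequence[Sequence],
                    num_frames: int) -> list:
    """Merge per-segment outputs, linearly crossfading the overlapping frames."""
    result = [None] * num_frames
    prev_end = 0
    for (start, end), frames in zip(ranges, outputs):
        if len(frames) != end - start:
            raise ValueError("segment output length does not match its range")
        overlap = max(prev_end - start, 0)
        for offset, frame in enumerate(frames):
            idx = start + offset
            if offset < overlap:
                # New segment's weight ramps from 1/(overlap+1) up to overlap/(overlap+1).
                w = (offset + 1) / (overlap + 1)
                result[idx] = result[idx] * (1 - w) + frame * w
            else:
                result[idx] = frame
        prev_end = end
    return result
```

With 40 frames and segment_size=16 the overlap is 4 frames and the plan is (0, 16), (12, 28), (24, 40). The sketch keeps the stitched result in one list for clarity; bounding RAM as the commit describes would additionally require writing out or releasing frames once they can no longer be blended.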