From 82fb7a00091ea045e48833367385fc10e84f452a Mon Sep 17 00:00:00 2001
From: Ethanfel
Date: Tue, 7 Apr 2026 09:12:00 +0200
Subject: [PATCH] docs: note AudioX shows no perceptual quality gain on V2A vs SelVA

Co-Authored-By: Claude Sonnet 4.6
---
 docs/audiox_evaluation.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/docs/audiox_evaluation.md b/docs/audiox_evaluation.md
index 3a5b566..fe29436 100644
--- a/docs/audiox_evaluation.md
+++ b/docs/audiox_evaluation.md
@@ -64,7 +64,9 @@ This is not a quality difference per se; flow matching is simply more efficient.
 
 AudioX benchmarks claim superior results on text-to-audio (AudioCaps) and text-to-music
 (MusicCaps) vs prior models. Video-to-audio comparison against MMAudio specifically is not
-prominently featured in the paper, which suggests SelVA remains competitive there.
+prominently featured in the paper. Perceptual evaluation confirms this: AudioX does not sound
+noticeably better than SelVA on video-to-audio tasks. AudioX's advantage is **breadth**
+(music, inpainting, variable duration), not raw video-to-audio quality.
 
 ---