From 82fb7a00091ea045e48833367385fc10e84f452a Mon Sep 17 00:00:00 2001
From: Ethanfel <ethan.fel@ts-pc.fr>
Date: Tue, 7 Apr 2026 09:12:00 +0200
Subject: [PATCH] docs: note AudioX shows no perceptual quality gain on V2A vs
 SelVA

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---
 docs/audiox_evaluation.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/docs/audiox_evaluation.md b/docs/audiox_evaluation.md
index 3a5b566..fe29436 100644
--- a/docs/audiox_evaluation.md
+++ b/docs/audiox_evaluation.md
@@ -64,7 +64,9 @@ This is not a quality difference per se; flow matching is simply more efficient.
 
 AudioX benchmarks claim superior results on text-to-audio (AudioCaps) and text-to-music
 (MusicCaps) vs prior models. Video-to-audio comparison against MMAudio specifically is not
-prominently featured in the paper, which suggests SelVA remains competitive there.
+prominently featured in the paper, and perceptual evaluation agrees that AudioX has no edge there:
+it does not sound noticeably better than SelVA on video-to-audio tasks. AudioX's advantage is
+**breadth** (music, inpainting, variable duration), not raw video-to-audio quality.
 
 ---