Abstract: Existing audio-visual event localization (AVE) handles manually trimmed videos, each containing only a single instance. However, this setting is unrealistic as natural videos often ...