论文标题
在视听环境中学习:评论,分析和新观点
Learning in Audio-visual Context: A Review, Analysis, and New Perspective
论文作者
论文摘要
视觉和听力是两种在人类交流和场景理解中起着至关重要的作用的感觉。为了模仿人类的感知能力,旨在开发从音频和视觉方式学习的计算方法的视听学习一直是一个蓬勃发展的领域。一项可以系统地组织和分析视听领域的研究的综合调查。从分析视听认知基础开始,我们介绍了几个关键发现,这些发现激发了我们的计算研究。然后,我们系统地回顾了最近的视听学习研究,并将其分为三类:视听,跨模式感知和视听合作。通过我们的分析,我们发现,跨语义,空间和时间支持上述研究的视听数据的一致性。为了重新审视视听学习领域的当前发展,我们进一步提出了有关视听场景理解的新观点,然后讨论和分析视听学习领域的可行未来方向。总体而言,这项调查从不同方面审查并展示了当前视听学习领域。我们希望它可以为研究人员提供对这一领域的更好理解。发布了包括不断更新的调查在内的网站:\ url {https://gewu-lab.github.io/audio-visual-learning/}。
Sight and hearing are two senses that play a vital role in human communication and scene understanding. To mimic human perception ability, audio-visual learning, aimed at developing computational approaches to learn from both audio and visual modalities, has been a flourishing field in recent years. A comprehensive survey that can systematically organize and analyze studies of the audio-visual field is expected. Starting from the analysis of audio-visual cognition foundations, we introduce several key findings that have inspired our computational studies. Then, we systematically review the recent audio-visual learning studies and divide them into three categories: audio-visual boosting, cross-modal perception and audio-visual collaboration. Through our analysis, we discover that, the consistency of audio-visual data across semantic, spatial and temporal support the above studies. To revisit the current development of the audio-visual learning field from a more macro view, we further propose a new perspective on audio-visual scene understanding, then discuss and analyze the feasible future direction of the audio-visual learning area. Overall, this survey reviews and outlooks the current audio-visual learning field from different aspects. We hope it can provide researchers with a better understanding of this area. A website including constantly-updated survey is released: \url{https://gewu-lab.github.io/audio-visual-learning/}.