Paper Title
A proto-object based audiovisual saliency map
Paper Authors
Paper Abstract
The natural environment and our interaction with it are inherently multisensory: we deploy visual, tactile, and/or auditory senses to perceive, learn, and interact with our surroundings. Our objective in this study is to develop a scene analysis algorithm using multisensory information, specifically vision and audio. We develop a proto-object based audiovisual saliency map (AVSM) for the analysis of dynamic natural scenes. A specialized audiovisual camera with a $360\degree$ field of view, capable of localizing sound direction, is used to collect spatiotemporally aligned audiovisual data. We demonstrate that the performance of the proto-object based AVSM in detecting and localizing salient objects/events agrees with human judgment. In addition, the proto-object based AVSM, which we compute as a linear combination of visual and auditory feature conspicuity maps, captures a larger number of valid salient events than unisensory saliency maps. Such an algorithm can be useful in surveillance, robotic navigation, video compression, and related applications.
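The abstract describes the AVSM as a linear combination of visual and auditory feature conspicuity maps. The sketch below illustrates that combination step only; the function name `audiovisual_saliency_map`, the equal weights, and the map shapes are assumptions for illustration, not details given by the paper.

```python
import numpy as np

# Assumed weights: the abstract specifies a linear combination but not the
# actual weighting of the visual and auditory conspicuity maps.
W_VISUAL = 0.5
W_AUDITORY = 0.5


def audiovisual_saliency_map(visual_conspicuity: np.ndarray,
                             auditory_conspicuity: np.ndarray) -> np.ndarray:
    """Combine unisensory conspicuity maps into an audiovisual saliency map.

    Both inputs are assumed to be spatially aligned 2D maps covering the same
    panoramic field of view, normalized to [0, 1].
    """
    avsm = W_VISUAL * visual_conspicuity + W_AUDITORY * auditory_conspicuity
    # Renormalize so the combined map stays in [0, 1].
    peak = avsm.max()
    return avsm / peak if peak > 0 else avsm


if __name__ == "__main__":
    # Random arrays stand in for real conspicuity maps over a panoramic frame.
    vis = np.random.rand(64, 256)
    aud = np.random.rand(64, 256)
    print(audiovisual_saliency_map(vis, aud).shape)
```

In this sketch the two maps are assumed to share the same spatial grid, consistent with the spatiotemporally aligned audiovisual data described in the abstract.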