Paper Title
A proto-object based audiovisual saliency map
Paper Authors
Paper Abstract
The natural environment and our interaction with it are inherently multisensory: we deploy visual, tactile, and/or auditory senses to perceive, learn, and interact with our surroundings. Our objective in this study is to develop a scene analysis algorithm using multisensory information, specifically vision and audio. We develop a proto-object based audiovisual saliency map (AVSM) for the analysis of dynamic natural scenes. A specialized audiovisual camera with a $360\degree$ field of view, capable of localizing sound direction, is used to collect spatiotemporally aligned audiovisual data. We demonstrate that the performance of the proto-object based AVSM in detecting and localizing salient objects/events agrees with human judgment. In addition, the proto-object based AVSM, which we compute as a linear combination of visual and auditory feature conspicuity maps, captures a larger number of valid salient events than unisensory saliency maps. Such an algorithm can be useful in surveillance, robotic navigation, video compression, and related applications.
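The abstract describes the AVSM as a linear combination of visual and auditory feature conspicuity maps. The sketch below illustrates that combination step only; the function name `audiovisual_saliency_map`, the equal weights, and the map shapes are assumptions for illustration, not details given by the paper.

```python
import numpy as np

# Assumed weights: the abstract specifies a linear combination but not the
# actual weighting of the visual and auditory conspicuity maps.
W_VISUAL = 0.5
W_AUDITORY = 0.5


def audiovisual_saliency_map(visual_conspicuity: np.ndarray,
                             auditory_conspicuity: np.ndarray) -> np.ndarray:
    """Combine unisensory conspicuity maps into an audiovisual saliency map.

    Both inputs are assumed to be spatially aligned 2D maps covering the same
    panoramic field of view, normalized to [0, 1].
    """
    avsm = W_VISUAL * visual_conspicuity + W_AUDITORY * auditory_conspicuity
    # Renormalize so the combined map stays in [0, 1].
    peak = avsm.max()
    return avsm / peak if peak > 0 else avsm


if __name__ == "__main__":
    # Random arrays stand in for real conspicuity maps over a panoramic frame.
    vis = np.random.rand(64, 256)
    aud = np.random.rand(64, 256)
    print(audiovisual_saliency_map(vis, aud).shape)
```

In this sketch the two maps are assumed to share the same spatial grid, consistent with the spatiotemporally aligned audiovisual data described in the abstract.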