Paper Title

Memory Enhanced Global-Local Aggregation for Video Object Detection

Authors

Yihong Chen, Yue Cao, Han Hu, Liwei Wang

Abstract

How do humans recognize an object in a piece of video? Due to the deteriorated quality of single frame, it may be hard for people to identify an occluded object in this frame by just utilizing information within one image. We argue that there are two important cues for humans to recognize objects in videos: the global semantic information and the local localization information. Recently, plenty of methods adopt the self-attention mechanisms to enhance the features in key frame with either global semantic information or local localization information. In this paper we introduce memory enhanced global-local aggregation (MEGA) network, which is among the first trials that takes full consideration of both global and local information. Furthermore, empowered by a novel and carefully-designed Long Range Memory (LRM) module, our proposed MEGA could enable the key frame to get access to much more content than any previous methods. Enhanced by these two sources of information, our method achieves state-of-the-art performance on ImageNet VID dataset. Code is available at \url{https://github.com/Scalsol/mega.pytorch}.
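To make the aggregation idea concrete, below is a minimal, hedged sketch of attention-based feature enhancement across frames in the spirit of the abstract: key-frame proposal features attend over features gathered from global frames, local frames, and a long-range memory cache. This is not the authors' implementation; the class and variable names (e.g. `FrameFeatureAggregator`, `memory_feats`) are hypothetical, and the actual MEGA code is at https://github.com/Scalsol/mega.pytorch.

```python
# Illustrative sketch only: attention-based aggregation of support-frame features
# into key-frame features. Names are hypothetical, not from the MEGA codebase.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FrameFeatureAggregator(nn.Module):
    """Enhance key-frame proposal features by attending over features
    pooled from other (global / local / memory) frames."""

    def __init__(self, dim=256):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, key_feats, support_feats):
        # key_feats:     (N_key, dim) proposal features from the key frame
        # support_feats: (N_sup, dim) features gathered from other frames / memory
        q = self.q_proj(key_feats)
        k = self.k_proj(support_feats)
        v = self.v_proj(support_feats)
        attn = F.softmax(q @ k.t() * self.scale, dim=-1)  # (N_key, N_sup)
        return key_feats + attn @ v                       # residual enhancement


if __name__ == "__main__":
    agg = FrameFeatureAggregator(dim=256)
    key = torch.randn(10, 256)            # 10 proposals in the key frame
    global_feats = torch.randn(30, 256)   # features from globally sampled frames
    local_feats = torch.randn(20, 256)    # features from neighboring frames
    memory_feats = torch.randn(50, 256)   # cached features from earlier frames (memory)
    support = torch.cat([global_feats, local_feats, memory_feats], dim=0)
    enhanced = agg(key, support)
    print(enhanced.shape)  # torch.Size([10, 256])
```

The point of the sketch is only that, by concatenating cached memory features with global and local ones, the key frame can attend over far more content than the frames sampled for a single iteration, which is the role the abstract attributes to the Long Range Memory module.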
