Paper Title

Anomaly Detection in Video via Self-Supervised and Multi-Task Learning

Paper Authors

Georgescu, Mariana-Iuliana, Barbalau, Antonio, Ionescu, Radu Tudor, Khan, Fahad Shahbaz, Popescu, Marius, Shah, Mubarak

Paper Abstract

Anomaly detection in video is a challenging computer vision problem. Due to the lack of anomalous events at training time, anomaly detection requires the design of learning methods without full supervision. In this paper, we approach anomalous event detection in video through self-supervised and multi-task learning at the object level. We first utilize a pre-trained detector to detect objects. Then, we train a 3D convolutional neural network to produce discriminative anomaly-specific information by jointly learning multiple proxy tasks: three self-supervised and one based on knowledge distillation. The self-supervised tasks are: (i) discrimination of forward/backward moving objects (arrow of time), (ii) discrimination of objects in consecutive/intermittent frames (motion irregularity) and (iii) reconstruction of object-specific appearance information. The knowledge distillation task takes into account both classification and detection information, generating large prediction discrepancies between teacher and student models when anomalies occur. To the best of our knowledge, we are the first to approach anomalous event detection in video as a multi-task learning problem, integrating multiple self-supervised and knowledge distillation proxy tasks in a single architecture. Our lightweight architecture outperforms the state-of-the-art methods on three benchmarks: Avenue, ShanghaiTech and UCSD Ped2. Additionally, we perform an ablation study demonstrating the importance of integrating self-supervised learning and normality-specific distillation in a multi-task learning setting.
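To make the multi-task setup described in the abstract more concrete, below is a minimal PyTorch-style sketch of a shared 3D convolutional backbone with one head per proxy task: arrow of time, motion irregularity, appearance reconstruction, and knowledge distillation. This is not the authors' implementation; the layer sizes, the object-crop shape (3 frames of 64x64 RGB crops), the class name MultiTaskAnomalyNet, and the teacher logit dimension teacher_dim are all illustrative assumptions.

```python
# Hypothetical sketch, not the authors' code: a shared 3D-CNN backbone with
# four task-specific heads matching the proxy tasks named in the abstract.
import torch
import torch.nn as nn


class MultiTaskAnomalyNet(nn.Module):
    def __init__(self, teacher_dim=1000):  # teacher_dim: assumed size of the teacher's logits
        super().__init__()
        # Shared lightweight 3D convolutional backbone over object-centric clips.
        self.backbone = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),
            nn.Conv3d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
            nn.Flatten(),
        )
        # (i) arrow of time: forward vs. backward playback (binary classification).
        self.arrow_of_time = nn.Linear(64, 2)
        # (ii) motion irregularity: consecutive vs. intermittent frames (binary classification).
        self.motion_irregularity = nn.Linear(64, 2)
        # (iii) reconstruction of object-specific appearance (a 64x64 RGB crop).
        self.decoder = nn.Sequential(nn.Linear(64, 3 * 64 * 64), nn.Sigmoid())
        # (iv) knowledge distillation: regress the teacher model's predictions.
        self.distill = nn.Linear(64, teacher_dim)

    def forward(self, clips):
        # clips: (batch, 3, T, 64, 64) object-centric temporal crops.
        feats = self.backbone(clips)
        return {
            "arrow_of_time": self.arrow_of_time(feats),
            "motion_irregularity": self.motion_irregularity(feats),
            "reconstruction": self.decoder(feats).view(-1, 3, 64, 64),
            "distillation": self.distill(feats),
        }


if __name__ == "__main__":
    model = MultiTaskAnomalyNet()
    outputs = model(torch.randn(2, 3, 3, 64, 64))
    print({name: out.shape for name, out in outputs.items()})
```

At test time, following the idea in the abstract, the per-task outputs (e.g. classification confidence on the normal class and the discrepancy between student and teacher predictions) would be combined into a single object-level anomaly score; the exact weighting is part of the paper's method and is not reproduced here.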
