Paper Title

Object-Adaptive LSTM Network for Real-time Visual Tracking with Adversarial Data Augmentation

Authors

Yihan Du, Yan Yan, Si Chen, Yang Hua

Abstract

In recent years, deep learning based visual tracking methods have obtained great success owing to the powerful feature representation ability of Convolutional Neural Networks (CNNs). Among these methods, classification-based tracking methods exhibit excellent performance while their speeds are heavily limited by the expensive computation for massive proposal feature extraction. In contrast, matching-based tracking methods (such as Siamese networks) possess remarkable speed superiority. However, the absence of online updating renders these methods unadaptable to significant object appearance variations. In this paper, we propose a novel real-time visual tracking method, which adopts an object-adaptive LSTM network to effectively capture the video sequential dependencies and adaptively learn the object appearance variations. For high computational efficiency, we also present a fast proposal selection strategy, which utilizes the matching-based tracking method to pre-estimate dense proposals and selects high-quality ones to feed to the LSTM network for classification. This strategy efficiently filters out some irrelevant proposals and avoids the redundant computation for feature extraction, which enables our method to operate faster than conventional classification-based tracking methods. In addition, to handle the problems of sample inadequacy and class imbalance during online tracking, we adopt a data augmentation technique based on the Generative Adversarial Network (GAN) to facilitate the training of the LSTM network. Extensive experiments on four visual tracking benchmarks demonstrate the state-of-the-art performance of our method in terms of both tracking accuracy and speed, which exhibits the great potential of recurrent structures for visual tracking.
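The fast proposal selection strategy described in the abstract can be sketched in miniature: cheaply pre-score every dense proposal with a matching score, then pass only the top-ranked survivors to the expensive classifier. Everything below is an illustrative assumption, not the paper's implementation — in particular, plain cosine similarity to a template feature stands in for the Siamese matching response, and the toy feature vectors are invented.

```python
import math

def select_proposals(proposal_feats, template_feat, top_k=5):
    """Pre-score dense proposals with a cheap matching score (here,
    cosine similarity to a template feature, a stand-in for the Siamese
    response) and keep only the top_k, so the expensive LSTM classifier
    runs on far fewer candidates."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)

    scored = [(i, cosine(f, template_feat)) for i, f in enumerate(proposal_feats)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy usage: four 2-D "proposal features" and a template feature.
proposals = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
template = [1.0, 0.0]
survivors = select_proposals(proposals, template, top_k=2)
# Only the surviving proposals would be fed to the LSTM classifier.
```

The payoff is that feature extraction and classification run on `top_k` proposals instead of the full dense set, which is where the abstract's speed advantage over conventional classification-based trackers comes from.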
