Towards Efficient Use of Multi-Scale Features in Transformer-Based Object Detectors

Paper Title

Towards Efficient Use of Multi-Scale Features in Transformer-Based Object Detectors

Authors

Gongjie Zhang, Zhipeng Luo, Zichen Tian, Jingyi Zhang, Xiaoqin Zhang, Shijian Lu

Abstract

Multi-scale features have been proven highly effective for object detection but often come with huge and even prohibitive extra computation costs, especially for the recent Transformer-based detectors. In this paper, we propose Iterative Multi-scale Feature Aggregation (IMFA) -- a generic paradigm that enables efficient use of multi-scale features in Transformer-based object detectors. The core idea is to exploit sparse multi-scale features from just a few crucial locations, and it is achieved with two novel designs. First, IMFA rearranges the Transformer encoder-decoder pipeline so that the encoded features can be iteratively updated based on the detection predictions. Second, IMFA sparsely samples scale-adaptive features for refined detection from just a few keypoint locations under the guidance of prior detection predictions. As a result, the sampled multi-scale features are sparse yet still highly beneficial for object detection. Extensive experiments show that the proposed IMFA boosts the performance of multiple Transformer-based object detectors significantly yet with only slight computational overhead.
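
To make the sparse-sampling idea in the abstract concrete, here is a minimal sketch of reading features from a few keypoint locations across a feature pyramid. It is an illustration under stated assumptions, not the authors' implementation: the function names, the nested-list feature layout, and the use of plain bilinear interpolation are all hypothetical choices, and the learned parts of IMFA (predicting keypoints from prior detections, scale-adaptive weighting, the iterative encoder update) are omitted.

```python
from typing import List, Sequence, Tuple

# Hypothetical layout: one feature map is indexed as [row][col][channel].
FeatureMap = List[List[List[float]]]


def bilinear_sample(fmap: FeatureMap, x: float, y: float) -> List[float]:
    """Sample a feature vector at normalized (x, y) in [0, 1] by bilinear interpolation."""
    h, w = len(fmap), len(fmap[0])
    # Clamp to the map, then convert to continuous pixel coordinates.
    px = min(max(x, 0.0), 1.0) * (w - 1)
    py = min(max(y, 0.0), 1.0) * (h - 1)
    x0, y0 = int(px), int(py)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = px - x0, py - y0
    channels = len(fmap[0][0])
    return [
        (1 - dx) * (1 - dy) * fmap[y0][x0][c]
        + dx * (1 - dy) * fmap[y0][x1][c]
        + (1 - dx) * dy * fmap[y1][x0][c]
        + dx * dy * fmap[y1][x1][c]
        for c in range(channels)
    ]


def sample_sparse_multiscale(
    pyramid: Sequence[FeatureMap],
    keypoints: Sequence[Tuple[float, float]],
) -> List[List[float]]:
    """Return one token per (keypoint, scale) pair, i.e. K * S tokens in total."""
    tokens = []
    for x, y in keypoints:
        for fmap in pyramid:
            tokens.append(bilinear_sample(fmap, x, y))
    return tokens


def constant_map(size: int, value: float) -> FeatureMap:
    """Build a size x size single-channel map filled with `value` (toy data only)."""
    return [[[value] for _ in range(size)] for _ in range(size)]


# Toy pyramid with three scales and two keypoints (values are made up).
pyramid = [constant_map(8, 1.0), constant_map(4, 2.0), constant_map(2, 3.0)]
keypoints = [(0.25, 0.5), (0.9, 0.1)]
tokens = sample_sparse_multiscale(pyramid, keypoints)

dense_count = sum(len(m) * len(m[0]) for m in pyramid)
print(len(tokens), "sparse tokens versus", dense_count, "dense tokens")
```

In this toy setup the number of sampled tokens depends only on the number of keypoints and scales, not on the resolution of the feature maps, which is the property that keeps the added cost small.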
