BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision

Paper title

BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision

Authors

Chenyu Yang, Yuntao Chen, Hao Tian, Chenxin Tao, Xizhou Zhu, Zhaoxiang Zhang, Gao Huang, Hongyang Li, Yu Qiao, Lewei Lu, Jie Zhou, Jifeng Dai

Abstract

We present a novel bird's-eye-view (BEV) detector with perspective supervision, which converges faster and better suits modern image backbones. Existing state-of-the-art BEV detectors are often tied to certain depth pre-trained backbones like VoVNet, hindering the synergy between booming image backbones and BEV detectors. To address this limitation, we prioritize easing the optimization of BEV detectors by introducing perspective space supervision. To this end, we propose a two-stage BEV detector, where proposals from the perspective head are fed into the bird's-eye-view head for final predictions. To evaluate the effectiveness of our model, we conduct extensive ablation studies focusing on the form of supervision and the generality of the proposed detector. The proposed method is verified with a wide spectrum of traditional and modern image backbones and achieves new SoTA results on the large-scale nuScenes dataset. The code shall be released soon.
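The two-stage flow described in the abstract can be pictured with a toy sketch. Everything below is an illustrative assumption rather than the authors' implementation: the names `Proposal`, `perspective_head`, `bev_head`, and `total_loss`, the score threshold, the loss weight, and the idea of mixing proposals with learned queries are all hypothetical placeholders, since the abstract only states that perspective-head proposals are fed into the bird's-eye-view head and that perspective supervision is added.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Proposal:
    """A coarse object hypothesis from the perspective stage (placeholder fields)."""
    x: float
    y: float
    score: float


def perspective_head(camera_features: List[List[float]]) -> List[Proposal]:
    """Stage 1 (hypothetical): emit one coarse proposal per camera view.

    A real head would regress 3D boxes from image features; here the mean
    feature value stands in for both the position and the confidence.
    """
    proposals = []
    for cam_idx, feats in enumerate(camera_features):
        mean = sum(feats) / len(feats)
        proposals.append(Proposal(x=float(cam_idx), y=mean, score=min(1.0, abs(mean))))
    return proposals


def bev_head(
    proposals: List[Proposal],
    learned_queries: List[Tuple[float, float]],
    score_threshold: float = 0.3,  # assumed value, not taken from the paper
) -> List[Tuple[float, float]]:
    """Stage 2 (hypothetical): collect reference points for the final prediction.

    Confident first-stage proposals are placed ahead of the learned queries.
    """
    kept = [p for p in proposals if p.score >= score_threshold]
    return [(p.x, p.y) for p in kept] + list(learned_queries)


def total_loss(loss_bev: float, loss_perspective: float, weight: float = 1.0) -> float:
    """Add the auxiliary perspective loss to the BEV loss (weight is an assumption)."""
    return loss_bev + weight * loss_perspective


features = [[0.5, 0.5], [0.1, 0.1]]  # two toy camera views
stage_one = perspective_head(features)
reference_points = bev_head(stage_one, learned_queries=[(9.0, 9.0)])
```

In this toy run the second camera's proposal falls below the assumed threshold, so only the first proposal reaches the second stage alongside the single learned query.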
