GLT-T: Global-Local Transformer Voting for 3D Single Object Tracking in Point Clouds

Paper Title

GLT-T: Global-Local Transformer Voting for 3D Single Object Tracking in Point Clouds

Authors

Jiahao Nie, Zhiwei He, Yuxiang Yang, Mingyu Gao, Jing Zhang

Abstract

Current 3D single object tracking methods are typically based on VoteNet, a 3D region proposal network. Despite its success, using a single seed point feature as the cue for offset learning in VoteNet prevents high-quality 3D proposals from being generated. Moreover, seed points of differing importance are treated equally in the voting process, which aggravates this defect. To address these issues, we propose a novel global-local transformer voting scheme that provides more informative cues and guides the model to pay more attention to potential seed points, promoting the generation of high-quality 3D proposals. Technically, a global-local transformer (GLT) module is employed to integrate object- and patch-aware priors into seed point features, forming a strong feature representation of the geometric positions of the seed points and thus providing more robust and accurate cues for offset learning. Subsequently, a simple yet effective training strategy is designed to train the GLT module. We develop an importance prediction branch to learn the potential importance of the seed points and treat the output weight vector as a training constraint term. By incorporating the above components, we present a superior tracking method, GLT-T. Extensive experiments on the challenging KITTI and NuScenes benchmarks demonstrate that GLT-T achieves state-of-the-art performance in the 3D single object tracking task. In addition, further ablation studies show the advantages of the proposed global-local transformer voting scheme over the original VoteNet. Code and models will be available at https://github.com/haooozi/GLT-T.
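
The abstract states that seed points vote through learned offsets and that predicted importance weights act as a training constraint, but it does not give the loss itself. The following is only an illustrative sketch of one way such a constraint could look, not the authors' formulation: each seed point casts a vote (seed position plus predicted offset), and the distance from each vote to the target center is averaged using the importance weights. The function name, the Euclidean distance, and the normalization by the weight sum are all assumptions made for this example.

```python
import math


def importance_weighted_vote_loss(seeds, offsets, center, weights):
    """Hypothetical importance-weighted voting loss (illustrative only).

    seeds   -- list of (x, y, z) seed point positions
    offsets -- list of (dx, dy, dz) predicted offsets, one per seed
    center  -- (x, y, z) ground-truth target center
    weights -- non-negative importance weight per seed (assumed given)
    """
    if not (len(seeds) == len(offsets) == len(weights)):
        raise ValueError("seeds, offsets and weights must have equal length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    loss = 0.0
    for seed, offset, weight in zip(seeds, offsets, weights):
        # A vote is the seed position shifted by its predicted offset.
        vote = [s + o for s, o in zip(seed, offset)]
        # Penalize the distance between the vote and the target center,
        # scaled by how important this seed is assumed to be.
        loss += weight * math.dist(vote, center)
    # Normalize so the loss is a weighted mean of per-seed errors.
    return loss / total_weight


if __name__ == "__main__":
    seeds = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    zero_offsets = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
    center = (1.0, 0.0, 0.0)
    # Equal weights treat both seeds alike; unequal weights shift the loss
    # toward the error of the seed deemed more important.
    print(importance_weighted_vote_loss(seeds, zero_offsets, center, [1.0, 1.0]))
    print(importance_weighted_vote_loss(seeds, zero_offsets, center, [3.0, 1.0]))
```

With equal weights every seed contributes equally, which corresponds to the uniform treatment the abstract criticizes; unequal weights let errors on the seeds judged more important dominate the training signal.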

Paper Title