Paper Title
Tracking Objects as Pixel-wise Distributions
Authors
Abstract
Multi-object tracking (MOT) requires detecting objects and associating them across frames. Unlike tracking via detected bounding boxes or tracking objects as points, we propose tracking objects as pixel-wise distributions. We instantiate this idea in a transformer-based architecture, P3AFormer, with pixel-wise propagation, prediction, and association. P3AFormer propagates pixel-wise features, guided by flow information, to pass messages between frames. Furthermore, P3AFormer adopts a meta-architecture to produce multi-scale object feature maps. During inference, a pixel-wise association procedure recovers object connections across frames from the pixel-wise predictions. P3AFormer achieves 81.2% MOTA on the MOT17 benchmark, making it the first transformer network in the literature to reach 80% MOTA. P3AFormer also outperforms state-of-the-art methods on the MOT20 and KITTI benchmarks.
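To make "tracking objects as pixel-wise distributions" concrete, the sketch below represents each object as a 2-D probability map over pixels, propagates the previous frame's maps to the next frame using a per-pixel flow field, and associates objects by distribution overlap. This is a toy illustration under stated assumptions, not P3AFormer's actual procedure: the flow is assumed given and rounded to whole pixels, the similarity is the Bhattacharyya coefficient, matching is greedy, and the names `normalize`, `box`, `warp_by_flow`, `bhattacharyya`, `associate`, and `min_sim` are all hypothetical.

```python
import numpy as np


def normalize(heatmap):
    """Turn a non-negative score map into a pixel-wise distribution summing to 1."""
    total = heatmap.sum()
    return heatmap / total if total > 0 else heatmap


def box(shape, cy, cx, r=1):
    """Hypothetical helper: a uniform distribution over a (2r+1)x(2r+1) square, clipped to the image."""
    m = np.zeros(shape)
    m[max(cy - r, 0):cy + r + 1, max(cx - r, 0):cx + r + 1] = 1.0
    return normalize(m)


def warp_by_flow(dist, flow):
    """Propagate a distribution to the next frame by moving each pixel's mass
    along its flow vector (dy, dx), rounded to the nearest pixel (assumption).
    Mass that leaves the image is dropped."""
    h, w = dist.shape
    out = np.zeros_like(dist)
    ys, xs = np.nonzero(dist)
    for y, x in zip(ys, xs):
        dy, dx = flow[y, x]
        ny, nx = y + int(round(float(dy))), x + int(round(float(dx)))
        if 0 <= ny < h and 0 <= nx < w:
            out[ny, nx] += dist[y, x]
    return out


def bhattacharyya(p, q):
    """Overlap between two pixel-wise distributions: sum over pixels of sqrt(p * q)."""
    return float(np.sum(np.sqrt(p * q)))


def associate(prev_dists, curr_dists, flow, min_sim=0.1):
    """Greedily match previous-frame objects to current-frame objects.

    Returns sorted (prev_index, curr_index) pairs; objects whose best
    similarity falls below min_sim stay unmatched (e.g. newly appearing objects).
    """
    warped = [warp_by_flow(p, flow) for p in prev_dists]
    sim = np.array([[bhattacharyya(wp, c) for c in curr_dists] for wp in warped])
    matches, used_prev, used_curr = [], set(), set()
    # Visit all (prev, curr) pairs from most to least similar.
    for flat in np.argsort(-sim, axis=None):
        i, j = (int(k) for k in np.unravel_index(flat, sim.shape))
        if sim[i, j] < min_sim:
            break  # every remaining pair is even less similar
        if i in used_prev or j in used_curr:
            continue
        matches.append((i, j))
        used_prev.add(i)
        used_curr.add(j)
    return sorted(matches)


if __name__ == "__main__":
    shape = (6, 6)
    flow = np.zeros(shape + (2,))
    flow[..., 1] = 1  # every pixel moves one column to the right
    prev = [box(shape, 1, 1), box(shape, 4, 4)]  # objects A, B at time t
    curr = [box(shape, 4, 5), box(shape, 1, 2)]  # B, A at time t+1 (order shuffled)
    print(associate(prev, curr, flow))  # [(0, 1), (1, 0)]
```

Greedy matching keeps the example dependency-free; a Hungarian solver (for instance SciPy's `linear_sum_assignment` on `-sim`) would give a globally optimal one-to-one assignment instead.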