Tracking Objects as Pixel-wise Distributions

Paper title

Tracking Objects as Pixel-wise Distributions

Authors

Zelin Zhao, Ze Wu, Yueqing Zhuang, Boxun Li, Jiaya Jia

Abstract

Multi-object tracking (MOT) requires detecting and associating objects across frames. Unlike tracking via detected bounding boxes or tracking objects as points, we propose tracking objects as pixel-wise distributions. We instantiate this idea on a transformer-based architecture, P3AFormer, with pixel-wise propagation, prediction, and association. P3AFormer propagates pixel-wise features guided by flow information to pass messages between frames. Furthermore, P3AFormer adopts a meta-architecture to produce multi-scale object feature maps. During inference, a pixel-wise association procedure is proposed to recover object connections across frames based on the pixel-wise prediction. P3AFormer yields 81.2% MOTA on the MOT17 benchmark, making it the first transformer network in the literature to reach 80% MOTA. P3AFormer also outperforms state-of-the-art methods on the MOT20 and KITTI benchmarks.
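
For readers unfamiliar with the MOTA figure quoted in the abstract, here is a minimal sketch of the standard CLEAR MOT definition, MOTA = 1 - (FN + FP + IDSW) / GT, with every count summed over all frames. The `FrameCounts` container and the `mota` function are hypothetical names introduced for illustration; this is not the official MOT17 evaluation code, and it assumes that matching predictions to ground truth has already produced the per-frame counts.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class FrameCounts:
    """Per-frame counts (hypothetical container for this sketch)."""
    num_gt: int           # ground-truth objects in the frame
    false_negatives: int  # ground-truth objects with no matched prediction
    false_positives: int  # predictions with no matched ground-truth object
    id_switches: int      # matched objects whose track ID changed


def mota(frames: Iterable[FrameCounts]) -> float:
    """Return MOTA = 1 - (FN + FP + IDSW) / GT, summed over all frames."""
    frames = list(frames)
    total_gt = sum(f.num_gt for f in frames)
    if total_gt == 0:
        raise ValueError("MOTA is undefined without ground-truth objects")
    total_errors = sum(
        f.false_negatives + f.false_positives + f.id_switches for f in frames
    )
    return 1.0 - total_errors / total_gt


# Toy two-frame sequence: 4 errors against 20 ground-truth objects.
toy_sequence = [
    FrameCounts(num_gt=10, false_negatives=1, false_positives=1, id_switches=0),
    FrameCounts(num_gt=10, false_negatives=1, false_positives=0, id_switches=1),
]
toy_score = mota(toy_sequence)
```

On this toy sequence the score is 0.8, that is, 80% MOTA. Because false positives are unbounded, MOTA can go negative when a tracker makes more errors than there are ground-truth objects.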
