Simple Cues Lead to a Strong Multi-Object Tracker

Paper Title

Simple Cues Lead to a Strong Multi-Object Tracker

Authors

Jenny Seidenschwarz, Guillem Brasó, Victor Castro Serrano, Ismail Elezi, Laura Leal-Taixé

Abstract

For a long time, the most common paradigm in multi-object tracking was tracking-by-detection (TbD), where objects are first detected and then associated across video frames. For association, most models resorted to motion and appearance cues, e.g., re-identification networks. Recent attention-based approaches propose to learn the cues in a data-driven manner and show impressive results. In this paper, we ask whether simple, good old TbD methods can also match the performance of end-to-end models. To this end, we propose two key ingredients that allow a standard re-identification network to excel at appearance-based tracking. We extensively analyse its failure cases and show that combining our appearance features with a simple motion model leads to strong tracking results. Our tracker generalizes to four public datasets, namely MOT17, MOT20, BDD100k, and DanceTrack, achieving state-of-the-art performance. Code: https://github.com/dvl-tum/GHOST

