Paper Title

TDT: Teaching Detectors to Track without Fully Annotated Videos

Authors

Shuzhi Yu, Guanhang Wu, Chunhui Gu, Mohammed E. Fathy

Abstract


Recently, one-stage trackers that use a joint model to predict both detections and appearance embeddings in one forward pass have received much attention and achieved state-of-the-art results on Multi-Object Tracking (MOT) benchmarks. However, their success depends on the availability of videos fully annotated with tracking data, which are expensive and hard to obtain; this can also limit model generalization. In comparison, the two-stage approach, which performs detection and embedding separately, is slower but easier to train, as its data is easier to annotate. We propose to combine the best of the two worlds through a data distillation approach. Specifically, we use a teacher embedder, trained on Re-ID datasets, to generate pseudo appearance embedding labels for the detection datasets. Then, we use the augmented dataset to train a detector that is also capable of regressing these pseudo-embeddings in a fully convolutional fashion. Our proposed one-stage solution matches the two-stage counterpart in quality but is 3 times faster. Even though the teacher embedder has not seen any tracking data during training, our proposed tracker achieves performance competitive with popular trackers (e.g., JDE) trained on fully labeled tracking data.
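The data distillation step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: `teacher_embed` here is a hypothetical stand-in for the Re-ID teacher network, and the crop/label plumbing is simplified to show how detection-only annotations are augmented with pseudo appearance embedding labels.

```python
import numpy as np

def teacher_embed(crop, dim=128):
    # Stand-in for the Re-ID teacher embedder: maps an image crop to a
    # unit-norm appearance embedding. (Hypothetical; the paper's teacher
    # is a network trained on Re-ID datasets.)
    rng = np.random.default_rng(int(crop.sum()) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def add_pseudo_embedding_labels(image, boxes):
    """Augment detection-only annotations with pseudo appearance
    embedding labels produced by the teacher (data distillation)."""
    labels = []
    for (x1, y1, x2, y2) in boxes:
        crop = image[y1:y2, x1:x2]                 # person crop from the box
        labels.append({"box": (x1, y1, x2, y2),
                       "embedding": teacher_embed(crop)})
    return labels
```

The augmented dataset (boxes plus pseudo embeddings) is then what the one-stage student detector is trained on, with an extra head regressing the embeddings alongside its detection outputs.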
