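Complementary Boundary Generator with Scale-Invariant Relation Modeling for Temporal Action Localization: Submission to ActivityNet Challenge 2020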

Paper Title

Complementary Boundary Generator with Scale-Invariant Relation Modeling for Temporal Action Localization: Submission to ActivityNet Challenge 2020

Authors

Su, Haisheng; Feng, Jinyuan; Shao, Hao; Jiang, Zhenyu; Zhang, Manyuan; Wu, Wei; Liu, Yu; Li, Hongsheng; Yan, Junjie

Abstract

This technical report presents an overview of our solution submitted to ActivityNet Challenge 2020 Task 1 (temporal action localization/detection). Temporal action localization requires not only precisely locating the temporal boundaries of action instances but also accurately classifying the untrimmed videos into specific categories. In this paper, we decouple the temporal action localization task into two stages (i.e., proposal generation and classification) and enrich proposal diversity by exhaustively exploring the influence of multiple components from different but complementary perspectives. Specifically, to generate high-quality proposals, we consider several factors, including the video feature encoder, the proposal generator, proposal-proposal relations, scale imbalance, and the ensemble strategy. Finally, to obtain accurate detections, we further train an optimal video classifier to recognize the generated proposals. Our proposed scheme achieves state-of-the-art performance on the temporal action localization task, with an average mAP of 42.26 on the challenge testing set.
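The abstract describes the two-stage pipeline only at a high level. As an illustration, here is a minimal Python sketch of one common way such a decoupled scheme can be wired together. It assumes (this is not confirmed by the report) that class-agnostic proposals from several generators are pooled as a simple ensemble, de-duplicated with Gaussian Soft-NMS, and then labeled with the top-ranked classes from a video-level classifier, with each detection scored as proposal confidence times class probability. The names `Proposal`, `tiou`, `soft_nms`, `localize` and the parameters `sigma`, `top_classes`, `top_k` are hypothetical, as are the example segments and class names.

```python
import math
from dataclasses import dataclass


@dataclass
class Proposal:
    start: float  # segment start time (seconds)
    end: float    # segment end time (seconds)
    score: float  # class-agnostic proposal confidence


def tiou(a: Proposal, b: Proposal) -> float:
    """Temporal intersection-over-union of two segments."""
    inter = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - inter
    return inter / union if union > 0 else 0.0


def soft_nms(proposals, sigma=0.5, top_k=100):
    """Gaussian Soft-NMS: decay the scores of overlapping proposals instead of discarding them."""
    # Work on copies so the caller's proposals keep their original scores.
    remaining = [Proposal(p.start, p.end, p.score) for p in proposals]
    kept = []
    while remaining and len(kept) < top_k:
        best_idx = max(range(len(remaining)), key=lambda i: remaining[i].score)
        best = remaining.pop(best_idx)
        kept.append(best)
        for p in remaining:
            # Higher overlap with the selected proposal means a stronger decay.
            p.score *= math.exp(-(tiou(best, p) ** 2) / sigma)
    return kept


def localize(proposal_sets, class_probs, top_classes=2, top_k=100):
    """Hypothetical two-stage fusion: ensemble proposals, then label them with video-level classes."""
    # Stage 1: pool proposals from every generator (a simple ensemble) and de-duplicate.
    pooled = [p for proposals in proposal_sets for p in proposals]
    kept = soft_nms(pooled, top_k=top_k)
    # Stage 2: assign the top-ranked video-level classes to every kept proposal.
    ranked = sorted(class_probs.items(), key=lambda kv: kv[1], reverse=True)[:top_classes]
    detections = [
        (label, p.start, p.end, p.score * prob)
        for label, prob in ranked
        for p in kept
    ]
    return sorted(detections, key=lambda d: d[3], reverse=True)


# Usage: two hypothetical generators that partly agree on the same action.
gen_a = [Proposal(2.0, 10.0, 0.9), Proposal(30.0, 45.0, 0.6)]
gen_b = [Proposal(2.5, 10.5, 0.85), Proposal(50.0, 60.0, 0.4)]
probs = {"long jump": 0.7, "high jump": 0.2, "javelin throw": 0.1}
for det in localize([gen_a, gen_b], probs)[:3]:
    print(det)
```

Because Soft-NMS only down-weights the near-duplicate segment from the second generator, that segment survives with a low score rather than being dropped. This tends to preserve recall at the cost of a few extra low-confidence detections, which average mAP penalizes only mildly.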
