Detector-Free Weakly Supervised Group Activity Recognition

Paper Title

Detector-Free Weakly Supervised Group Activity Recognition

Authors

Dongkeun Kim, Jinsung Lee, Minsu Cho, Suha Kwak

Abstract

Group activity recognition is the task of understanding the activity conducted by a group of people as a whole in a multi-person video. Existing models for this task are often impractical in that they demand ground-truth bounding box labels of actors even in testing or rely on off-the-shelf object detectors. Motivated by this, we propose a novel model for group activity recognition that depends neither on bounding box labels nor on object detectors. Our Transformer-based model localizes and encodes partial contexts of a group activity by leveraging the attention mechanism, and represents a video clip as a set of partial context embeddings. The embedding vectors are then aggregated to form a single group representation that reflects the entire context of an activity while capturing the temporal evolution of each partial context. Our method achieves outstanding performance on two benchmarks, the Volleyball and NBA datasets, surpassing not only the state of the art trained with the same level of supervision, but also some existing models relying on stronger supervision.
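
The idea of encoding a clip as a set of partial context embeddings can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the tensor shapes, the single-head dot-product attention, and the mean pooling used for aggregation are all assumptions made for illustration. In particular, mean pooling over time is only a placeholder and does not model the temporal evolution that the abstract describes.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def partial_context_embeddings(frame_features, tokens):
    """Cross-attend a set of token vectors over per-frame features.

    frame_features: (T, N, D) array, T frames with N spatial locations each
                    (hypothetical backbone output).
    tokens:         (K, D) array of token vectors (learned in a real model,
                    random here).
    Returns embeddings of shape (T, K, D) and attention maps of shape (T, K, N).
    """
    d = tokens.shape[-1]
    # Scaled dot-product score between every token and every location.
    scores = np.einsum("kd,tnd->tkn", tokens, frame_features) / np.sqrt(d)
    # Each token distributes its attention over the N locations of a frame.
    attn = softmax(scores, axis=-1)
    # Attention-weighted sum of features gives one embedding per token per frame.
    embeddings = np.einsum("tkn,tnd->tkd", attn, frame_features)
    return embeddings, attn

def group_representation(embeddings):
    # Placeholder aggregation (an assumption): average each token's
    # embeddings over time, then average over tokens.
    per_token = embeddings.mean(axis=0)   # (K, D)
    return per_token.mean(axis=0)         # (D,)

# Usage with random data: 4 frames, 6 locations, 8-dim features, 3 tokens.
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 6, 8))
toks = rng.normal(size=(3, 8))
emb, attn = partial_context_embeddings(feats, toks)
group = group_representation(emb)
```

Because each token produces its own attention map over the frame, no bounding boxes or detector outputs are needed to decide where the model looks, which is the property the abstract emphasizes.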
