Uncertainty-Aware Weakly Supervised Action Detection from Untrimmed Videos

Title

Uncertainty-Aware Weakly Supervised Action Detection from Untrimmed Videos

Authors

Anurag Arnab, Chen Sun, Arsha Nagrani, Cordelia Schmid

Abstract

Despite the recent advances in video classification, progress in spatio-temporal action recognition has lagged behind. A major contributing factor has been the prohibitive cost of annotating videos frame-by-frame. In this paper, we present a spatio-temporal action recognition model that is trained with only video-level labels, which are significantly easier to annotate. Our method leverages per-frame person detectors which have been trained on large image datasets within a Multiple Instance Learning framework. We show how we can apply our method in cases where the standard Multiple Instance Learning assumption, that each bag contains at least one instance with the specified label, is invalid using a novel probabilistic variant of MIL where we estimate the uncertainty of each prediction. Furthermore, we report the first weakly-supervised results on the AVA dataset and state-of-the-art results among weakly-supervised methods on UCF101-24.
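
The standard Multiple Instance Learning assumption mentioned in the abstract can be illustrated with a minimal sketch. It uses plain max-pooling over instance scores, which is a common textbook MIL aggregation and not the paper's probabilistic variant; the function names and the example scores are hypothetical, chosen only for illustration.

```python
import math


def bag_probability_max(instance_probs):
    """Textbook MIL aggregation (an assumption, not the paper's method):
    the bag score is the score of its most confident instance."""
    return max(instance_probs)


def bag_nll(instance_probs, bag_label):
    """Negative log-likelihood of a binary bag label under max-pooling."""
    eps = 1e-7
    p = min(max(bag_probability_max(instance_probs), eps), 1.0 - eps)
    return -math.log(p) if bag_label == 1 else -math.log(1.0 - p)


# Hypothetical per-person scores for one action class in a video clip (the "bag").
# In good_bag one detected person is confidently performing the action.
good_bag = [0.05, 0.10, 0.90]
# In broken_bag no detection matches the video-level label, e.g. because the
# person detector missed the actor, so the standard assumption is violated.
broken_bag = [0.05, 0.10, 0.08]

loss_good = bag_nll(good_bag, bag_label=1)
loss_broken = bag_nll(broken_bag, bag_label=1)
```

Under this aggregation, a bag that violates the assumption still forces some instance toward the video-level label, which yields a large loss on a box that may be wrong. This is the failure case that the abstract says the uncertainty-aware variant is designed to handle.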
