Paper Title

ActionSpotter: Deep Reinforcement Learning Framework for Temporal Action Spotting in Videos

Authors

Guillaume Vaudaux-Ruth, Adrien Chan-Hon-Tong, Catherine Achard

Abstract

Summarizing video content is an important task in many applications. This task can be defined as computing the ordered list of actions present in a video. Such a list could be extracted using action detection algorithms. However, it is not necessary to determine the temporal boundaries of actions to establish their existence. Moreover, localizing precise boundaries usually requires dense video analysis to be effective. In this work, we propose to compute this ordered list directly by sparsely browsing the video and selecting one frame per action instance, a task known as action spotting in the literature. To do this, we propose ActionSpotter, a spotting algorithm that takes advantage of deep reinforcement learning to efficiently spot actions while adapting its video browsing speed, without additional supervision. Experiments performed on the THUMOS14 and ActivityNet datasets show that our framework outperforms state-of-the-art detection methods. In particular, the spotting mean Average Precision on THUMOS14 is significantly improved from 59.7% to 65.6% while skipping 23% of the video.
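
Although the paper itself does not include code, the browsing-and-spotting loop described in the abstract can be illustrated with a short sketch. Below is a minimal, hypothetical PyTorch outline of a policy that reads one frame feature at a time, decides whether to emit a spot, and chooses how many frames to skip before the next read. All names, dimensions, and the greedy decoding are assumptions made for illustration; they are not the authors' implementation, which trains such a policy with deep reinforcement learning.

```python
import torch
import torch.nn as nn

class SpottingPolicy(nn.Module):
    """Hypothetical policy: from one frame feature, score which action
    (if any) to spot and how many frames to skip before the next read."""
    def __init__(self, feat_dim=2048, num_classes=20, max_skip=8):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU())
        self.spot_head = nn.Linear(256, num_classes + 1)  # last index = "no spot"
        self.skip_head = nn.Linear(256, max_skip)         # skip 1..max_skip frames

    def forward(self, feat):
        h = self.trunk(feat)
        return self.spot_head(h), self.skip_head(h)

def browse(frame_features, policy):
    """Greedy roll-out: sparsely browse precomputed frame features and
    return the ordered list of (frame_index, action_label) spots."""
    spots, t = [], 0
    while t < len(frame_features):
        spot_logits, skip_logits = policy(frame_features[t])
        label = spot_logits.argmax().item()
        if label < spot_logits.numel() - 1:   # a real action, not "no spot"
            spots.append((t, label))
        t += 1 + skip_logits.argmax().item()  # adaptive browsing speed
    return spots

# Toy example: 300 frames of 2048-d features (random, for illustration only).
policy = SpottingPolicy()
feats = torch.randn(300, 2048)
print(browse(feats, policy))
```

A trained policy of this shape would emit the ordered list of spotted actions while reading only a fraction of the frames, which is what allows the reported 23% of the video to be skipped.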
