Paper Title

Action Duration Prediction for Segment-Level Alignment of Weakly-Labeled Videos

Authors

Reza Ghoddoosian, Saif Sayed, Vassilis Athitsos

Abstract

This paper focuses on weakly-supervised action alignment, where only the ordered sequence of video-level actions is available for training. We propose a novel Duration Network, which captures a short temporal window of the video and learns to predict the remaining duration of a given action at any point in time, with a level of granularity based on the type of that action. Further, we introduce a Segment-Level Beam Search to obtain the best alignment, namely the one that maximizes our posterior probability. The Segment-Level Beam Search aligns actions efficiently by considering only a selected set of frames that have more confident predictions. Experimental results show that our alignments for long videos are more robust than those of existing models. Moreover, the proposed method achieves state-of-the-art results in certain cases on the popular Breakfast and Hollywood Extended datasets.
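
To make the two components above concrete, below is a minimal, hypothetical sketch of a duration predictor in Python/PyTorch. It is a reconstruction from the abstract alone, not the authors' implementation: the names (`DurationNet`, `feat_dim`, `num_duration_bins`), the GRU encoder, the action-class embedding, and the discretization of remaining duration into bins are all assumptions.

```python
import torch
import torch.nn as nn

class DurationNet(nn.Module):
    """Illustrative sketch (not the paper's architecture): predict a
    discretized remaining duration for the action currently in progress,
    given features from a short temporal window of the video."""

    def __init__(self, feat_dim, num_actions, num_duration_bins, hidden=128):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hidden, batch_first=True)
        self.action_emb = nn.Embedding(num_actions, hidden)
        self.head = nn.Linear(2 * hidden, num_duration_bins)

    def forward(self, window_feats, action_ids):
        # window_feats: (B, W, feat_dim) features of a short temporal window
        # action_ids:   (B,) class index of the action in progress
        _, h = self.encoder(window_feats)  # h: (1, B, hidden), last hidden state
        fused = torch.cat([h.squeeze(0), self.action_emb(action_ids)], dim=-1)
        return self.head(fused)            # logits over remaining-duration bins
```

Likewise, here is a minimal sketch of a segment-level beam search that aligns an ordered action transcript to frame-wise log posteriors, keeping only the top-scoring hypotheses at each step. The scoring (summed per-frame log posterior of each segment's action) and the pruning are simplifications; in particular, the paper's restriction of the search to frames with confident predictions, and the use of the duration predictions, are omitted here.

```python
import numpy as np

def segment_level_beam_search(log_probs, transcript, beam_width=5):
    """Illustrative sketch: align an ordered transcript to a video.

    log_probs:  (T, C) array of per-frame log posteriors over C actions.
    transcript: ordered list of action class indices (video-level labels).
    Returns the segment lengths (one per action) of the best hypothesis.
    """
    T = log_probs.shape[0]
    n = len(transcript)
    assert T >= n, "need at least one frame per action"
    # Each hypothesis: (score, end_frame_so_far, segment_lengths)
    beams = [(0.0, 0, [])]
    for i, action in enumerate(transcript):
        remaining = n - i - 1  # actions still to be placed after this one
        candidates = []
        for score, start, lengths in beams:
            if i == n - 1:
                ends = [T]  # the last segment must reach the end of the video
            else:
                # Leave at least one frame for every remaining action.
                ends = range(start + 1, T - remaining + 1)
            for end in ends:
                seg_score = log_probs[start:end, action].sum()
                candidates.append((score + seg_score, end, lengths + [end - start]))
        # Prune to the top-scoring hypotheses (the beam).
        candidates.sort(key=lambda h: h[0], reverse=True)
        beams = candidates[:beam_width]
    return max(beams, key=lambda h: h[0])[2]
```

In the full method, such a search would additionally combine the frame-level posteriors with the Duration Network's remaining-duration predictions when scoring candidate segment boundaries.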
