Paper Title

Hierarchical Attention Network for Action Segmentation

Authors

Harshala Gammulle, Simon Denman, Sridha Sridharan, Clinton Fookes

Abstract

The temporal segmentation of events is an essential task and a precursor for the automatic recognition of human actions in video. Several attempts have been made to capture frame-level salient aspects through attention, but they lack the capacity to effectively map the temporal relationships between frames, as they only capture a limited span of temporal dependencies. To this end, we propose a complete end-to-end supervised learning approach that can better learn relationships between actions over time, thus improving the overall segmentation performance. The proposed hierarchical recurrent attention framework analyses the input video at multiple temporal scales to form embeddings at the frame level and segment level, and performs fine-grained action segmentation. This yields a simple, lightweight, yet extremely effective architecture for segmenting continuous video streams, with multiple application domains. We evaluate our system on multiple challenging public benchmark datasets, including the MERL Shopping, 50 Salads, and Georgia Tech Egocentric datasets, and achieve state-of-the-art performance. The evaluated datasets encompass numerous video capture settings, including static overhead camera views and dynamic, egocentric head-mounted camera views, demonstrating the direct applicability of the proposed framework in a variety of settings.
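The two-level idea in the abstract, pooling frame features into segment embeddings and then attending over those segments, can be sketched in simplified NumPy form. This is an illustrative sketch only, not the authors' actual model: the `attention_pool` scoring vector, the fixed 10-frame window, and the feature dimension are all assumptions, and the real framework is recurrent and learned end to end.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(feats, w):
    # feats: (T, D) features; w: (D,) illustrative scoring vector.
    scores = softmax(feats @ w)   # (T,) attention weights
    return scores @ feats         # (D,) attention-weighted summary

rng = np.random.default_rng(0)
D = 16
video = rng.standard_normal((60, D))   # 60 frames of D-dim features

# Frame level: pool fixed 10-frame windows into segment embeddings.
w_frame = rng.standard_normal(D)
segments = np.stack([attention_pool(video[i:i + 10], w_frame)
                     for i in range(0, 60, 10)])   # (6, D)

# Segment level: attend over segment embeddings for a clip-level embedding.
w_seg = rng.standard_normal(D)
clip = attention_pool(segments, w_seg)             # (D,)
```

The point of the hierarchy is that the segment-level attention sees a much shorter sequence (here 6 segments instead of 60 frames), which is how the framework can model longer-range temporal dependencies than flat frame-level attention.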
