Paper Title

Interactive Fusion of Multi-level Features for Compositional Activity Recognition

Authors

Rui Yan, Lingxi Xie, Xiangbo Shu, Jinhui Tang

Abstract

To understand a complex action, multiple sources of information, including appearance, positional, and semantic features, need to be integrated. However, these features are difficult to fuse because they often differ significantly in modality and dimensionality. In this paper, we present a novel framework that accomplishes this goal through interactive fusion, namely, projecting features across different spaces and guiding the fusion with an auxiliary prediction task. Specifically, we implement the framework in three steps: positional-to-appearance feature extraction, semantic feature interaction, and semantic-to-positional prediction. We evaluate our approach on two action recognition datasets, Something-Something and Charades. Interactive fusion achieves consistent accuracy gains beyond off-the-shelf action recognition algorithms. In particular, on Something-Else, the compositional setting of Something-Something, interactive fusion reports a remarkable gain of 2.9% in terms of top-1 accuracy.
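The abstract only describes the three modules at a high level. For orientation, here is a minimal, illustrative PyTorch sketch of how the three steps could fit together. This is not the authors' implementation: all dimensions (app_dim, pos_dim, sem_dim, num_classes), module names, and the concrete fusion operator are assumptions made for the example.

```python
# A minimal, hypothetical sketch of the three-step interactive-fusion idea.
# NOT the paper's implementation: all names and dimensions are illustrative.
import torch
import torch.nn as nn

class InteractiveFusion(nn.Module):
    def __init__(self, app_dim=512, pos_dim=4, sem_dim=256, num_classes=174):
        super().__init__()
        # Step 1: positional-to-appearance feature extraction --
        # embed object box coordinates into the appearance feature space.
        self.pos_embed = nn.Linear(pos_dim, app_dim)
        # Step 2: semantic feature interaction -- project the combined
        # appearance/positional streams into a shared semantic space.
        self.interact = nn.Sequential(
            nn.Linear(app_dim * 2, sem_dim),
            nn.ReLU(),
            nn.Linear(sem_dim, sem_dim),
        )
        # Step 3: semantic-to-positional prediction -- an auxiliary head
        # that regresses box coordinates back from the semantic features,
        # guiding the fusion during training.
        self.aux_head = nn.Linear(sem_dim, pos_dim)
        self.classifier = nn.Linear(sem_dim, num_classes)

    def forward(self, app_feat, boxes):
        # app_feat: (B, app_dim) appearance features from a video backbone
        # boxes:    (B, pos_dim) normalized object box coordinates
        pos_feat = self.pos_embed(boxes)                             # step 1
        fused = torch.cat([app_feat, app_feat * pos_feat], dim=-1)
        sem_feat = self.interact(fused)                              # step 2
        box_pred = self.aux_head(sem_feat)                           # step 3
        return self.classifier(sem_feat), box_pred
```

Under this reading, training would combine the recognition loss with the auxiliary regression loss, e.g. `loss = ce(logits, labels) + lam * l1(box_pred, boxes)`, so the fused semantic features are forced to retain positional information; this is one plausible interpretation of "guiding the fusion with an auxiliary prediction task" in the abstract.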
