Hierarchical Contrastive Motion Learning for Video Action Recognition

Title

Hierarchical Contrastive Motion Learning for Video Action Recognition

Authors

Yang, Xitong; Yang, Xiaodong; Liu, Sifei; Sun, Deqing; Davis, Larry; Kautz, Jan

Abstract

One central question for video action recognition is how to model motion. In this paper, we present hierarchical contrastive motion learning, a new self-supervised learning framework to extract effective motion representations from raw video frames. Our approach progressively learns a hierarchy of motion features that correspond to different abstraction levels in a network. This hierarchical design bridges the semantic gap between low-level motion cues and high-level recognition tasks, and promotes the fusion of appearance and motion information at multiple levels. At each level, explicit motion self-supervision is provided via contrastive learning, which enforces the motion features at the current level to predict the future features at the previous level. Thus, the motion features at higher levels are trained to gradually capture semantic dynamics and become more discriminative for action recognition. Our motion learning module is lightweight and flexible enough to be embedded into various backbone networks. Extensive experiments on four benchmarks show that the proposed approach consistently achieves superior results.
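The level-wise contrastive objective in the abstract can be illustrated with an InfoNCE-style loss. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give the exact loss, similarity function, or negative sampling. Here the function names `info_nce` and `hierarchical_loss` are hypothetical; cosine similarity, the temperature of 0.1, and treating the other samples in a batch as negatives are all assumptions. Each `predictions[i]` stands for a future-feature prediction made from level-l motion features, and `targets[i]` for the actual future feature at level l-1.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed similarity)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)


def info_nce(predictions, targets, temperature=0.1):
    """Mean InfoNCE loss over a batch.

    predictions[i] should match targets[i] (positive pair); every other
    targets[j] in the batch acts as a negative (assumed sampling scheme).
    """
    n = len(predictions)
    total = 0.0
    for i in range(n):
        logits = [cosine(predictions[i], targets[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]  # -log softmax probability of the positive
    return total / n


def hierarchical_loss(levels, temperature=0.1):
    """Sum of per-level losses; levels is a list of (predictions, targets) pairs.

    Equal weighting across levels is an assumption for illustration.
    """
    return sum(info_nce(p, t, temperature) for p, t in levels)


if __name__ == "__main__":
    targets = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    aligned = targets                       # each prediction matches its own target
    shifted = targets[1:] + targets[:1]     # each prediction matches a wrong target
    print(info_nce(aligned, targets))       # small loss
    print(info_nce(shifted, targets))       # large loss
    print(hierarchical_loss([(aligned, targets), (shifted, targets)]))
```

The loss is low when each prediction is closest to its own future target and high when it is closer to another sample's target, which is the property the per-level self-supervision relies on.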