MoQuad: Motion-focused Quadruple Construction for Video Contrastive Learning

Paper title

MoQuad: Motion-focused Quadruple Construction for Video Contrastive Learning

Authors

Liu, Yuan; Chen, Jiacheng; Wu, Hao

Abstract

Learning effective motion features is an essential pursuit of video representation learning. This paper presents a simple yet effective sample construction strategy to boost the learning of motion features in video contrastive learning. The proposed method, dubbed Motion-focused Quadruple Construction (MoQuad), augments instance discrimination by meticulously disturbing the appearance and motion of both the positive and negative samples to create a quadruple for each video instance, so that the model is encouraged to exploit motion information. Unlike recent approaches that create extra auxiliary tasks for learning motion features or apply explicit temporal modelling, our method keeps the simple and clean contrastive learning paradigm (i.e., SimCLR) without multi-task learning or extra modelling. In addition, we design two extra training strategies by analyzing initial MoQuad experiments. By simply applying MoQuad to SimCLR, extensive experiments show that we achieve superior performance on downstream tasks compared to the state of the art. Notably, on the UCF-101 action recognition task, we achieve 93.7% accuracy after pre-training the model on Kinetics-400 for only 200 epochs, surpassing various previous methods.
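The abstract does not spell out the loss or how appearance and motion are disturbed, so the following is only a minimal sketch of the general idea, under stated assumptions: an InfoNCE-style loss (the kind SimCLR uses) applied to one quadruple, where the positive keeps the anchor's motion but has a disturbed appearance, and the negatives have disturbed motion. The function names, the toy embeddings, and the temperature value are all hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def quadruple_info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for one anchor, one positive, and a list of negatives.

    Hypothetical helper for illustration; the paper's exact loss may differ.
    """
    pos_logit = cosine(anchor, positive) / temperature
    logits = [pos_logit] + [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - pos_logit


# Toy quadruple (assumed layout): the first two numbers stand in for "motion",
# the last number stands in for "appearance".
anchor = [1.0, 0.0, 1.0]
positive = [1.0, 0.0, -1.0]      # same motion, disturbed appearance
negatives = [
    [0.0, 1.0, 1.0],             # disturbed motion, same appearance
    [0.0, 1.0, -1.0],            # disturbed motion, disturbed appearance
]

# An encoder that keeps only the motion part versus one that keeps only appearance.
motion_loss = quadruple_info_nce(
    anchor[:2], positive[:2], [n[:2] for n in negatives]
)
appearance_loss = quadruple_info_nce(
    anchor[2:], positive[2:], [n[2:] for n in negatives]
)
print(motion_loss, appearance_loss)
```

In this toy setup, an embedding that relies only on the motion part separates the positive from the negatives and gets a much lower loss than one that relies only on appearance, which is the kind of pressure the quadruple construction is described as creating.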
