Paper Title
Can Temporal Information Help with Contrastive Self-Supervised Learning?
Authors
Abstract
Leveraging temporal information has been regarded as essential for developing video understanding models. However, how to properly incorporate temporal information into the recently successful instance-discrimination-based contrastive self-supervised learning (CSL) framework remains unclear. Although applying temporal augmentations directly is the intuitive solution, we find that it does not help, and in general even impairs, video CSL. This counter-intuitive observation motivates us to re-design existing video CSL frameworks for better integration of temporal knowledge. To this end, we present Temporal-aware Contrastive self-supervised learning (TaCo), a general paradigm to enhance video CSL. Specifically, TaCo selects a set of temporal transformations not only as strong data augmentation but also to constitute extra self-supervision for video understanding. By jointly contrasting instances with enriched temporal transformations and learning these transformations as self-supervised signals, TaCo can significantly enhance unsupervised video representation learning. For instance, TaCo demonstrates consistent improvement in downstream classification tasks over a list of backbones and CSL approaches. Our best model achieves 85.1% (UCF-101) and 51.6% (HMDB-51) top-1 accuracy, a 3% and 2.4% relative improvement, respectively, over the previous state of the art.
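The joint objective described in the abstract (contrasting instances under temporal transformations while also predicting which transformation was applied) might be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not name the transformations, the loss form, or the weighting, so the transformation set (`identity`, `reverse`, `speed_up`, `shuffle`), the InfoNCE-style contrastive term, the function names, and the `weight` and `temperature` parameters are all assumptions made here for illustration.

```python
import math
import random

# ASSUMPTION: a hypothetical transformation set; the abstract does not list one.
TRANSFORMS = ["identity", "reverse", "speed_up", "shuffle"]

def reverse(frames):
    """Play the clip backwards."""
    return list(frames)[::-1]

def speed_up(frames, rate=2):
    """Keep every `rate`-th frame."""
    return list(frames)[::rate]

def shuffle(frames, rng):
    """Randomly permute the frame order."""
    out = list(frames)
    rng.shuffle(out)
    return out

def apply_random_transform(frames, rng):
    """Return (transformed frames, label). The label is the pretext target."""
    label = rng.randrange(len(TRANSFORMS))
    name = TRANSFORMS[label]
    if name == "reverse":
        out = reverse(frames)
    elif name == "speed_up":
        out = speed_up(frames)
    elif name == "shuffle":
        out = shuffle(frames, rng)
    else:
        out = list(frames)
    return out, label

def cross_entropy(logits, target):
    """Numerically stable softmax cross-entropy for a single example."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]

def info_nce(sim_pos, sim_negs, temperature=0.1):
    """InfoNCE-style loss from precomputed similarities (positive at index 0)."""
    logits = [sim_pos / temperature] + [s / temperature for s in sim_negs]
    return cross_entropy(logits, 0)

def joint_loss(sim_pos, sim_negs, transform_logits, transform_label, weight=1.0):
    """ASSUMED combination: contrastive term plus weighted transformation-prediction term."""
    contrastive = info_nce(sim_pos, sim_negs)
    pretext = cross_entropy(transform_logits, transform_label)
    return contrastive + weight * pretext
```

In a real pipeline the similarities and transformation logits would come from a video encoder with two heads; here they are plain numbers so that the structure of the combined loss stays visible.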