Paper Title
Self-supervised Learning from a Multi-view Perspective
Paper Authors
Paper Abstract
As a subset of unsupervised representation learning, self-supervised representation learning adopts self-defined signals as supervision and uses the learned representation for downstream tasks, such as object detection and image captioning. Many proposed approaches for self-supervised learning naturally follow a multi-view perspective, where the input (e.g., original images) and the self-supervised signals (e.g., augmented images) can be seen as two redundant views of the data. Building from this multi-view perspective, this paper provides an information-theoretic framework to better understand the properties that encourage successful self-supervised learning. Specifically, we demonstrate that self-supervised learned representations can extract task-relevant information and discard task-irrelevant information. Our theoretical framework paves the way to a larger space of self-supervised learning objective design. In particular, we propose a composite objective that bridges the gap between prior contrastive and predictive learning objectives, and introduce an additional objective term to discard task-irrelevant information. To verify our analysis, we conduct controlled experiments to evaluate the impact of the composite objective. We also explore our framework's empirical generalization beyond the multi-view perspective, where the cross-view redundancy may not be clearly observed.
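The abstract describes a composite objective combining a contrastive term and a predictive term over two views. As a rough illustration only (the paper's exact objective and weighting are not given here), the sketch below pairs an InfoNCE-style contrastive loss with a simple mean-squared-error predictive loss between paired view representations; the function names and the weighting parameter `lam` are hypothetical, not from the paper.

```python
import math

def dot(u, v):
    # Inner product of two same-length vectors.
    return sum(a * b for a, b in zip(u, v))

def info_nce(z1, z2, temperature=0.1):
    """Contrastive (InfoNCE-style) term: each z1[i] should score highest
    against its paired view z2[i], with all other z2[j] as negatives."""
    total = 0.0
    for i in range(len(z1)):
        logits = [dot(z1[i], z2[j]) / temperature for j in range(len(z2))]
        m = max(logits)  # stabilize the log-sum-exp
        log_den = m + math.log(sum(math.exp(l - m) for l in logits))
        total += -(logits[i] - log_den)
    return total / len(z1)

def predictive(z1, z2):
    """Predictive term: mean squared error between paired representations,
    a stand-in for a log-likelihood style prediction loss."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(u, v)) for u, v in zip(z1, z2)
    ) / len(z1)

def composite_loss(z1, z2, lam=0.5, temperature=0.1):
    # Weighted sum of the two terms; `lam` is an assumed trade-off weight.
    return info_nce(z1, z2, temperature) + lam * predictive(z1, z2)
```

With this toy formulation, aligned view pairs should incur a lower loss than mismatched ones, e.g. `composite_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])` is smaller than the same call with the second view's rows swapped.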