Paper Title
Self-Supervised Learning via Maximum Entropy Coding
Paper Authors
Paper Abstract
A mainstream type of current self-supervised learning methods pursues a general-purpose representation that can be well transferred to downstream tasks, typically by optimizing on a given pretext task such as instance discrimination. In this work, we argue that existing pretext tasks inevitably introduce biases into the learned representation, which in turn leads to biased transfer performance on various downstream tasks. To cope with this issue, we propose Maximum Entropy Coding (MEC), a more principled objective that explicitly optimizes the structure of the representation, so that the learned representation is less biased and thus generalizes better to unseen downstream tasks. Inspired by the principle of maximum entropy in information theory, we hypothesize that a generalizable representation should be the one that admits the maximum entropy among all plausible representations. To make the objective end-to-end trainable, we propose to leverage the minimal coding length in lossy data coding as a computationally tractable surrogate for the entropy, and further derive a scalable reformulation of the objective that allows fast computation. Extensive experiments demonstrate that MEC learns a more generalizable representation than previous methods based on specific pretext tasks. It consistently achieves state-of-the-art performance on various downstream tasks, including not only ImageNet linear probe, but also semi-supervised classification, object detection, instance segmentation, and object tracking. Interestingly, we show that existing batch-wise and feature-wise self-supervised objectives can be seen as equivalent to low-order approximations of MEC. Code and pre-trained models are available at https://github.com/xinliu20/MEC.
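To make the abstract's coding-length surrogate concrete, here is a minimal sketch of how such an objective could look in code. It is an illustration, not the released implementation: the function name mec_loss, the values of eps and order, and the PyTorch framing are assumptions on our part; the repository at https://github.com/xinliu20/MEC is the authoritative reference. The sketch approximates a log-determinant coding-length term with a truncated Taylor series of the matrix logarithm, which matches the abstract's "scalable reformulation" in spirit (traces of matrix powers instead of an exact log-determinant).

```python
import torch


def mec_loss(z1: torch.Tensor, z2: torch.Tensor,
             eps: float = 0.06, order: int = 4) -> torch.Tensor:
    """Sketch of a maximum-entropy-coding style loss (hypothetical API).

    z1, z2: (m, d) L2-normalized representations of two augmented views.
    eps:    assumed distortion tolerance of the lossy coder; it must be set
            so the series below converges (illustrative value only).
    order:  truncation order of the Taylor series of log det(I + C).
    """
    m, d = z1.shape
    lam = d / (m * eps ** 2)   # scaling taken from the lossy coding-length formula
    mu = (m + d) / 2           # leading coefficient of the coding length
    c = lam * (z1 @ z2.T)      # (m, m) cross-view similarity matrix

    # log det(I + C) = tr(C) - tr(C^2)/2 + tr(C^3)/3 - ...  (spectral radius < 1)
    power = c
    series = torch.zeros((), dtype=z1.dtype, device=z1.device)
    for k in range(1, order + 1):
        series = series + ((-1) ** (k + 1)) * torch.trace(power) / k
        power = power @ c

    # Minimizing the negative coding length maximizes the entropy surrogate.
    return -mu * series


# Toy usage with random L2-normalized features (m=256 samples, d=128 dims).
z1 = torch.nn.functional.normalize(torch.randn(256, 128), dim=1)
z2 = torch.nn.functional.normalize(torch.randn(256, 128), dim=1)
print(mec_loss(z1, z2))
```

Note that truncating the series at order=1 leaves only the trace of the cross-view similarity matrix, i.e. a batch-wise cosine-similarity term; this is consistent with the abstract's remark that existing batch-wise (and, at higher orders, feature-wise) objectives can be viewed as low-order approximations of MEC.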