Paper Title
Positive Unlabeled Contrastive Learning
Paper Authors
Paper Abstract
Self-supervised pretraining on unlabeled data followed by supervised fine-tuning on labeled data is a popular paradigm for learning from limited labeled examples. We extend this paradigm to the classical positive unlabeled (PU) setting, where the task is to learn a binary classifier given only a few labeled positive samples and (often) a large amount of unlabeled samples (which could be positive or negative). We first propose a simple extension of the standard InfoNCE family of contrastive losses to the PU setting, and show that it learns superior representations compared to existing unsupervised and supervised approaches. We then develop a simple methodology to pseudo-label the unlabeled samples using a new PU-specific clustering scheme; these pseudo-labels can then be used to train the final (positive vs. negative) classifier. Our method handily outperforms state-of-the-art PU methods on several standard PU benchmark datasets, while not requiring a priori knowledge of the class prior (a common assumption in other PU methods). We also provide a simple theoretical analysis that motivates our method.
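The abstract does not spell out the PU extension of InfoNCE, so the following is only a minimal sketch of one plausible variant, assuming that the few labeled positives treat each other as mutual positives (as in supervised contrastive learning) while unlabeled samples fall back to instance-level InfoNCE against their own augmented view. The function name pu_info_nce and all of its arguments are illustrative and not taken from the paper.

```python
# Hypothetical sketch of a PU-style contrastive loss (not the authors' exact
# formulation). Labeled positives use every other labeled positive in the batch
# as an additional contrastive target; unlabeled samples use only their own
# augmented view, i.e. standard self-supervised InfoNCE.
import torch
import torch.nn.functional as F

def pu_info_nce(z1, z2, is_positive, temperature=0.1):
    """
    z1, z2:      (N, d) embeddings of two augmented views of the same N samples.
    is_positive: (N,) boolean mask, True for the few labeled positive samples.
    """
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)                      # (2N, d)
    pos_mask = torch.cat([is_positive, is_positive])    # (2N,)
    n = z.shape[0]

    sim = z @ z.t() / temperature                       # pairwise similarities
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float('-inf'))     # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)

    # Each anchor's other augmented view is always a positive target.
    view_idx = torch.arange(n, device=z.device).roll(n // 2)
    target = torch.zeros_like(self_mask)
    target[torch.arange(n, device=z.device), view_idx] = True
    # Labeled positives additionally treat all other labeled positives as targets.
    target |= (pos_mask[:, None] & pos_mask[None, :]) & ~self_mask

    # Average log-likelihood over each anchor's positive set, then over anchors.
    loss = -log_prob.masked_fill(~target, 0.0).sum(1) / target.sum(1)
    return loss.mean()
```

Under this sketch, unlabeled anchors reduce to the usual SimCLR-style InfoNCE term, while the scarce labeled positives are pulled together as a class; the pseudo-labeling and clustering stage described in the abstract would then operate on the resulting representations.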