Paper Title
C3: Cross-instance guided Contrastive Clustering
Paper Authors
Paper Abstract
Clustering is the task of gathering similar data samples into clusters without using any predefined labels. It has been widely studied in the machine learning literature, and recent advances in deep learning have revived interest in the field. Contrastive clustering (CC) models are a staple of deep clustering, in which positive and negative pairs for each data instance are generated through data augmentation. CC models aim to learn a feature space where the instance-level and cluster-level representations of positive pairs are grouped together. Despite improving the state of the art (SOTA), these algorithms ignore cross-instance patterns, which carry essential information for improving clustering performance. This increases the model's false-negative-pair rate while decreasing its true-positive-pair rate. In this paper, we propose a novel contrastive clustering method, Cross-instance guided Contrastive Clustering (C3), that considers cross-sample relationships to increase the number of positive pairs and mitigate the impact of false-negative, noise, and anomaly samples on the learned representation of the data. In particular, we define a new loss function that identifies similar instances using the instance-level representation and encourages them to aggregate together. Moreover, we propose a novel weighting method to select negative samples more efficiently. Extensive experimental evaluations show that our proposed method outperforms state-of-the-art algorithms on benchmark computer vision datasets: we improve clustering accuracy by 6.6%, 3.3%, 5.0%, 1.3%, and 0.3% on CIFAR-10, CIFAR-100, ImageNet-10, ImageNet-Dogs, and Tiny-ImageNet, respectively.
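To make the pairing mechanism described above concrete, the following is a minimal NumPy sketch of the standard instance-level contrastive (NT-Xent) loss that CC-style models build on, where the two augmented views of each instance form a positive pair and all other samples in the batch act as negatives. This is a generic illustration of the contrastive objective, not the C3 loss itself; the function name and the temperature value are illustrative choices.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent loss over a batch of paired representations.

    z1, z2: (N, D) arrays holding representations of two augmented
    views of the same N instances; row i of z1 and row i of z2 form
    a positive pair, and every other row in the batch is a negative.
    """
    # L2-normalize rows so dot products become cosine similarities.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    z = np.concatenate([z1, z2], axis=0)          # (2N, D)
    sim = z @ z.T / temperature                   # (2N, 2N) similarities

    n = z1.shape[0]
    # Exclude self-similarity from the softmax denominator.
    np.fill_diagonal(sim, -np.inf)
    # The positive of sample i is sample i + N (and vice versa).
    pos_idx = np.concatenate([np.arange(n, 2 * n), np.arange(n)])

    # Cross-entropy: -log softmax of the positive entry in each row.
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    loss = -(sim[np.arange(2 * n), pos_idx] - logsumexp)
    return loss.mean()
```

C3's contribution, as the abstract states, is to go beyond this per-instance pairing: it additionally treats sufficiently similar instances across the batch as positives and reweights negatives, which this basic sketch does not do.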