Title
Offline Clustering Approach to Self-supervised Learning for Class-imbalanced Image Data
Authors
Abstract
Class-imbalanced datasets are known to bias models toward the majority classes. In this project, we pose two research questions: 1) When is the class-imbalance problem more prevalent in self-supervised pre-training? 2) Can offline clustering of feature representations help pre-training on class-imbalanced data? Our experiments investigate the first question by varying the degree of class imbalance when training the baseline models, SimCLR and SimSiam, on the CIFAR-10 dataset. To answer the second question, we train one expert model on each subset of the data defined by the feature clusters. We then distill the knowledge of the expert models into a single model, so that we can compare its performance with that of our baselines.
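The abstract does not say how the degree of class imbalance is controlled. One common recipe for CIFAR-10, assumed here and not confirmed by the source, subsamples each class along an exponential profile n_c = n_max · ρ^(−c/(C−1)), so that the imbalance ratio ρ = n_max / n_min sets the degree of imbalance (ρ = 1 is the balanced dataset). The Python sketch below applies that profile to a list of integer labels. The function names are hypothetical, and treating class 0 as the majority class is an arbitrary choice.

```python
import random
from collections import defaultdict


def long_tailed_counts(n_max, num_classes, imbalance_ratio):
    """Per-class sample counts on an exponential decay profile (assumed recipe).

    Class 0 keeps n_max samples and the last class keeps about
    n_max / imbalance_ratio; imbalance_ratio = 1 means balanced.
    """
    if imbalance_ratio < 1:
        raise ValueError("imbalance_ratio must be >= 1")
    if num_classes == 1:
        return [n_max]
    return [
        max(1, round(n_max * imbalance_ratio ** (-c / (num_classes - 1))))
        for c in range(num_classes)
    ]


def make_imbalanced(labels, imbalance_ratio, seed=0):
    """Indices of a class-imbalanced subset of a (roughly) balanced labeled set."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    # Assumption: sorted label order decides which classes become the majority.
    classes = sorted(by_class)
    n_max = min(len(by_class[c]) for c in classes)
    counts = long_tailed_counts(n_max, len(classes), imbalance_ratio)
    rng = random.Random(seed)  # fixed seed -> the same subset for every baseline run
    keep = []
    for c, n in zip(classes, counts):
        keep.extend(rng.sample(by_class[c], n))
    return sorted(keep)
```

For example, with CIFAR-10's 50,000 training labels (5,000 per class) and ρ = 100, the kept counts fall from 5,000 for class 0 to 50 for class 9. Fixing the seed keeps the subset identical across the SimCLR and SimSiam runs, so differences between them are not due to which images were dropped.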
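The approach to the second question has three steps: cluster the feature representations once (offline), train one expert per cluster, then distill the experts into a single student. The source specifies neither the clustering algorithm nor the distillation objective, so the sketch below assumes k-means on frozen embeddings and a cosine-distance loss in which each sample's target is the embedding produced by the expert of its own cluster. The helper names (`kmeans`, `split_by_cluster`, `distill_loss`) are hypothetical, the code is pure Python for readability rather than speed, and expert training and the student's optimizer are left out.

```python
import math
import random


def kmeans(features, k, iters=50, seed=0):
    """Lloyd's k-means on a list of equal-length feature vectors.

    Meant to run once, offline, on frozen embeddings.
    Returns (assignments, centroids).
    """
    rng = random.Random(seed)
    centroids = [list(f) for f in rng.sample(features, k)]
    assign = None
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_assign = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(f, centroids[j])))
            for f in features
        ]
        if new_assign == assign:  # converged: no point changed cluster
            break
        assign = new_assign
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [features[i] for i, a in enumerate(assign) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids


def split_by_cluster(assign, k):
    """One index subset per cluster; expert e would train only on subsets[e]."""
    subsets = [[] for _ in range(k)]
    for i, a in enumerate(assign):
        subsets[a].append(i)
    return subsets


def cosine(u, v):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-12)


def distill_loss(student_out, expert_outs, assign):
    """Mean (1 - cosine) between each student embedding and the embedding that
    the expert of the sample's cluster produced for it (assumed routing).

    expert_outs[e][i] is expert e's embedding of sample i.
    """
    losses = [
        1.0 - cosine(s, expert_outs[a][i])
        for i, (s, a) in enumerate(zip(student_out, assign))
    ]
    return sum(losses) / len(losses)
```

In practice the features would come from a pre-trained encoder (for example, a SimCLR or SimSiam backbone), and the student would minimize `distill_loss` over the full class-imbalanced training set. Routing each sample to a single expert is only one way to combine the experts; averaging or weighting their outputs would be equally consistent with the abstract.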