Paper Title
Beyond Class-Conditional Assumption: A Primary Attempt to Combat Instance-Dependent Label Noise
Paper Authors
Paper Abstract
Supervised learning under label noise has seen numerous advances recently, while existing theoretical findings and empirical results broadly build on the class-conditional noise (CCN) assumption that the noise is independent of input features given the true label. In this work, we present a theoretical hypothesis test and prove that noise in real-world datasets is unlikely to be CCN, which confirms that label noise should depend on the instance and justifies the urgent need to go beyond the CCN assumption. The theoretical results motivate us to study the more general and practically relevant instance-dependent noise (IDN). To stimulate the development of theory and methodology on IDN, we formalize an algorithm to generate controllable IDN and present both theoretical and empirical evidence to show that IDN is semantically meaningful and challenging. As a primary attempt to combat IDN, we present a tiny algorithm termed self-evolution average label (SEAL), which not only stands out under IDN with various noise fractions, but also improves generalization on the real-world noise benchmark Clothing1M. Our code is released. Notably, our theoretical analysis in Section 2 provides rigorous motivation for studying IDN, which is an important topic that deserves more research attention in the future.
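The abstract contrasts CCN, where the noise is independent of the input features given the true label, with instance-dependent noise (IDN). The following minimal Python sketch makes that distinction concrete. It is not the paper's IDN-generation algorithm and not SEAL. The 3-class transition matrix `T`, the scalar feature `x`, the function `hardness_flip_prob`, and the "flip to a uniformly random wrong class" rule are all hypothetical choices made only for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple

NUM_CLASSES = 3

# CCN: one row-stochastic transition matrix shared by every instance.
# T[y][j] = P(noisy label = j | true label = y); here 20% of each class flips.
T: List[List[float]] = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]


def ccn_noisy_label(y: int, rng: random.Random) -> int:
    """Class-conditional noise: the features x are never consulted."""
    return rng.choices(range(NUM_CLASSES), weights=T[y], k=1)[0]


def hardness_flip_prob(x: float, y: int) -> float:
    """Hypothetical instance-dependent flip probability.

    Instances near an assumed class boundary at x = 0 are treated as
    ambiguous and flipped more often: 0.5 at x = 0, decaying with |x|.
    """
    return 0.5 / (1.0 + abs(x))


def idn_noisy_label(
    x: float,
    y: int,
    flip_prob: Callable[[float, int], float],
    rng: random.Random,
) -> int:
    """Instance-dependent noise: the flip probability depends on x as well as y."""
    if rng.random() < flip_prob(x, y):
        # Assumption: a flipped label goes to a uniformly random wrong class.
        return rng.choice([c for c in range(NUM_CLASSES) if c != y])
    return y


def flip_rate(noisy: Sequence[int], y: int) -> float:
    """Fraction of draws whose noisy label differs from the true label y."""
    return sum(z != y for z in noisy) / len(noisy)


def empirical_flip_rates(
    x: float, n: int = 20000, seed: int = 0, y: int = 0
) -> Tuple[float, float]:
    """Estimate the (CCN, IDN) flip rates for one instance (x, y) from n draws."""
    rng = random.Random(seed)
    ccn = [ccn_noisy_label(y, rng) for _ in range(n)]
    idn = [idn_noisy_label(x, y, hardness_flip_prob, rng) for _ in range(n)]
    return flip_rate(ccn, y), flip_rate(idn, y)


if __name__ == "__main__":
    for x in (0.0, 4.0):
        ccn_rate, idn_rate = empirical_flip_rates(x)
        print(f"x={x}: CCN flip rate ~{ccn_rate:.3f}, IDN flip rate ~{idn_rate:.3f}")
```

Compare two instances with the same true label but different features. Under CCN, both flip at about the rate fixed by `T` (about 0.2). Under this toy IDN, the ambiguous instance at `x = 0` flips at about 0.5, while the instance at `x = 4` flips at about 0.1. This feature dependence is exactly what a single class-level transition matrix cannot express.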