Paper Title

ProMix: Combating Label Noise via Maximizing Clean Sample Utility

Paper Authors

Ruixuan Xiao, Yiwen Dong, Haobo Wang, Lei Feng, Runze Wu, Gang Chen, Junbo Zhao

Paper Abstract

Learning with Noisy Labels (LNL) has become an appealing topic, as imperfectly annotated data are relatively cheaper to obtain. Recent state-of-the-art approaches employ specific selection mechanisms to separate clean and noisy samples and then apply Semi-Supervised Learning (SSL) techniques for improved performance. However, the selection step mostly yields a medium-sized, decent-enough clean subset, overlooking a rich set of additional clean samples. To this end, we propose ProMix, a novel LNL framework that attempts to maximize the utility of clean samples for boosted performance. The key to our method is a matched high-confidence selection technique, which selects examples whose confidence scores are high and whose predictions match their given labels, dynamically expanding a base clean sample set. To overcome the potential side effects of an excessive clean-set selection procedure, we further devise a novel SSL framework that is able to train balanced and unbiased classifiers on the separated clean and noisy samples. Extensive experiments demonstrate that ProMix significantly advances the current state-of-the-art results on multiple benchmarks with different types and levels of noise. It achieves an average improvement of 2.48% on the CIFAR-N dataset. The code is available at https://github.com/Justherozen/ProMix.
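
To make the selection rule concrete, below is a minimal PyTorch sketch of the "matched high confidence" idea described in the abstract: a sample joins the expanded clean set only when the model's prediction agrees with its given (possibly noisy) label and the confidence on that prediction exceeds a threshold. This is an illustrative reading of the abstract, not the authors' implementation; the function name, tensor shapes, and the threshold value tau are assumptions.

import torch
import torch.nn.functional as F

def matched_high_confidence_select(logits: torch.Tensor,
                                   given_labels: torch.Tensor,
                                   tau: float = 0.95) -> torch.Tensor:
    """Return a boolean mask over the batch marking samples selected as clean.

    logits:       (batch, num_classes) raw model outputs
    given_labels: (batch,) possibly noisy integer class labels
    tau:          confidence threshold (illustrative value, not from the paper)
    """
    probs = F.softmax(logits, dim=1)    # per-class predicted probabilities
    conf, pred = probs.max(dim=1)       # top confidence and predicted class
    matched = pred.eq(given_labels)     # prediction agrees with the given label
    high_conf = conf.ge(tau)            # confidence exceeds the threshold
    return matched & high_conf          # both criteria must hold

Usage would look like mask = matched_high_confidence_select(model(x), y); the samples x[mask] are then added to the base clean set, and the remaining samples are treated as unlabeled data for the SSL stage. Requiring both agreement and high confidence is what lets the clean set grow dynamically without admitting confidently mislabeled examples on the prediction alone.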
