Paper Title
Understanding Classifier Mistakes with Generative Models
Paper Authors
Paper Abstract
Although deep neural networks are effective on supervised learning tasks, they have been shown to be brittle. They are prone to overfitting on their training distribution and are easily fooled by small adversarial perturbations. In this paper, we leverage generative models to identify and characterize instances where classifiers fail to generalize. We propose a generative model of the features extracted by a classifier, and show using rigorous hypothesis testing that errors tend to occur when features are assigned low probability by our model. From this observation, we develop a detection criterion for samples on which a classifier is likely to fail at test time. In particular, we test against three different sources of classification failures: mistakes made on the test set due to poor model generalization, adversarial samples, and out-of-distribution samples. Our approach is agnostic to class labels from the training set, which makes it applicable to models trained in a semi-supervised way.
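
The abstract describes the detection idea only at a high level: fit a generative model to the features a classifier extracts, then flag test samples whose features receive low probability. The sketch below is a minimal illustration of that workflow under assumptions of my own, not the paper's: a Gaussian mixture stands in for the paper's (unspecified) generative model, synthetic arrays stand in for real classifier features, and the percentile-based threshold is a hypothetical choice.

    # Illustrative sketch: flag likely classifier failures by fitting a density
    # model to classifier features and thresholding the log-likelihood.
    # The Gaussian mixture is a stand-in; the paper's actual generative model,
    # feature extractor, and thresholding rule are not specified by the abstract.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    def fit_feature_density(train_features: np.ndarray, n_components: int = 10) -> GaussianMixture:
        # Fit a simple density model over classifier features (hypothetical choice of model).
        gmm = GaussianMixture(n_components=n_components, covariance_type="full", random_state=0)
        gmm.fit(train_features)
        return gmm

    def detection_threshold(gmm: GaussianMixture, train_features: np.ndarray, quantile: float = 0.05) -> float:
        # Pick a log-likelihood cutoff from the training features, e.g. the 5th percentile.
        return float(np.quantile(gmm.score_samples(train_features), quantile))

    def flag_likely_mistakes(gmm: GaussianMixture, test_features: np.ndarray, threshold: float) -> np.ndarray:
        # Boolean mask of test samples whose features are assigned low probability.
        return gmm.score_samples(test_features) < threshold

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        # Synthetic placeholders for features extracted by a trained classifier.
        train_feats = rng.normal(size=(1000, 32))
        test_feats = np.vstack([
            rng.normal(size=(50, 32)),            # in-distribution-like features
            rng.normal(loc=4.0, size=(50, 32)),   # shifted / out-of-distribution-like features
        ])
        gmm = fit_feature_density(train_feats)
        thr = detection_threshold(gmm, train_feats)
        flags = flag_likely_mistakes(gmm, test_feats, thr)
        print(f"flagged {flags.sum()} of {len(flags)} test samples as likely failures")

In this toy run, the shifted features fall in low-density regions of the fitted model and are flagged, mirroring the abstract's observation that classification errors tend to coincide with low-probability features.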