Paper title
Robust variable selection in the framework of classification with label noise and outliers: applications to spectroscopic data in agri-food
Paper authors
Abstract
Classification of high-dimensional spectroscopic data is a common task in analytical chemistry. Well-established procedures such as support vector machines (SVMs) and partial least squares discriminant analysis (PLS-DA) are the most common methods for tackling this supervised learning problem. Nonetheless, interpreting these models sometimes remains difficult, and solutions based on feature selection are often adopted because they automatically identify the most informative wavelengths. Unfortunately, in some delicate applications such as food authenticity, mislabeled and adulterated spectra occur in the calibration and/or validation sets, with dramatic effects on model development, prediction accuracy, and robustness. Motivated by these issues, the present paper proposes a robust model-based method that simultaneously performs variable selection, outlier detection, and label noise detection. We demonstrate the effectiveness of our proposal on three agri-food spectroscopic studies, in which several forms of perturbation are considered. Our approach succeeds in reducing problem complexity, identifying anomalous spectra, and attaining competitive predictive accuracy while using only a very small number of selected wavelengths.