Robust variable selection for model-based learning in presence of adulteration

Paper title

Robust variable selection for model-based learning in presence of adulteration

Authors

Andrea Cappozzo, Francesca Greselin, Thomas Brendan Murphy

Abstract

The problem of identifying the most discriminating features when performing supervised learning has been extensively investigated. In particular, several methods for variable selection in model-based classification have been proposed. Surprisingly, the impact of outliers and wrongly labeled units on the determination of relevant predictors has received far less attention, with almost no dedicated methodologies available in the literature. In the present paper, we introduce two robust variable selection approaches: one that embeds a robust classifier within a greedy-forward selection procedure and the other based on the theory of maximum likelihood estimation and irrelevance. The former recasts the feature identification as a model selection problem, while the latter regards the relevant subset as a model parameter to be estimated. The benefits of the proposed methods, in contrast with non-robust solutions, are assessed via an experiment on synthetic data. An application to a high-dimensional classification problem of contaminated spectroscopic data concludes the paper.
