Title
Feature selection in machine learning: Rényi min-entropy vs Shannon entropy
Authors
Abstract
Feature selection, in the context of machine learning, is the process of separating highly predictive features from those that might be irrelevant or redundant. Information theory has been recognized as a useful framework for this task, as the predictive power stems from the correlation, i.e., the mutual information, between features and labels. Many algorithms for feature selection in the literature have adopted the Shannon-entropy-based mutual information. In this paper, we explore the possibility of using Rényi min-entropy instead. In particular, we propose an algorithm based on a notion of conditional Rényi min-entropy that has recently been adopted in the field of security and privacy, and which is strictly related to the Bayes error. We prove that in general the two approaches are incomparable, in the sense that we can construct datasets on which the Rényi-based algorithm performs better than the corresponding Shannon-based one, and datasets on which the situation is reversed. In practice, however, when considering real datasets, the Rényi-based algorithm seems to tend to outperform the other one. We have carried out several experiments on the BASEHOCK, SEMEION, and GISETTE datasets, and in all of them we have indeed observed that the Rényi-based algorithm gives better results.
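To make the two scores concrete, here is a minimal Python sketch under the standard definitions (an illustration only, not the paper's implementation). For a discrete joint distribution p(x, y) of a feature X and a label Y, it computes the Shannon mutual information I(X;Y) = H(Y) - H(Y|X) and the Rényi min-entropy leakage I_inf(X;Y) = H_inf(Y) - H_inf(Y|X), where H_inf(Y|X) = -log2 sum_x max_y p(x, y) is the notion of conditional min-entropy adopted in security and privacy; the Bayes error of guessing Y from X equals 1 - 2^(-H_inf(Y|X)). The joint matrix P below is a hypothetical toy example.

import numpy as np

def shannon_mi(P):
    # I(X;Y) = H(Y) - H(Y|X), logs base 2; zero-probability entries skipped.
    py = P.sum(axis=0)                        # marginal of the label Y
    h_y = -sum(p * np.log2(p) for p in py if p > 0)
    px = P.sum(axis=1)                        # marginal of the feature X
    h_y_given_x = -sum(P[x, y] * np.log2(P[x, y] / px[x])
                       for x in range(P.shape[0])
                       for y in range(P.shape[1]) if P[x, y] > 0)
    return h_y - h_y_given_x

def renyi_min_leakage(P):
    # I_inf(X;Y) = H_inf(Y) - H_inf(Y|X), with
    #   H_inf(Y)   = -log2 max_y p(y)
    #   H_inf(Y|X) = -log2 sum_x max_y p(x, y)
    py = P.sum(axis=0)
    return -np.log2(py.max()) + np.log2(P.max(axis=1).sum())

# Hypothetical toy joint distribution: 3 feature values (rows), 2 labels (columns).
P = np.array([[0.30, 0.10],
              [0.05, 0.25],
              [0.15, 0.15]])
print(shannon_mi(P))             # Shannon mutual information, ~0.18 bits
print(renyi_min_leakage(P))      # Rényi min-entropy leakage, ~0.49 bits
print(1 - P.max(axis=1).sum())   # Bayes error of guessing Y from X, 0.30

A typical forward-selection loop would then score each candidate feature (or feature set) by one of these quantities and greedily add the best-scoring one; the two scores can rank features differently, which is what makes the comparison in the paper meaningful.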