Paper Title

Bridging Cost-sensitive and Neyman-Pearson Paradigms for Asymmetric Binary Classification

Authors

Wei Vivian Li, Xin Tong, Jingyi Jessica Li

Abstract

Asymmetric binary classification problems, in which the type I and II errors have unequal severity, are ubiquitous in real-world applications. To handle such asymmetry, researchers have developed the cost-sensitive and Neyman-Pearson paradigms for training classifiers to control the more severe type of classification error, say the type I error. The cost-sensitive paradigm is widely used and has straightforward implementations that do not require sample splitting; however, it demands an explicit specification of the costs of the type I and II errors, and an open question is what specification can guarantee a high-probability control on the population type I error. In contrast, the Neyman-Pearson paradigm can train classifiers to achieve a high-probability control of the population type I error, but it relies on sample splitting, which reduces the effective training sample size. Since the two paradigms have complementary strengths, it is reasonable to combine them for classifier construction. In this work, we study, for the first time, the methodological connections between the two paradigms, and we develop the TUBE-CS algorithm to bridge the two paradigms from the perspective of controlling the population type I error.
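
To make the contrast between the two paradigms concrete, the following is a minimal sketch and not the TUBE-CS algorithm, which the abstract does not specify. It assumes a scoring classifier whose larger scores indicate class 1, a target type I error level `alpha`, and a tolerated violation probability `delta`; these symbols and all function names are illustrative assumptions. The Neyman-Pearson side uses a commonly cited order-statistic thresholding rule applied to held-out class-0 scores (hence the sample splitting), while the cost-sensitive side only needs the two costs.

```python
import math


def violation_bound(k, n, alpha):
    """Bound on P(type I error > alpha) when the threshold is the k-th
    smallest of n held-out class-0 scores (continuous scores assumed)."""
    return sum(
        math.comb(n, j) * (1 - alpha) ** j * alpha ** (n - j)
        for j in range(k, n + 1)
    )


def np_threshold(class0_scores, alpha, delta):
    """Neyman-Pearson style: smallest order statistic whose violation
    bound is at most delta. Predict class 1 when score > threshold."""
    ordered = sorted(class0_scores)
    n = len(ordered)
    for k in range(1, n + 1):
        if violation_bound(k, n, alpha) <= delta:
            return ordered[k - 1]
    raise ValueError("too few held-out class-0 scores for this alpha and delta")


def cost_sensitive_threshold(cost_type1, cost_type2):
    """Cost-sensitive style: predict class 1 when the estimated
    P(Y = 1 | x) exceeds this cutoff; no held-out sample is needed."""
    return cost_type1 / (cost_type1 + cost_type2)
```

Under these assumptions, `np_threshold` fails when there are fewer than `log(delta) / log(1 - alpha)` held-out class-0 scores (59 for `alpha = delta = 0.05`), which illustrates the sample-size cost of splitting. `cost_sensitive_threshold` always returns a cutoff, but nothing in it says which costs would yield a high-probability bound on the population type I error.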
