Paper Title
Predictive Value Generalization Bounds
Paper Authors
Paper Abstract
In this paper, we study a bi-criterion framework for assessing scoring functions in the context of binary classification. The positive and negative predictive values (PPV and NPV, respectively) are the conditional probabilities that the true label matches a classifier's predicted label. The usual classification error rate is a linear combination of these probabilities, and therefore concentration inequalities for the error rate do not yield confidence intervals for the two separate predictive values. We study generalization properties of scoring functions with respect to predictive values by deriving new distribution-free large deviation and uniform convergence bounds. The latter bound is stated in terms of a measure of function class complexity that we call the order coefficient; we relate this combinatorial quantity to the VC-subgraph dimension.
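To make the definitions in the abstract concrete, the following minimal sketch computes empirical PPV and NPV for a thresholded scoring function and checks that the empirical error rate equals P(predicted positive) * (1 - PPV) + P(predicted negative) * (1 - NPV). The function name `predictive_values`, the convention that a score strictly above the threshold predicts the positive class, and the toy data are assumptions made for illustration; they are not taken from the paper.

```python
from typing import Sequence, Tuple


def predictive_values(
    y_true: Sequence[int],
    scores: Sequence[float],
    threshold: float = 0.0,
) -> Tuple[float, float, float, float]:
    """Return (ppv, npv, error_rate, frac_predicted_positive).

    Assumption: labels are 0/1 and a score strictly greater than
    `threshold` is treated as a positive prediction.
    """
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        predicted_positive = s > threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1

    n = tp + fp + tn + fn
    if tp + fp == 0 or tn + fn == 0:
        # A predictive value is a conditional probability; it is undefined
        # when the conditioning event (a predicted label) never occurs.
        raise ValueError("PPV or NPV is undefined: one predicted class is empty")

    ppv = tp / (tp + fp)        # P(true = 1 | predicted = 1)
    npv = tn / (tn + fn)        # P(true = 0 | predicted = 0)
    error_rate = (fp + fn) / n  # usual misclassification rate
    frac_pos = (tp + fp) / n    # P(predicted = 1)
    return ppv, npv, error_rate, frac_pos


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    labels = [1, 1, 0, 1, 0, 0, 0, 0, 1, 0]
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1, 0.35, 0.05]

    ppv, npv, err, frac_pos = predictive_values(labels, scores, threshold=0.5)
    # The error rate is a weighted combination of (1 - PPV) and (1 - NPV).
    combined = frac_pos * (1 - ppv) + (1 - frac_pos) * (1 - npv)
    print(ppv, npv, err, combined)
```

Because the weights in this combination are the predicted-class frequencies, a small error rate can coexist with a poor PPV or NPV when one predicted class is rare, which is why a bound on the error rate alone does not bound each predictive value.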