Paper Title
Classification Under Misspecification: Halfspaces, Generalized Linear Models, and Connections to Evolvability
Paper Authors
Paper Abstract
In this paper we revisit some classic problems on classification under misspecification. In particular, we study the problem of learning halfspaces under Massart noise with rate $\eta$. In a recent work, Diakonikolas, Gouleakis, and Tzamos resolved a long-standing problem by giving the first efficient algorithm for learning to accuracy $\eta + \epsilon$ for any $\epsilon > 0$. However, their algorithm outputs a complicated hypothesis that partitions space into $\text{poly}(d, 1/\epsilon)$ regions. Here we give a much simpler algorithm and, in the process, resolve a number of outstanding open questions: (1) We give the first proper learner for Massart halfspaces that achieves $\eta + \epsilon$. We also give improved bounds on the sample complexity achievable by polynomial-time algorithms. (2) Based on (1), we develop a black-box knowledge distillation procedure that converts an arbitrarily complex classifier into an equally good proper classifier. (3) By leveraging a simple but overlooked connection to evolvability, we show that any statistical query (SQ) algorithm requires super-polynomially many queries to achieve $\mathsf{OPT} + \epsilon$. Moreover, we study generalized linear models where $\mathbb{E}[Y \mid \mathbf{X}] = \sigma(\langle \mathbf{w}^*, \mathbf{X} \rangle)$ for any odd, monotone, and Lipschitz function $\sigma$. This family includes the previously mentioned halfspace models as a special case, but is much richer and includes other fundamental models such as logistic regression. We introduce a challenging new corruption model that generalizes Massart noise, and give a general algorithm for learning in this setting. Our algorithms are based on a small set of core recipes for learning to classify in the presence of misspecification. Finally, we study our algorithm for learning halfspaces under Massart noise empirically and find that it exhibits some appealing fairness properties.
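To make the noise model concrete, below is a minimal data-generation sketch of $\eta$-Massart noise for halfspaces. Everything in it (the function name `sample_massart_halfspace`, the Gaussian marginal over examples, and the random choice of per-point flip rates) is an illustrative assumption, not something taken from the paper; the Massart model itself only requires that each clean label $\text{sign}(\langle \mathbf{w}^*, \mathbf{x} \rangle)$ be flipped with some adversarially chosen probability $\eta(\mathbf{x}) \le \eta < 1/2$.

```python
import numpy as np

def sample_massart_halfspace(n, d, eta, seed=None):
    """Draw n labeled examples in R^d from a halfspace with Massart noise.

    Each label sign(<w*, x>) is flipped independently with a point-dependent
    probability eta(x) <= eta. An adversary may pick eta(x) arbitrarily in
    [0, eta]; here we draw it at random purely for illustration.
    """
    rng = np.random.default_rng(seed)
    w_star = rng.standard_normal(d)
    w_star /= np.linalg.norm(w_star)      # unit-norm target halfspace
    X = rng.standard_normal((n, d))       # illustrative marginal; the model
                                          # places no assumption on it
    clean = np.sign(X @ w_star)           # noiseless labels in {-1, +1}
    eta_x = rng.uniform(0.0, eta, size=n) # per-point flip rates, all <= eta
    flips = rng.random(n) < eta_x
    y = np.where(flips, -clean, clean)
    return X, y, w_star

# Example usage: 10,000 points in 20 dimensions with noise bound eta = 0.3.
X, y, w_star = sample_massart_halfspace(n=10_000, d=20, eta=0.3, seed=0)
```

Note that even with a uniform bound $\eta$, the learner never sees the individual flip rates $\eta(\mathbf{x})$, which is what makes accuracy $\eta + \epsilon$ (rather than $\mathsf{OPT} + \epsilon$) the relevant efficiently achievable benchmark here.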
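Similarly, here is a minimal sketch of the generalized linear model $\mathbb{E}[Y \mid \mathbf{X}] = \sigma(\langle \mathbf{w}^*, \mathbf{X} \rangle)$ with labels in $\{-1, +1\}$, so that $\Pr[Y = +1 \mid \mathbf{X} = \mathbf{x}] = (1 + \sigma(\langle \mathbf{w}^*, \mathbf{x} \rangle))/2$. The function name `sample_glm` and the Gaussian marginal are again assumptions made for illustration; with $\sigma = \tanh$ (odd, monotone, 1-Lipschitz) the sampler is a reparametrization of logistic regression, matching the abstract's remark that the family includes it.

```python
import numpy as np

def sample_glm(n, d, sigma=np.tanh, seed=None):
    """Draw n labeled examples with E[Y | X] = sigma(<w*, X>), Y in {-1, +1}.

    sigma should be odd, monotone, and Lipschitz, as in the abstract.
    """
    rng = np.random.default_rng(seed)
    w_star = rng.standard_normal(d)
    w_star /= np.linalg.norm(w_star)
    X = rng.standard_normal((n, d))
    p_plus = (1.0 + sigma(X @ w_star)) / 2.0  # P(Y = +1 | X = x)
    y = np.where(rng.random(n) < p_plus, 1.0, -1.0)
    return X, y, w_star

# Example usage: with sigma = tanh, P(Y = +1 | x) = 1 / (1 + exp(-2 <w*, x>)),
# i.e., logistic regression with weights 2 * w_star.
X, y, w_star = sample_glm(n=10_000, d=20, seed=0)
```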