Paper Title
Universal guarantees for decision tree induction via a higher-order splitting criterion
Paper Authors
Paper Abstract
We propose a simple extension of top-down decision tree learning heuristics such as ID3, C4.5, and CART. Our algorithm achieves provable guarantees for all target functions $f: \{-1,1\}^n \to \{-1,1\}$ with respect to the uniform distribution, circumventing impossibility results showing that existing heuristics fare poorly even for simple target functions. The crux of our extension is a new splitting criterion that takes into account the correlations between $f$ and small subsets of its attributes. The splitting criteria of existing heuristics (e.g. Gini impurity and information gain), in contrast, are based solely on the correlations between $f$ and its individual attributes. Our algorithm satisfies the following guarantee: for all target functions $f : \{-1,1\}^n \to \{-1,1\}$, sizes $s\in \mathbb{N}$, and error parameters $\varepsilon$, it constructs a decision tree of size $s^{\tilde{O}((\log s)^2/\varepsilon^2)}$ that achieves error $\le O(\mathsf{opt}_s) + \varepsilon$, where $\mathsf{opt}_s$ denotes the error of the optimal size-$s$ decision tree. A key technical notion that drives our analysis is the noise stability of $f$, a well-studied smoothness measure.
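To illustrate the distinction the abstract draws, the following is a minimal sketch (not the paper's actual algorithm or criterion) contrasting a degree-1 splitting score, which measures the correlation of $f$ with a single attribute as standard top-down heuristics effectively do, with a higher-order score that also considers correlations between $f$ and parities over small subsets of attributes. The function names and the `max_subset_size` parameter are hypothetical choices for this example.

```python
# Illustrative sketch only: contrasts single-attribute correlation scores with
# correlations against parities of small attribute subsets, over +/-1 data.
from itertools import combinations
import numpy as np

def correlation_with_parity(X, y, subset):
    """Empirical correlation E[f(x) * prod_{i in subset} x_i]."""
    parity = np.prod(X[:, list(subset)], axis=1)
    return np.mean(y * parity)

def degree_one_score(X, y):
    """Best single-attribute correlation (what degree-1 criteria can detect)."""
    n = X.shape[1]
    return max(abs(correlation_with_parity(X, y, (i,))) for i in range(n))

def higher_order_score(X, y, max_subset_size=2):
    """Best correlation of f with a parity over a small subset of attributes."""
    n = X.shape[1]
    best = 0.0
    for k in range(1, max_subset_size + 1):
        for subset in combinations(range(n), k):
            best = max(best, abs(correlation_with_parity(X, y, subset)))
    return best

if __name__ == "__main__":
    # Target f(x) = x_0 * x_1: each individual attribute is uncorrelated with f,
    # so the degree-1 score sees essentially no signal, while the degree-<=2
    # score finds a perfectly correlated pair of attributes.
    rng = np.random.default_rng(0)
    X = rng.choice([-1, 1], size=(4000, 6))
    y = X[:, 0] * X[:, 1]
    print("best degree-1 correlation:  ", round(degree_one_score(X, y), 3))
    print("best degree-<=2 correlation:", round(higher_order_score(X, y, 2), 3))
```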