Paper Title

StructureBoost: Efficient Gradient Boosting for Structured Categorical Variables

Author

Lucena, Brian

Abstract

Gradient boosting methods based on Structured Categorical Decision Trees (SCDT) have been demonstrated to outperform numerical and one-hot-encodings on problems where the categorical variable has a known underlying structure. However, the enumeration procedure in the SCDT is infeasible except for categorical variables with low or moderate cardinality. We propose and implement two methods to overcome the computational obstacles and efficiently perform Gradient Boosting on complex structured categorical variables. The resulting package, called StructureBoost, is shown to outperform established packages such as CatBoost and LightGBM on problems with categorical predictors that contain sophisticated structure. Moreover, we demonstrate that StructureBoost can make accurate predictions on unseen categorical values due to its knowledge of the underlying structure.
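To make the enumeration problem in the abstract concrete, here is a minimal sketch in plain Python. It is not the StructureBoost API. It assumes, for illustration only, that the "known underlying structure" is an adjacency graph over the category values and that an allowable split keeps both sides connected in that graph. The helper names (`is_connected`, `connected_splits`) and the 12-month cycle are hypothetical.

```python
from itertools import combinations


def is_connected(nodes, adjacency):
    """Return True if `nodes` is non-empty and forms one connected piece of the graph."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        current = stack.pop()
        for neighbor in adjacency[current]:
            if neighbor in nodes and neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen == nodes


def connected_splits(adjacency):
    """Brute-force every two-way split whose sides are both connected.

    Assumption for illustration: a split is allowable only if each side is
    connected. The first value is pinned to the left side so that each
    unordered split is generated exactly once.
    """
    values = sorted(adjacency)
    anchor, rest = values[0], values[1:]
    splits = []
    for size in range(len(rest)):  # the right side must stay non-empty
        for extra in combinations(rest, size):
            left = {anchor, *extra}
            right = set(values) - left
            if is_connected(left, adjacency) and is_connected(right, adjacency):
                splits.append((frozenset(left), frozenset(right)))
    return splits


# Hypothetical structured variable: 12 months arranged in a cycle.
n_months = 12
month_cycle = {m: [(m - 1) % n_months, (m + 1) % n_months] for m in range(n_months)}

allowed = connected_splits(month_cycle)
candidates_checked = 2 ** (n_months - 1) - 1  # all unordered two-way splits

print(len(allowed))        # 66
print(candidates_checked)  # 2047
```

Only 66 of the 2047 candidate splits respect the cycle, but this brute-force search still visits all of them, and the candidate count doubles with each extra category value. That growth is a rough picture of why exhaustive enumeration is only practical at low or moderate cardinality.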
