Title

An information criterion for automatic gradient tree boosting

Authors

Lunde, Berent Ånund Strømnes; Kleppe, Tore Selland; Skaug, Hans Julius

Abstract

An information-theoretic approach to learning the complexity of classification and regression trees and the number of trees in gradient tree boosting is proposed. The optimism (test loss minus training loss) of the greedy leaf-splitting procedure is shown to be the maximum of a Cox-Ingersoll-Ross process, from which a generalization-error-based information criterion is formed. The proposed procedure allows fast local model selection without cross-validation-based hyperparameter tuning, and hence efficient, automatic comparison among the large number of models considered during each boosting iteration. Relative to xgboost, speedups on numerical experiments range from around 10 to about 1400, at similar predictive power measured in terms of test loss.
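The abstract states that the optimism of the leaf-splitting procedure is distributed as the maximum of a Cox-Ingersoll-Ross (CIR) process. As a rough illustration of that building block only (not the authors' actual criterion), the sketch below estimates the expected running maximum of a CIR process by Euler-Maruyama simulation; all parameter values are hypothetical.

```python
import numpy as np

def simulate_cir_max(a, b, sigma, x0, T=1.0, n_steps=1000, n_paths=2000, seed=0):
    """Estimate E[max_{0<=t<=T} X_t] for the CIR process
        dX_t = a*(b - X_t) dt + sigma*sqrt(X_t) dW_t,  X_0 = x0,
    using an Euler-Maruyama scheme with truncation at zero to keep
    the simulated paths nonnegative. All parameters are illustrative.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.full(n_paths, x0, dtype=float)   # current state of each path
    running_max = x.copy()                  # per-path running maximum
    for _ in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        x = x + a * (b - x) * dt + sigma * np.sqrt(np.maximum(x, 0.0)) * dw
        x = np.maximum(x, 0.0)              # truncate to keep X_t >= 0
        running_max = np.maximum(running_max, x)
    return running_max.mean()

# Expected maximum exceeds the starting value x0 = b = 1.0
print(simulate_cir_max(a=1.0, b=1.0, sigma=0.5, x0=1.0))
```

In the paper's setting, an expectation of this kind serves as the optimism penalty added to the training loss, playing the role that the parameter-count penalty plays in classical information criteria such as AIC.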
