Paper Title

Extremal Random Forests

Paper Authors

Nicola Gnecco, Edossa Merga Terefe, Sebastian Engelke

Abstract

Classical methods for quantile regression fail in cases where the quantile of interest is extreme and only few or no training data points exceed it. Asymptotic results from extreme value theory can be used to extrapolate beyond the range of the data, and several approaches exist that use linear regression, kernel methods or generalized additive models. Most of these methods break down if the predictor space has more than a few dimensions or if the regression function of extreme quantiles is complex. We propose a method for extreme quantile regression that combines the flexibility of random forests with the theory of extrapolation. Our extremal random forest (ERF) estimates the parameters of a generalized Pareto distribution, conditional on the predictor vector, by maximizing a local likelihood with weights extracted from a quantile random forest. We penalize the shape parameter in this likelihood to regularize its variability in the predictor space. Under general domain of attraction conditions, we show consistency of the estimated parameters in both the unpenalized and penalized case. Simulation studies show that our ERF outperforms both classical quantile regression methods and existing regression approaches from extreme value theory. We apply our methodology to extreme quantile prediction for U.S. wage data.
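Below is a minimal sketch of the idea described in the abstract, not the authors' implementation: similarity weights extracted from a random forest localize a generalized Pareto (GPD) likelihood at a test point, and the shape parameter is penalized toward a global value. The choice of scikit-learn and SciPy, the helper names, and the exact penalty form are assumptions for illustration; the paper extracts its weights from a generalized (quantile) random forest rather than a standard regression forest.

```python
# Sketch only: forest-localized, penalized GPD fit in the spirit of ERF.
import numpy as np
from scipy.optimize import minimize
from sklearn.ensemble import RandomForestRegressor

def forest_weights(forest, X_train, x0):
    """Average over trees of the indicator that a training point shares a
    leaf with x0, normalized by leaf size (Meinshausen-style weights)."""
    leaves_train = forest.apply(X_train)              # (n, n_trees) leaf ids
    leaves_x0 = forest.apply(x0.reshape(1, -1))       # (1, n_trees)
    same_leaf = leaves_train == leaves_x0             # (n, n_trees) booleans
    leaf_size = np.maximum(same_leaf.sum(axis=0), 1)  # points in x0's leaf, per tree
    return (same_leaf / leaf_size).mean(axis=1)       # weights summing to ~1

def penalized_gpd_fit(exceedances, weights, xi_prior=0.1, lam=1.0):
    """Maximize the weighted GPD log-likelihood in (log sigma, xi), with a
    ridge penalty pulling the shape xi toward xi_prior (e.g. an
    unconditional GPD fit); xi_prior and lam are illustrative defaults."""
    def neg_loglik(theta):
        log_sigma, xi = theta
        sigma = np.exp(log_sigma)
        z = 1.0 + xi * exceedances / sigma
        if np.any(z <= 0):
            return np.inf                              # outside GPD support
        if abs(xi) < 1e-6:                             # exponential limit
            ll = -np.log(sigma) - exceedances / sigma
        else:
            ll = -np.log(sigma) - (1.0 / xi + 1.0) * np.log(z)
        return -np.sum(weights * ll) + lam * (xi - xi_prior) ** 2

    res = minimize(neg_loglik, x0=np.array([0.0, xi_prior]),
                   method="Nelder-Mead")
    return np.exp(res.x[0]), res.x[1]                  # (scale, shape) at x0
```

As a hypothetical usage: fit the forest on training data (X, Y), compute w = forest_weights(rf, X, x0), take exceedances of an intermediate threshold u = np.quantile(Y, 0.9), fit (sigma, xi) = penalized_gpd_fit(Y[Y > u] - u, w[Y > u]), and extrapolate an extreme quantile at level tau via u + sigma / xi * (((1 - 0.9) / (1 - tau)) ** xi - 1).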
