Title

Improved Weighted Random Forest for Classification Problems

Authors

Shahhosseini, Mohsen; Hu, Guiping

Abstract


Several studies have shown that combining machine learning models in an appropriate way introduces improvements over the individual predictions made by the base models. The key to building a well-performing ensemble model lies in the diversity of the base models. Among the most common solutions for introducing diversity into decision trees are bagging and random forest. Bagging enhances diversity by sampling with replacement to generate many training data sets, while random forest additionally selects a random subset of features. This has made random forest a winning candidate for many machine learning applications. However, assuming equal weights for all base decision trees does not seem reasonable, as the randomization of sampling and input feature selection may lead to different levels of decision-making ability across the base decision trees. Therefore, we propose several algorithms that modify the weighting strategy of the regular random forest and consequently make better predictions. The designed weighting frameworks include optimal weighted random forest based on accuracy, optimal weighted random forest based on the area under the curve (AUC), performance-based weighted random forest, and several stacking-based weighted random forest models. The numerical results show that the proposed models are able to introduce significant improvements compared to the regular random forest.
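The core idea of the performance-based variant (weighting each base tree by a measure of its predictive quality instead of voting all trees equally) can be sketched with scikit-learn. This is an illustrative approximation, not the paper's exact algorithm: the choice of a held-out validation set, accuracy as the per-tree weight, and the weighted soft vote are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary classification data for the demo.
X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# A regular random forest: every tree gets an equal vote.
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)

# Performance-based weights (hypothetical scheme): score each base tree on a
# validation set and normalize the scores to sum to 1.
weights = np.array([tree.score(X_val, y_val) for tree in rf.estimators_])
weights /= weights.sum()

# Weighted soft vote: average the per-tree class probabilities with the weights.
probas = np.stack([tree.predict_proba(X_val) for tree in rf.estimators_])
weighted_proba = np.tensordot(weights, probas, axes=1)  # (n_samples, n_classes)
weighted_pred = weighted_proba.argmax(axis=1)

weighted_acc = (weighted_pred == y_val).mean()
print(f"weighted-forest validation accuracy: {weighted_acc:.3f}")
```

The accuracy-based weights above could be swapped for per-tree AUC scores (e.g. `sklearn.metrics.roc_auc_score` on each tree's probabilities) to mirror the AUC-based framework, and replacing the fixed weights with a meta-model trained on the trees' outputs would correspond to the stacking-based variants.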
