Paper title

Predicting United States policy outcomes with Random Forests

Authors

Shawn McGuire, Charles Delahunt

Abstract

Two decades of U.S. government legislative outcomes, as well as the policy preferences of rich people, the general population, and diverse interest groups, were captured in a detailed dataset curated and analyzed by Gilens, Page et al. (2014). They found that the preferences of the rich correlated strongly with policy outcomes, while the preferences of the general population did not, except via a linkage with rich people's preferences. Their analysis applied the tools of classical statistical inference, in particular logistic regression. In this paper we analyze the Gilens dataset using the complementary tools of Random Forest classifiers (RFs) from machine learning. We present two primary findings, concerning prediction and inference respectively: (i) Holdout test sets can be predicted with approximately 70% balanced accuracy by models that consult only the preferences of rich people and a small number of powerful interest groups, together with policy area labels. These results include retrodiction, in which models trained on pre-1997 cases predicted "future" (post-1997) cases. The gain of roughly 20 percentage points over the chance baseline of 50% balanced accuracy, in this detailed but noisy dataset, indicates the high importance of a few wealthy players in U.S. policy outcomes, and aligns with a body of research indicating that the U.S. government has significant plutocratic tendencies. (ii) The feature selection methods of RF models identify especially salient subsets of interest groups (economic players). These subsets can be used to further investigate the dynamics of governmental policy making, and they also illustrate the potential value of RF feature selection methods for inference on datasets such as this one.
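The abstract reports results as balanced accuracy against a chance baseline, and describes a retrodiction split at 1997. The following is a minimal pure-Python sketch of those two ideas only, not the authors' Random Forest pipeline. It assumes binary outcome labels (1 = policy change adopted, 0 = not adopted). The function names and the (year, features, label) case layout are hypothetical, and counting 1997 itself as part of the test period is an assumption rather than the authors' stated rule.

```python
from typing import Any, List, Sequence, Tuple

# Hypothetical case layout: (year, features, label), label 1 = adopted, 0 = not.
Case = Tuple[int, Any, int]


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of per-class recall over the binary classes present in y_true."""
    recalls = []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        if not idx:
            continue  # class absent from this test set; it contributes no recall
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    if not recalls:
        raise ValueError("y_true is empty")
    return sum(recalls) / len(recalls)


def temporal_split(cases: List[Case], cutoff_year: int = 1997) -> Tuple[List[Case], List[Case]]:
    """Retrodiction-style split: train on earlier years, test on later ones.

    Assumption: cases dated exactly cutoff_year go to the test set.
    """
    train = [c for c in cases if c[0] < cutoff_year]
    test = [c for c in cases if c[0] >= cutoff_year]
    return train, test


if __name__ == "__main__":
    # Imbalanced toy labels: 4 adopted, 6 not adopted.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    always_no = [0] * len(y_true)  # constant "not adopted" predictor
    plain = sum(t == p for t, p in zip(y_true, always_no)) / len(y_true)
    print(plain)                                   # 0.6
    print(balanced_accuracy(y_true, always_no))    # 0.5
```

A constant predictor scores 60% plain accuracy on this 4-versus-6 label split but only 50% balanced accuracy. Balanced accuracy therefore keeps the chance baseline at 50% however imbalanced the outcomes are, which is why gains are naturally reported relative to that figure.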
