A Comparison of Methods for Treatment Assignment with an Application to Playlist Generation

Title

A Comparison of Methods for Treatment Assignment with an Application to Playlist Generation

Authors

Carlos Fernández-Loría, Foster Provost, Jesse Anderton, Benjamin Carterette, Praveen Chandar

Abstract

This study presents a systematic comparison of methods for individual treatment assignment, a general problem that arises in many applications and has received significant attention from economists, computer scientists, and social scientists. We group the various methods proposed in the literature into three general classes of algorithms (or metalearners): learning models to predict outcomes (the O-learner), learning models to predict causal effects (the E-learner), and learning models to predict optimal treatment assignments (the A-learner). We compare the metalearners in terms of (1) their level of generality and (2) the objective function they use to learn models from data; we then discuss the implications that these characteristics have for modeling and decision making. Notably, we demonstrate analytically and empirically that optimizing for the prediction of outcomes or causal effects is not the same as optimizing for treatment assignments, suggesting that in general the A-learner should lead to better treatment assignments than the other metalearners. We demonstrate the practical implications of our findings in the context of choosing, for each user, the best algorithm for playlist generation in order to optimize engagement. This is the first comparison of the three different metalearners on a real-world application at scale (based on more than half a billion individual treatment assignments). In addition to supporting our analytical findings, the results show how large A/B tests can provide substantial value for learning treatment assignment policies, rather than simply choosing the variant that performs best on average.
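
To make the three metalearner classes named in the abstract concrete, here is a minimal sketch on a tiny tabular example. It is not the authors' implementation: the log data, the segment names, the 50/50 propensity, and every function name are assumptions made for illustration, and the "models" are plain per-segment averages. The A-learner is represented here by a brute-force search over assignments scored with an inverse-propensity-weighted value estimate, which is one common way to optimize assignments directly.

```python
from collections import defaultdict
from itertools import product

# Hypothetical logs from a 50/50 randomized A/B test:
# (user_segment, treatment, engagement). Treatments 0 and 1 stand for two
# playlist-generation algorithms. All values are made up for illustration.
LOGS = [
    ("casual", 0, 1.0), ("casual", 0, 2.0), ("casual", 1, 3.0), ("casual", 1, 4.0),
    ("heavy", 0, 6.0), ("heavy", 0, 8.0), ("heavy", 1, 5.0), ("heavy", 1, 6.0),
]
PROPENSITY = 0.5  # assumed known probability of receiving either treatment


def mean_outcomes(logs):
    """Average outcome for every (segment, treatment) cell."""
    totals, counts = defaultdict(float), defaultdict(int)
    for seg, t, y in logs:
        totals[(seg, t)] += y
        counts[(seg, t)] += 1
    return {key: totals[key] / counts[key] for key in totals}


def o_learner(logs):
    """Predict outcomes per treatment, then pick the arm with the higher prediction."""
    mu = mean_outcomes(logs)
    segments = {seg for seg, _, _ in logs}
    return {s: max((0, 1), key=lambda t: mu[(s, t)]) for s in segments}


def e_learner(logs):
    """Predict the causal effect (arm 1 minus arm 0), then treat if it is positive."""
    mu = mean_outcomes(logs)
    segments = {seg for seg, _, _ in logs}
    return {s: int(mu[(s, 1)] - mu[(s, 0)] > 0) for s in segments}


def policy_value(policy, logs, propensity=PROPENSITY):
    """Inverse-propensity-weighted estimate of the mean outcome under `policy`."""
    matched = sum(y / propensity for s, t, y in logs if policy[s] == t)
    return matched / len(logs)


def a_learner(logs):
    """Search assignments directly for the one with the highest estimated value."""
    segments = sorted({seg for seg, _, _ in logs})
    candidates = [
        dict(zip(segments, arms)) for arms in product((0, 1), repeat=len(segments))
    ]
    return max(candidates, key=lambda p: policy_value(p, logs))


if __name__ == "__main__":
    for name, learner in [("O", o_learner), ("E", e_learner), ("A", a_learner)]:
        policy = learner(LOGS)
        print(name, sorted(policy.items()), policy_value(policy, LOGS))
```

On this toy data all three learners choose the same assignment, because per-segment averages can fit the data exactly; the example is too simple to show the divergence the abstract describes between optimizing predictions and optimizing assignments. It does illustrate the abstract's last point: the per-segment assignment has a higher estimated value than giving every user either single algorithm.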
