Targeting for long-term outcomes

Paper title

Targeting for long-term outcomes

Authors

Yang, Jeremy; Eckles, Dean; Dhillon, Paramveer; Aral, Sinan

Abstract

Decision makers often want to target interventions so as to maximize an outcome that is observed only in the long term. This typically requires delaying decisions until the outcome is observed or relying on simple short-term proxies for the long-term outcome. Here we build on the statistical surrogacy and policy learning literatures to impute the missing long-term outcomes and then approximate the optimal targeting policy on the imputed outcomes via a doubly-robust approach. We first show that conditions for the validity of average treatment effect estimation with imputed outcomes are also sufficient for valid policy evaluation and optimization; furthermore, these conditions can be somewhat relaxed for policy optimization. We apply our approach in two large-scale proactive churn management experiments at The Boston Globe by targeting optimal discounts to its digital subscribers with the aim of maximizing long-term revenue. Using the first experiment, we evaluate this approach empirically by comparing the policy learned using imputed outcomes with a policy learned on the ground-truth long-term outcomes. The performance of these two policies is statistically indistinguishable, and we rule out large losses from relying on surrogates. Our approach also outperforms a policy learned on short-term proxies for the long-term outcome. In a second field experiment, we implement the optimal targeting policy with additional randomized exploration, which allows us to update the optimal policy for future subscribers. Over three years, our approach had a net-positive revenue impact in the range of $4-5 million compared to the status quo.
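
The pipeline described in the abstract (impute the long-term outcome from short-term surrogates, then learn a targeting policy from doubly-robust scores on the imputed outcome) can be illustrated with a minimal sketch. This is not the authors' implementation. Everything concrete here is an assumption made for illustration: the simulated data, the linear models, the known randomization probability of 0.5, the single-covariate threshold policy class, and all function names (`simulate`, `fit_linear`, `predict_linear`, `learn_threshold_policy`).

```python
import numpy as np


def simulate(n, rng):
    """Toy data (assumed): treatment affects the long-term outcome y only through s."""
    x = rng.uniform(-1.0, 1.0, n)                     # pre-treatment covariate
    w = rng.integers(0, 2, n)                         # randomized treatment, P(w=1) = 0.5
    s = 1.0 + 2.0 * x * w + rng.normal(0.0, 0.5, n)   # short-term surrogate
    y = 3.0 * s + x + rng.normal(0.0, 0.5, n)         # long-term outcome
    return x, w, s, y


def fit_linear(features, target):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


def predict_linear(coef, features):
    design = np.column_stack([np.ones(len(features)), features])
    return design @ coef


def learn_threshold_policy(seed=0, n=20000, propensity=0.5):
    # `propensity` must match the randomization probability used in `simulate`.
    rng = np.random.default_rng(seed)

    # Step 1: fit a surrogate index E[y | x, s] on a historical sample
    # in which the long-term outcome has already been observed.
    x_h, _, s_h, y_h = simulate(n, rng)
    index_coef = fit_linear(np.column_stack([x_h, s_h]), y_h)

    # Step 2: in the new experiment the long-term outcome is treated as
    # unobserved, so it is imputed from the covariate and the surrogate.
    x, w, s, _ = simulate(n, rng)
    y_hat = predict_linear(index_coef, np.column_stack([x, s]))

    # Step 3: doubly-robust scores for each arm, built on the imputed outcome.
    mu1 = predict_linear(fit_linear(x[w == 1], y_hat[w == 1]), x)
    mu0 = predict_linear(fit_linear(x[w == 0], y_hat[w == 0]), x)
    gamma1 = mu1 + w / propensity * (y_hat - mu1)
    gamma0 = mu0 + (1 - w) / (1 - propensity) * (y_hat - mu0)

    # Step 4: pick the rule "treat if x > c" with the highest estimated value.
    thresholds = np.linspace(-1.0, 1.0, 41)
    values = [np.mean(np.where(x > c, gamma1, gamma0)) for c in thresholds]
    best_threshold = float(thresholds[int(np.argmax(values))])
    return index_coef, best_threshold
```

In this toy setup the true effect of treatment on the long-term outcome is 6x, so the best threshold rule treats units with x above roughly 0. Calling `learn_threshold_policy()` returns the fitted surrogate-index coefficients and the selected threshold, which can be compared against that benchmark.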
