Paper Title

Off-Policy Evaluation with Policy-Dependent Optimization Response

Paper Authors

Wenshuo Guo, Michael I. Jordan, Angela Zhou

Paper Abstract

The intersection of causal inference and machine learning for decision-making is rapidly expanding, but the default decision criterion remains an \textit{average} of individual causal outcomes across a population. In practice, various operational restrictions ensure that a decision-maker's utility is not realized as an \textit{average} but rather as an \textit{output} of a downstream decision-making problem (such as matching, assignment, network flow, minimizing predictive risk). In this work, we develop a new framework for off-policy evaluation with \textit{policy-dependent} linear optimization responses: causal outcomes introduce stochasticity in objective function coefficients. Under this framework, a decision-maker's utility depends on the policy-dependent optimization, which introduces a fundamental challenge of \textit{optimization} bias even for the case of policy evaluation. We construct unbiased estimators for the policy-dependent estimand by a perturbation method, and discuss asymptotic variance properties for a set of adjusted plug-in estimators. Lastly, attaining unbiased policy evaluation allows for policy optimization: we provide a general algorithm for optimizing causal interventions. We corroborate our theoretical results with numerical simulations.
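
To make the abstract's notion of \textit{optimization} bias concrete, here is a minimal simulation sketch. It is not the paper's estimator; it assumes a toy downstream linear program max_{z in the simplex} c^T z, whose optimal value is simply the largest coefficient, and shows that plugging noisy coefficient estimates into the optimization systematically overestimates the true optimal value, since E[max_j c_hat_j] >= max_j E[c_hat_j] by Jensen's inequality. The tied coefficients in c are an assumption chosen to make the bias pronounced.

```python
# Minimal sketch of plug-in optimization bias (illustrative, not the paper's method).
# Toy downstream problem: max_z c^T z over the unit simplex, i.e., pick the best
# coordinate. Plugging noisy estimates c_hat into the max overestimates the true
# optimal value because E[max(c_hat)] >= max(E[c_hat]).
import numpy as np

rng = np.random.default_rng(0)
c = np.array([1.0, 1.0, 1.0])   # true objective coefficients (ties make the bias pronounced)
true_value = c.max()            # optimal value of max_z c^T z over the simplex

n_reps, n_samples = 5000, 50
plug_in_values = []
for _ in range(n_reps):
    # each coefficient is estimated as a noisy sample mean of the causal outcome
    c_hat = c + rng.normal(0.0, 1.0, size=(n_samples, 3)).mean(axis=0)
    plug_in_values.append(c_hat.max())  # plug-in: optimize over the estimate

print(f"true optimal value: {true_value:.3f}")
print(f"mean plug-in value: {np.mean(plug_in_values):.3f}")  # noticeably > 1.0
```

This sketch only exhibits the bias that the paper identifies; it does not implement the perturbation-based correction with which the authors construct unbiased estimators of the policy-dependent estimand.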
