Paper Title

Off-Policy Evaluation with Policy-Dependent Optimization Response

Paper Authors

Wenshuo Guo, Michael I. Jordan, Angela Zhou

Paper Abstract

The intersection of causal inference and machine learning for decision-making is rapidly expanding, but the default decision criterion remains an \textit{average} of individual causal outcomes across a population. In practice, various operational restrictions ensure that a decision-maker's utility is not realized as an \textit{average} but rather as an \textit{output} of a downstream decision-making problem (such as matching, assignment, network flow, minimizing predictive risk). In this work, we develop a new framework for off-policy evaluation with \textit{policy-dependent} linear optimization responses: causal outcomes introduce stochasticity in objective function coefficients. Under this framework, a decision-maker's utility depends on the policy-dependent optimization, which introduces a fundamental challenge of \textit{optimization} bias even for the case of policy evaluation. We construct unbiased estimators for the policy-dependent estimand by a perturbation method, and discuss asymptotic variance properties for a set of adjusted plug-in estimators. Lastly, attaining unbiased policy evaluation allows for policy optimization: we provide a general algorithm for optimizing causal interventions. We corroborate our theoretical results with numerical simulations.
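
To make the abstract's notion of \textit{optimization} bias concrete, here is a minimal simulation sketch. It is not the paper's estimator; it assumes a toy downstream linear program max_{z in the simplex} c^T z, whose optimal value is simply the largest coefficient, and shows that plugging noisy coefficient estimates into the optimization systematically overestimates the true optimal value, since E[max_j c_hat_j] >= max_j E[c_hat_j] by Jensen's inequality. The tied coefficients in c are an assumption chosen to make the bias pronounced.

```python
# Minimal sketch of plug-in optimization bias (illustrative, not the paper's method).
# Toy downstream problem: max_z c^T z over the unit simplex, i.e., pick the best
# coordinate. Plugging noisy estimates c_hat into the max overestimates the true
# optimal value because E[max(c_hat)] >= max(E[c_hat]).
import numpy as np

rng = np.random.default_rng(0)
c = np.array([1.0, 1.0, 1.0])   # true objective coefficients (ties make the bias pronounced)
true_value = c.max()            # optimal value of max_z c^T z over the simplex

n_reps, n_samples = 5000, 50
plug_in_values = []
for _ in range(n_reps):
    # each coefficient is estimated as a noisy sample mean of the causal outcome
    c_hat = c + rng.normal(0.0, 1.0, size=(n_samples, 3)).mean(axis=0)
    plug_in_values.append(c_hat.max())  # plug-in: optimize over the estimate

print(f"true optimal value: {true_value:.3f}")
print(f"mean plug-in value: {np.mean(plug_in_values):.3f}")  # noticeably > 1.0
```

This sketch only exhibits the bias that the paper identifies; it does not implement the perturbation-based correction with which the authors construct unbiased estimators of the policy-dependent estimand.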
