与因果土匪的自适应利用D-Separator

论文标题

与因果土匪的自适应利用D-Separator

Adaptively Exploiting d-Separators with Causal Bandits

论文作者

Bilodeau, Blair, Wang, Linbo, Roy, Daniel M.

论文摘要

多臂匪徒问题提供了一个框架，以确定一系列重复实验的最佳干预措施。如果没有其他假设，最小值的最佳性能（通过累积遗憾衡量）就可以很好地理解。通过访问其他观察到的变量，可以从结果中分离干预措施（即，它们是D-分离器），最近的“因果匪徒”算法证明会引起遗憾。但是，在实践中，希望观察到的变量是否是d分隔者，这是不可知的。理想情况下，算法应具有自适应；也就是说，在甲骨文的存在或不存在d-separator的情况下，执行几乎和算法一样。在这项工作中，我们对这种适应性的概念进行了形式化和研究，并提供了一种新颖的算法，该算法同时实现（a）观察到D分离器时，可以实现最佳遗憾，从而改善了经典的最小值算法，并且（b）比最近的Causal bastit算法要小得多，而在近期观察的coasusal bandit算法中，当所经过的变量变量不是d-separbles时。至关重要的是，我们的算法不需要任何观察到D-Separator的Oracle知识。我们还将这种适应性推广到其他条件，例如前门标准。

Multi-armed bandit problems provide a framework to identify the optimal intervention over a sequence of repeated experiments. Without additional assumptions, minimax optimal performance (measured by cumulative regret) is well-understood. With access to additional observed variables that d-separate the intervention from the outcome (i.e., they are a d-separator), recent "causal bandit" algorithms provably incur less regret. However, in practice it is desirable to be agnostic to whether observed variables are a d-separator. Ideally, an algorithm should be adaptive; that is, perform nearly as well as an algorithm with oracle knowledge of the presence or absence of a d-separator. In this work, we formalize and study this notion of adaptivity, and provide a novel algorithm that simultaneously achieves (a) optimal regret when a d-separator is observed, improving on classical minimax algorithms, and (b) significantly smaller regret than recent causal bandit algorithms when the observed variables are not a d-separator. Crucially, our algorithm does not require any oracle knowledge of whether a d-separator is observed. We also generalize this adaptivity to other conditions, such as the front-door criterion.

下载PDF全文

下载文献需遵守相关版权规定

论文标题