On conditional versus marginal bias in multi-armed bandits

Paper title

On conditional versus marginal bias in multi-armed bandits

Authors

Jaehyeok Shin, Aaditya Ramdas, Alessandro Rinaldo

Abstract

The bias of the sample means of the arms in multi-armed bandits is an important issue in adaptive data analysis that has recently received considerable attention in the literature. Existing results relate in precise ways the sign and magnitude of the bias to various sources of data adaptivity, but do not apply to the conditional inference setting in which the sample means are computed only if some specific conditions are satisfied. In this paper, we characterize the sign of the conditional bias of monotone functions of the rewards, including the sample mean. Our results hold for arbitrary conditioning events and leverage natural monotonicity properties of the data collection policy. We further demonstrate, through several examples from sequential testing and best arm identification, that the sign of the conditional and marginal bias of the sample mean of an arm can be different, depending on the conditioning event. Our analysis offers new and interesting perspectives on the subtleties of assessing the bias in data adaptive settings.
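
The distinction between marginal and conditional bias can be explored with a small Monte Carlo sketch. The setup below is an illustrative assumption, not an example taken from the paper: two arms with standard normal rewards (true mean 0), a greedy sampling rule, and the conditioning event "arm 0 ends with the larger sample mean". The function name `simulate_bias` and all parameter values are hypothetical.

```python
import random


def simulate_bias(n_runs=20000, horizon=20, seed=0):
    """Estimate marginal and conditional bias of arm 0's sample mean.

    Assumed setup (illustrative only): two arms with N(0, 1) rewards,
    each pulled once, then a greedy rule pulls the arm with the larger
    current sample mean until `horizon` total pulls have been made.
    """
    rng = random.Random(seed)
    marginal_sum = 0.0   # sum of arm 0's final sample mean over all runs
    cond_sum = 0.0       # same sum, restricted to the conditioning event
    cond_count = 0       # number of runs in which the event occurred

    for _ in range(n_runs):
        # Initial pull of each arm.
        sums = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        counts = [1, 1]

        # Greedy sampling: pull the arm with the larger sample mean.
        for _ in range(horizon - 2):
            mean0 = sums[0] / counts[0]
            mean1 = sums[1] / counts[1]
            arm = 0 if mean0 >= mean1 else 1
            sums[arm] += rng.gauss(0.0, 1.0)
            counts[arm] += 1

        mean0 = sums[0] / counts[0]
        mean1 = sums[1] / counts[1]
        marginal_sum += mean0

        # Conditioning event: arm 0 ends with the larger sample mean.
        if mean0 > mean1:
            cond_sum += mean0
            cond_count += 1

    # The true mean is 0, so each average is itself a bias estimate.
    return {
        "marginal": marginal_sum / n_runs,
        "conditional": cond_sum / cond_count,
        "event_frequency": cond_count / n_runs,
    }


if __name__ == "__main__":
    print(simulate_bias())
```

Because the true mean is 0, both averages estimate bias directly. Under this greedy rule the marginal estimate should come out negative, while the estimate conditional on arm 0 finishing ahead should be noticeably larger. That gap is the kind of discrepancy between conditional and marginal bias the abstract describes.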
