Paper Title
Learning under Invariable Bayesian Safety
Paper Authors
Paper Abstract
A recent body of work addresses safety constraints in explore-and-exploit systems. Such constraints arise where, for example, exploration is carried out by individuals whose welfare should be balanced with overall welfare. In this paper, we adopt a model inspired by recent work on a bandit-like setting for recommendations. We contribute to this line of literature by introducing a safety constraint that must be respected in every round: the expected value in each round must exceed a given threshold. Under our modeling, a safe explore-and-exploit policy requires careful planning; otherwise, it leads to sub-optimal welfare. We devise an asymptotically optimal algorithm for this setting and analyze its instance-dependent convergence rate.
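To make the per-round constraint concrete: the policy's (possibly randomized) arm choice at every round t must satisfy E[reward_t | history] >= tau for a fixed threshold tau. The sketch below is a minimal illustration of this idea, not the paper's algorithm: a Thompson-sampling-style Bernoulli bandit where, if the proposed exploratory arm's posterior mean falls below tau, the policy mixes it with the arm currently believed best, using the largest exploration probability that keeps the round's posterior expected value at or above tau. All names and parameters (K, tau, true_means) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Hypothetical instance (illustrative only, not from the paper) ---
K = 3                                  # number of arms
true_means = np.array([0.5, 0.65, 0.8])
tau = 0.6                              # per-round safety threshold

# Beta(1, 1) priors over Bernoulli arm means
alpha = np.ones(K)
beta = np.ones(K)

T = 5000
for t in range(T):
    mu = alpha / (alpha + beta)        # posterior means
    safe_arm = int(np.argmax(mu))      # arm currently believed best

    # Thompson-style exploration proposal
    samples = rng.beta(alpha, beta)
    candidate = int(np.argmax(samples))

    # Per-round safety: the played distribution over arms must have
    # posterior expected value >= tau. If the candidate alone is unsafe,
    # mix it with safe_arm just enough to restore the constraint.
    if candidate == safe_arm or mu[candidate] >= tau:
        arm = candidate
    elif mu[safe_arm] >= tau:
        # Largest p with p * mu[candidate] + (1 - p) * mu[safe_arm] >= tau
        p = (mu[safe_arm] - tau) / (mu[safe_arm] - mu[candidate])
        arm = candidate if rng.random() < p else safe_arm
    else:
        arm = safe_arm                 # no safe mixture exists; stay put

    # Bernoulli reward and conjugate posterior update
    reward = rng.random() < true_means[arm]
    alpha[arm] += reward
    beta[arm] += 1 - reward
```

The mixing step is where the "careful planning" mentioned in the abstract shows up: naive exploration would occasionally play arms whose posterior expected value dips below tau, while exploring too timidly slows learning and costs welfare in the long run.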