Paper Title
Learning under Invariable Bayesian Safety
Paper Authors
Paper Abstract
A recent body of work addresses safety constraints in explore-and-exploit systems. Such constraints arise where, for example, exploration is carried out by individuals whose welfare should be balanced with overall welfare. In this paper, we adopt a model inspired by recent work on a bandit-like setting for recommendations. We contribute to this line of literature by introducing a safety constraint that must be respected in every round: the expected value in each round must exceed a given threshold. Under our modeling, a safe explore-and-exploit policy requires careful planning; otherwise, it leads to sub-optimal welfare. We devise an asymptotically optimal algorithm for this setting and analyze its instance-dependent convergence rate.
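To make the per-round constraint concrete: the policy's (possibly randomized) arm choice at every round t must satisfy E[reward_t | history] >= tau for a fixed threshold tau. The sketch below is a minimal illustration of this idea, not the paper's algorithm: a Thompson-sampling-style Bernoulli bandit where, if the proposed exploratory arm's posterior mean falls below tau, the policy mixes it with the arm currently believed best, using the largest exploration probability that keeps the round's posterior expected value at or above tau. All names and parameters (K, tau, true_means) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Hypothetical instance (illustrative only, not from the paper) ---
K = 3                                  # number of arms
true_means = np.array([0.5, 0.65, 0.8])
tau = 0.6                              # per-round safety threshold

# Beta(1, 1) priors over Bernoulli arm means
alpha = np.ones(K)
beta = np.ones(K)

T = 5000
for t in range(T):
    mu = alpha / (alpha + beta)        # posterior means
    safe_arm = int(np.argmax(mu))      # arm currently believed best

    # Thompson-style exploration proposal
    samples = rng.beta(alpha, beta)
    candidate = int(np.argmax(samples))

    # Per-round safety: the played distribution over arms must have
    # posterior expected value >= tau. If the candidate alone is unsafe,
    # mix it with safe_arm just enough to restore the constraint.
    if candidate == safe_arm or mu[candidate] >= tau:
        arm = candidate
    elif mu[safe_arm] >= tau:
        # Largest p with p * mu[candidate] + (1 - p) * mu[safe_arm] >= tau
        p = (mu[safe_arm] - tau) / (mu[safe_arm] - mu[candidate])
        arm = candidate if rng.random() < p else safe_arm
    else:
        arm = safe_arm                 # no safe mixture exists; stay put

    # Bernoulli reward and conjugate posterior update
    reward = rng.random() < true_means[arm]
    alpha[arm] += reward
    beta[arm] += 1 - reward
```

The mixing step is where the "careful planning" mentioned in the abstract shows up: naive exploration would occasionally play arms whose posterior expected value dips below tau, while exploring too timidly slows learning and costs welfare in the long run.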