Paper Title
One Arrow, Two Kills: A Unified Framework for Achieving Optimal Regret Guarantees in Sleeping Bandits
Paper Authors
Paper Abstract
We address the problem of \emph{`Internal Regret'} in \emph{Sleeping Bandits} in the fully adversarial setup, draw connections between different existing notions of sleeping regret in the multi-armed bandits (MAB) literature, and analyze their implications: Our first contribution is to propose a new notion of \emph{Internal Regret} for sleeping MAB. We then propose an algorithm that yields sublinear regret in this measure, even for a completely adversarial sequence of losses and availabilities. We further show that low sleeping internal regret always implies low external regret, as well as low policy regret for i.i.d. sequences of losses. The main contribution of this work lies precisely in unifying the different existing notions of regret in sleeping bandits and understanding the implications of one for another. Finally, we also extend our results to the setting of \emph{Dueling Bandits} (DB)--a preference-feedback variant of MAB--and propose a reduction-to-MAB idea to design a low-regret algorithm for sleeping dueling bandits with stochastic preferences and adversarial availabilities. The efficacy of our algorithms is justified through empirical evaluations.
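As a concrete illustration of the sleeping external-regret benchmark the abstract refers to (comparing the learner against the best fixed ordering of arms, which plays its highest-ranked arm among those available each round), here is a minimal, hedged sketch. All names (`best_ordering_loss`, `uniform_learner_loss`) and the brute-force enumeration over orderings are illustrative assumptions, not the paper's algorithm; the learner shown is a naive uniform-random baseline, not the proposed method.

```python
import itertools
import random

def best_ordering_loss(losses, avail, K):
    """Sleeping benchmark: cumulative loss of the best fixed ordering
    (permutation) of the K arms, where at each round the ordering plays
    its highest-ranked arm among the available set S_t.
    Brute force over all K! orderings -- only feasible for tiny K."""
    best = float("inf")
    for sigma in itertools.permutations(range(K)):
        total = 0.0
        for loss_t, S_t in zip(losses, avail):
            arm = next(a for a in sigma if a in S_t)  # top available arm
            total += loss_t[arm]
        best = min(best, total)
    return best

def uniform_learner_loss(losses, avail, rng):
    """A naive baseline learner that picks uniformly among available arms
    (stand-in for an actual low-regret algorithm)."""
    total = 0.0
    for loss_t, S_t in zip(losses, avail):
        total += loss_t[rng.choice(sorted(S_t))]
    return total

# Toy adversary-free simulation: random losses and random availability sets.
K, T = 3, 200
rng = random.Random(0)
losses = [[rng.random() for _ in range(K)] for _ in range(T)]
avail = [set(rng.sample(range(K), rng.randint(1, K))) for _ in range(T)]

# Sleeping external regret of the naive learner against the best ordering.
regret = uniform_learner_loss(losses, avail, rng) - best_ordering_loss(losses, avail, K)
```

A sublinear-regret algorithm would keep `regret` growing slower than `T`; the uniform baseline above generally incurs regret linear in `T`.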