自适应最佳世界算法，用于重型武装匪徒

论文标题

自适应最佳世界算法，用于重型武装匪徒

Adaptive Best-of-Both-Worlds Algorithm for Heavy-Tailed Multi-Armed Bandits

论文作者

Huang, Jiatai, Dai, Yan, Huang, Longbo

论文摘要

在本文中，我们将重尾多军匪徒的概念推广到对抗性环境中，并为重尾的多臂匪徒（MAB）开发出强大的最佳世界世界算法，其中损失的损失具有$α$ -th（$ 1 <α\ le 2 $），$ 2 $ 2 $）不存在$队矩$ not variances，而variances却不存在。具体而言，当代理商已知重尾参数$α$和$σ$时，我们设计了一种算法\ texttt {htinf}，\ texttt {htinf}同时了解了最佳的遗憾。当$α，σ$未知时，\ texttt {htinf}在随机案例中实现了$ \ log t $ t $ style依赖实例的遗憾，而在对抗性案例中，$ o（t）$ no-regret保证。我们进一步开发了一种算法\ texttt {adatinf}，实现$ \ Mathcal o（σk^{1- \ nicefrac1α} t^{\ nicefrac {\ nicefrac {1}α}）$ Miniimax最佳遗憾，即使在$ $α$ $α$ $α$的情况下，即使在对抗性设置中，也无需在对方设置中。该结果与已知的遗憾下限（Bubeck等人，2013年）相匹配，该结果假定了随机环境，并且$α$和$σ$都是已知的。 To our knowledge, the proposed \texttt{HTINF} algorithm is the first to enjoy a best-of-both-worlds regret guarantee, and \texttt{AdaTINF} is the first algorithm that can adapt to both $α$ and $σ$ to achieve optimal gap-indepedent regret bound in classical heavy-tailed stochastic MAB setting and our novel adversarial formulation.

In this paper, we generalize the concept of heavy-tailed multi-armed bandits to adversarial environments, and develop robust best-of-both-worlds algorithms for heavy-tailed multi-armed bandits (MAB), where losses have $α$-th ($1<α\le 2$) moments bounded by $σ^α$, while the variances may not exist. Specifically, we design an algorithm \texttt{HTINF}, when the heavy-tail parameters $α$ and $σ$ are known to the agent, \texttt{HTINF} simultaneously achieves the optimal regret for both stochastic and adversarial environments, without knowing the actual environment type a-priori. When $α,σ$ are unknown, \texttt{HTINF} achieves a $\log T$-style instance-dependent regret in stochastic cases and $o(T)$ no-regret guarantee in adversarial cases. We further develop an algorithm \texttt{AdaTINF}, achieving $\mathcal O(σK^{1-\nicefrac 1α}T^{\nicefrac{1}α})$ minimax optimal regret even in adversarial settings, without prior knowledge on $α$ and $σ$. This result matches the known regret lower-bound (Bubeck et al., 2013), which assumed a stochastic environment and $α$ and $σ$ are both known. To our knowledge, the proposed \texttt{HTINF} algorithm is the first to enjoy a best-of-both-worlds regret guarantee, and \texttt{AdaTINF} is the first algorithm that can adapt to both $α$ and $σ$ to achieve optimal gap-indepedent regret bound in classical heavy-tailed stochastic MAB setting and our novel adversarial formulation.

下载PDF全文

下载文献需遵守相关版权规定

论文标题