论文标题
自适应最佳世界算法,用于重型武装匪徒
Adaptive Best-of-Both-Worlds Algorithm for Heavy-Tailed Multi-Armed Bandits
论文作者
论文摘要
在本文中,我们将重尾多军匪徒的概念推广到对抗性环境中,并为重尾的多臂匪徒(MAB)开发出强大的最佳世界世界算法,其中损失的损失具有$α$ -th($ 1 <α\ le 2 $),$ 2 $ 2 $)不存在$队矩$ not variances,而variances却不存在。具体而言,当代理商已知重尾参数$α$和$σ$时,我们设计了一种算法\ texttt {htinf},\ texttt {htinf}同时了解了最佳的遗憾。当$α,σ$未知时,\ texttt {htinf}在随机案例中实现了$ \ log t $ t $ style依赖实例的遗憾,而在对抗性案例中,$ o(t)$ no-regret保证。我们进一步开发了一种算法\ texttt {adatinf},实现$ \ Mathcal o(σk^{1- \ nicefrac1α} t^{\ nicefrac {\ nicefrac {1}α})$ Miniimax最佳遗憾,即使在$ $α$ $α$ $α$的情况下,即使在对抗性设置中,也无需在对方设置中。该结果与已知的遗憾下限(Bubeck等人,2013年)相匹配,该结果假定了随机环境,并且$α$和$σ$都是已知的。 To our knowledge, the proposed \texttt{HTINF} algorithm is the first to enjoy a best-of-both-worlds regret guarantee, and \texttt{AdaTINF} is the first algorithm that can adapt to both $α$ and $σ$ to achieve optimal gap-indepedent regret bound in classical heavy-tailed stochastic MAB setting and our novel adversarial formulation.
In this paper, we generalize the concept of heavy-tailed multi-armed bandits to adversarial environments, and develop robust best-of-both-worlds algorithms for heavy-tailed multi-armed bandits (MAB), where losses have $α$-th ($1<α\le 2$) moments bounded by $σ^α$, while the variances may not exist. Specifically, we design an algorithm \texttt{HTINF}, when the heavy-tail parameters $α$ and $σ$ are known to the agent, \texttt{HTINF} simultaneously achieves the optimal regret for both stochastic and adversarial environments, without knowing the actual environment type a-priori. When $α,σ$ are unknown, \texttt{HTINF} achieves a $\log T$-style instance-dependent regret in stochastic cases and $o(T)$ no-regret guarantee in adversarial cases. We further develop an algorithm \texttt{AdaTINF}, achieving $\mathcal O(σK^{1-\nicefrac 1α}T^{\nicefrac{1}α})$ minimax optimal regret even in adversarial settings, without prior knowledge on $α$ and $σ$. This result matches the known regret lower-bound (Bubeck et al., 2013), which assumed a stochastic environment and $α$ and $σ$ are both known. To our knowledge, the proposed \texttt{HTINF} algorithm is the first to enjoy a best-of-both-worlds regret guarantee, and \texttt{AdaTINF} is the first algorithm that can adapt to both $α$ and $σ$ to achieve optimal gap-indepedent regret bound in classical heavy-tailed stochastic MAB setting and our novel adversarial formulation.