An Experimental Design Approach for Regret Minimization in Logistic Bandits

Paper Title

An Experimental Design Approach for Regret Minimization in Logistic Bandits

Authors

Blake Mason, Kwang-Sung Jun, Lalit Jain

Abstract

In this work we consider the problem of regret minimization for logistic bandits. The main challenge of logistic bandits is reducing the dependence on a potentially large problem-dependent constant $\kappa$ that can, in the worst case, scale exponentially with the norm of the unknown parameter $\theta_{\ast}$. Abeille et al. (2021) applied the self-concordance of the logistic function to remove this worst-case dependence, providing regret guarantees like $O(d\log^2(\kappa)\sqrt{\dot{\mu}T}\log(|\mathcal{X}|))$, where $d$ is the dimensionality, $T$ is the time horizon, and $\dot{\mu}$ is the variance of the best arm. This work improves upon this bound in the fixed-arm setting by employing an experimental design procedure that achieves a minimax regret of $O(\sqrt{d\dot{\mu}T\log(|\mathcal{X}|)})$. In fact, our regret bound is a tighter instance-dependent (i.e., gap-dependent) bound, the first of its kind for logistic bandits. We also propose a new warmup sampling algorithm that can dramatically reduce the lower-order term in the regret in general, and we prove that for some instances it can replace the lower-order term's dependence on $\kappa$ with $\log^2(\kappa)$. Finally, we discuss the impact of the bias of the MLE on the logistic bandit problem, providing an example where the $d^2$ lower-order regret (cf. $d$ for linear bandits) may not be improvable as long as the MLE is used, and showing how bias-corrected estimators may be used to bring it closer to $d$.
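
The abstract refers to $\kappa$, to $\dot{\mu}$, and to an experimental design over a fixed arm set without defining them. As a rough illustration only, and not the authors' algorithm, the Python sketch below computes $\dot{\mu}$ and a simplified $\kappa$ (assumed here to be $\max_{x\in\mathcal{X}} 1/\dot{\mu}(x^\top\theta)$ at a single parameter $\theta$), then builds a G-optimal design on arms reweighted by $\dot{\mu}(x^\top\hat{\theta})$ for a plug-in estimate $\hat{\theta}$, using the standard Fedorov-Wynn (Frank-Wolfe) iteration. The helper names `kappa` and `weighted_g_optimal_design`, the simplified $\kappa$, and the choice of this design criterion are all assumptions. For an arm with $x^\top\theta = s$, $1/\dot{\mu}(s) = e^{s} + 2 + e^{-s}$, which shows the exponential growth of $\kappa$ in the parameter norm described above.

```python
import numpy as np


def mu(z):
    """Logistic link mu(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


def mu_dot(z):
    """Derivative mu'(z) = mu(z) * (1 - mu(z)); also the variance of a
    Bernoulli reward whose mean is mu(z)."""
    m = mu(z)
    return m * (1.0 - m)


def kappa(arms, theta):
    """ASSUMED simplified definition: max over a finite arm set of
    1 / mu_dot(x^T theta) at one fixed parameter theta."""
    return float(np.max(1.0 / mu_dot(arms @ theta)))


def weighted_g_optimal_design(arms, theta_hat, iters=5000):
    """Hypothetical helper: Fedorov-Wynn (Frank-Wolfe) on the log-det
    objective over features z_x = sqrt(mu_dot(x^T theta_hat)) * x.

    By the Kiefer-Wolfowitz theorem the D-optimal design it approaches is
    also G-optimal, i.e. it minimizes max_x z_x^T A(lam)^{-1} z_x, and the
    optimal value equals d. Returns (lam, max leverage).
    """
    n, d = arms.shape
    w = mu_dot(arms @ theta_hat)           # plug-in variances per arm
    z = np.sqrt(w)[:, None] * arms         # weighted features, shape (n, d)
    lam = np.full(n, 1.0 / n)              # start from the uniform design

    def leverages(lam):
        A = z.T @ (lam[:, None] * z)       # A(lam) = sum_x lam_x z_x z_x^T
        return np.einsum("ij,jk,ik->i", z, np.linalg.inv(A), z)

    for _ in range(iters):
        lev = leverages(lam)
        i = int(np.argmax(lev))
        g = lev[i]
        if g <= d * (1.0 + 1e-6):          # near the Kiefer-Wolfowitz bound
            break
        gamma = (g / d - 1.0) / (g - 1.0)  # exact line-search step size
        lam = (1.0 - gamma) * lam          # shift mass toward arm i
        lam[i] += gamma
    return lam, float(leverages(lam).max())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    arms = rng.normal(size=(20, 3))
    arms /= np.linalg.norm(arms, axis=1, keepdims=True)
    theta_hat = np.array([2.0, -1.0, 0.5])
    lam, g = weighted_g_optimal_design(arms, theta_hat)
    print("kappa:", kappa(arms, theta_hat))
    print("max weighted leverage:", g, "(d = 3)")
    print("design:", np.round(lam, 3))
```

The loop that would sample arms in proportion to `lam`, refit $\hat{\theta}$, and eliminate arms is omitted; this sketch does not reproduce the paper's round schedule or its warmup procedure.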