Paper Title

Structured Linear Contextual Bandits: A Sharp and Geometric Smoothed Analysis

Authors

Vidyashankar Sivakumar, Zhiwei Steven Wu, Arindam Banerjee

Abstract

Bandit learning algorithms typically involve the balance of exploration and exploitation. However, in many practical applications, worst-case scenarios needing systematic exploration are seldom encountered. In this work, we consider a smoothed setting for structured linear contextual bandits where the adversarial contexts are perturbed by Gaussian noise and the unknown parameter $θ^*$ has structure, e.g., sparsity, group sparsity, low rank, etc. We propose simple greedy algorithms for both the single- and multi-parameter (i.e., different parameter for each context) settings and provide a unified regret analysis for $θ^*$ with any assumed structure. The regret bounds are expressed in terms of geometric quantities such as Gaussian widths associated with the structure of $θ^*$. We also obtain sharper regret bounds compared to earlier work for the unstructured $θ^*$ setting as a consequence of our improved analysis. We show there is implicit exploration in the smoothed setting where a simple greedy algorithm works.
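To illustrate the setting the abstract describes, here is a minimal simulation sketch of a greedy algorithm on smoothed linear contextual bandits: base contexts are perturbed by Gaussian noise, the parameter estimate is a plain ridge-regression fit on past observations, and the arm maximizing the estimated reward is pulled with no explicit exploration. All concrete choices (dimensions, noise scales, the ridge estimator, the sparse $θ^*$) are illustrative assumptions, not the paper's exact algorithm or constants.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 5, 10, 2000            # dimension, arms per round, horizon (assumed values)
theta_star = np.zeros(d)         # structured (here: sparse) unknown parameter
theta_star[:2] = [1.0, -1.0]
sigma = 0.5                      # smoothing scale of the Gaussian context perturbation

X = np.zeros((0, d))             # chosen contexts so far
y = np.zeros(0)                  # observed rewards so far
regret = 0.0
for t in range(T):
    base = rng.uniform(-1, 1, size=(K, d))        # stand-in for adversarial base contexts
    ctx = base + sigma * rng.normal(size=(K, d))  # smoothed contexts seen by the learner
    # Greedy step: ridge estimate from all past data, no exploration bonus.
    if len(y) >= 1:
        theta_hat = np.linalg.solve(X.T @ X + np.eye(d), X.T @ y)
    else:
        theta_hat = np.zeros(d)
    a = int(np.argmax(ctx @ theta_hat))           # exploit the current estimate
    reward = ctx[a] @ theta_star + 0.1 * rng.normal()
    regret += np.max(ctx @ theta_star) - ctx[a] @ theta_star
    X = np.vstack([X, ctx[a]])
    y = np.append(y, reward)

print(f"cumulative regret over {T} rounds: {regret:.1f}")
```

The Gaussian perturbation spreads the chosen contexts in all directions, which is the implicit exploration the paper refers to: even the purely greedy choice accumulates enough diverse data for the ridge estimate to converge, so cumulative regret grows sublinearly in this sketch.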
