Paper Title

Structured Linear Contextual Bandits: A Sharp and Geometric Smoothed Analysis

Authors

Vidyashankar Sivakumar, Zhiwei Steven Wu, Arindam Banerjee

Abstract

Bandit learning algorithms typically involve the balance of exploration and exploitation. However, in many practical applications, worst-case scenarios needing systematic exploration are seldom encountered. In this work, we consider a smoothed setting for structured linear contextual bandits where the adversarial contexts are perturbed by Gaussian noise and the unknown parameter $θ^*$ has structure, e.g., sparsity, group sparsity, low rank, etc. We propose simple greedy algorithms for both the single- and multi-parameter (i.e., different parameter for each context) settings and provide a unified regret analysis for $θ^*$ with any assumed structure. The regret bounds are expressed in terms of geometric quantities such as Gaussian widths associated with the structure of $θ^*$. We also obtain sharper regret bounds compared to earlier work for the unstructured $θ^*$ setting as a consequence of our improved analysis. We show there is implicit exploration in the smoothed setting where a simple greedy algorithm works.
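To illustrate the setting the abstract describes, here is a minimal simulation sketch of a greedy algorithm on smoothed linear contextual bandits: base contexts are perturbed by Gaussian noise, the parameter estimate is a plain ridge-regression fit on past observations, and the arm maximizing the estimated reward is pulled with no explicit exploration. All concrete choices (dimensions, noise scales, the ridge estimator, the sparse $θ^*$) are illustrative assumptions, not the paper's exact algorithm or constants.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 5, 10, 2000            # dimension, arms per round, horizon (assumed values)
theta_star = np.zeros(d)         # structured (here: sparse) unknown parameter
theta_star[:2] = [1.0, -1.0]
sigma = 0.5                      # smoothing scale of the Gaussian context perturbation

X = np.zeros((0, d))             # chosen contexts so far
y = np.zeros(0)                  # observed rewards so far
regret = 0.0
for t in range(T):
    base = rng.uniform(-1, 1, size=(K, d))        # stand-in for adversarial base contexts
    ctx = base + sigma * rng.normal(size=(K, d))  # smoothed contexts seen by the learner
    # Greedy step: ridge estimate from all past data, no exploration bonus.
    if len(y) >= 1:
        theta_hat = np.linalg.solve(X.T @ X + np.eye(d), X.T @ y)
    else:
        theta_hat = np.zeros(d)
    a = int(np.argmax(ctx @ theta_hat))           # exploit the current estimate
    reward = ctx[a] @ theta_star + 0.1 * rng.normal()
    regret += np.max(ctx @ theta_star) - ctx[a] @ theta_star
    X = np.vstack([X, ctx[a]])
    y = np.append(y, reward)

print(f"cumulative regret over {T} rounds: {regret:.1f}")
```

The Gaussian perturbation spreads the chosen contexts in all directions, which is the implicit exploration the paper refers to: even the purely greedy choice accumulates enough diverse data for the ridge estimate to converge, so cumulative regret grows sublinearly in this sketch.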
