Towards Scalable and Robust Structured Bandits: A Meta-Learning Framework

Paper Title

Towards Scalable and Robust Structured Bandits: A Meta-Learning Framework

Authors

Wan, Runzhe; Ge, Lin; Song, Rui

Abstract

Online learning in large-scale structured bandits is known to be challenging due to the curse of dimensionality. In this paper, we propose a unified meta-learning framework for a general class of structured bandit problems where the parameter space can be factorized to the item level. The novel bandit algorithm is general enough to be applied to many popular problems, scalable to huge parameter and action spaces, and robust to the specification of the generalization model. At the core of this framework is a Bayesian hierarchical model that allows information sharing among items via their features, upon which we design a meta Thompson sampling algorithm. Three representative examples are discussed thoroughly. Both theoretical analysis and numerical results support the usefulness of the proposed method.
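The idea of sharing information among items through their features can be illustrated with a small sketch. It is not taken from the paper: it assumes Bernoulli rewards, a Beta prior per item whose mean is a logistic function of the item's features, and a hyperparameter `gamma` that is treated as known (a full meta Thompson sampling algorithm would also maintain a posterior over it). The names `feature_prior`, `thompson_step`, `update`, `run`, and `strength` are hypothetical.

```python
import numpy as np


def feature_prior(features, gamma, strength):
    """Per-item Beta prior whose mean is a logistic function of the features.

    Assumption for illustration: gamma is fixed and known here.
    """
    mean = 1.0 / (1.0 + np.exp(-(features @ gamma)))
    return mean * strength, (1.0 - mean) * strength


def thompson_step(alpha, beta, rng):
    """Sample each item's reward probability and pick the largest sample."""
    samples = rng.beta(alpha, beta)
    return int(np.argmax(samples))


def update(alpha, beta, item, reward):
    """Conjugate Beta-Bernoulli update for the item that was played."""
    alpha[item] += reward
    beta[item] += 1 - reward


def run(features, gamma, true_probs, horizon, strength=10.0, seed=0):
    """Run item-level Thompson sampling starting from the feature-based prior."""
    rng = np.random.default_rng(seed)
    alpha, beta = feature_prior(features, gamma, strength)
    pulls = np.zeros(len(true_probs), dtype=int)
    for _ in range(horizon):
        item = thompson_step(alpha, beta, rng)
        reward = int(rng.random() < true_probs[item])  # simulated feedback
        update(alpha, beta, item, reward)
        pulls[item] += 1
    return alpha, beta, pulls
```

In this sketch, items with similar features start from similar priors, so a new item benefits from what the features imply before it has been played; the item-level posterior then corrects the prior as feedback accumulates.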
