论文标题

顶级算法的信息指导的选择

Information-Directed Selection for Top-Two Algorithms

论文作者

You, Wei, Qin, Chao, Wang, Zihao, Yang, Shuoguang

论文摘要

我们考虑了多臂匪徒的最佳K臂识别问题,其目的是通过依次分配测量工作来选择具有最高均值奖励的K臂的确切集合。我们表征了使用双变量的最佳分配的必要条件。值得注意的是,这些最佳条件导致顶级算法设计原理(Russo,2020年)扩展,最初提议用于最佳武器识别。此外,我们的最佳条件会导致一个简单有效的选择规则称为信息定向的选择(IDS),该选择根据信息增益的量度选择了顶级候选人之一。作为一种理论保证,我们证明与ID集成在一起,顶级汤普森采样(渐近)是高斯最佳武器识别的最佳选择,解决了纯粹的探索文献中的明显开放问题(Russo,2020年)。作为副产品,我们表明,对于K> 1,即使算法可以访问未知的“最佳”调谐参数,顶级算法也无法实现最佳性。数值实验表明,与没有自适应选择的算法相比,与算法相比,提出的具有ID的顶级算法和相当大的改进的表现出色。

We consider the best-k-arm identification problem for multi-armed bandits, where the objective is to select the exact set of k arms with the highest mean rewards by sequentially allocating measurement effort. We characterize the necessary and sufficient conditions for the optimal allocation using dual variables. Remarkably these optimality conditions lead to the extension of top-two algorithm design principle (Russo, 2020), initially proposed for best-arm identification. Furthermore, our optimality conditions induce a simple and effective selection rule dubbed information-directed selection (IDS) that selects one of the top-two candidates based on a measure of information gain. As a theoretical guarantee, we prove that integrated with IDS, top-two Thompson sampling is (asymptotically) optimal for Gaussian best-arm identification, solving a glaring open problem in the pure exploration literature (Russo, 2020). As a by-product, we show that for k > 1, top-two algorithms cannot achieve optimality even when the algorithm has access to the unknown "optimal" tuning parameter. Numerical experiments show the superior performance of the proposed top-two algorithms with IDS and considerable improvement compared with algorithms without adaptive selection.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源