Paper Title
Best of Both Worlds Model Selection
Paper Authors
Paper Abstract
We study the problem of model selection in bandit scenarios in the presence of nested policy classes, with the goal of obtaining simultaneous adversarial and stochastic ("best of both worlds") high-probability regret guarantees. Our approach requires that each base learner come with a candidate regret bound that may or may not hold, while our meta-algorithm plays each base learner according to a schedule that keeps the base learners' candidate regret bounds balanced until they are detected to violate their guarantees. We develop careful mis-specification tests specifically designed to blend the above model selection criterion with the ability to leverage the (potentially benign) nature of the environment. We recover the model selection guarantees of the CORRAL algorithm for adversarial environments, but with the additional benefit of achieving high-probability regret bounds, specifically in the case of nested adversarial linear bandits. More importantly, our model selection results also hold simultaneously in stochastic environments under gap assumptions. These are the first theoretical results that achieve best-of-both-worlds (stochastic and adversarial) guarantees while performing model selection in (linear) bandit scenarios.
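To make the regret-balancing idea described above concrete, the following is a minimal, hedged Python sketch of a meta-algorithm that plays base learners so as to keep their candidate regret bounds balanced and eliminates a learner when a simple mis-specification test fails. The interfaces (`base_learners` objects with `select_action`/`update`, an `env` with `step`, and the `candidate_bounds` functions), the exact form of the confidence terms, and the elimination rule are illustrative assumptions, not the paper's precise algorithm or notation.

```python
import math

def regret_balancing(base_learners, candidate_bounds, env, T, delta=0.05):
    """Illustrative regret-balancing meta-algorithm with an elimination test.

    base_learners:    list of learners exposing select_action() and update(a, r)  (assumed API)
    candidate_bounds: list of functions n -> putative regret bound after n plays, e.g. c_i * sqrt(n)
    env:              environment exposing step(action) -> reward in [0, 1]       (assumed API)
    """
    M = len(base_learners)
    active = set(range(M))
    plays = [0] * M          # rounds assigned to each base learner
    reward_sum = [0.0] * M   # cumulative reward collected by each base learner

    for t in range(1, T + 1):
        # Balancing step: play the active learner whose candidate regret bound,
        # evaluated at its own play count, is currently the smallest.
        i = min(active, key=lambda j: candidate_bounds[j](plays[j]))
        action = base_learners[i].select_action()
        reward = env.step(action)
        base_learners[i].update(action, reward)
        plays[i] += 1
        reward_sum[i] += reward

        # Mis-specification test (illustrative form): a learner is eliminated if,
        # even after crediting it with its claimed regret bound plus a confidence
        # margin, its average reward falls below the best lower confidence bound.
        if all(plays[j] > 0 for j in active):
            def lcb(j):
                return reward_sum[j] / plays[j] - math.sqrt(math.log(M * T / delta) / plays[j])

            def ucb_with_bound(j):
                return ((reward_sum[j] + candidate_bounds[j](plays[j])) / plays[j]
                        + math.sqrt(math.log(M * T / delta) / plays[j]))

            best_lcb = max(lcb(j) for j in active)
            active = {j for j in active if ucb_with_bound(j) >= best_lcb}

    return active  # learners whose candidate bounds were never detected as violated
```

The design intuition this sketch tries to capture: as long as a base learner's candidate bound holds, playing learners in proportion to their claimed bounds keeps the meta-regret comparable to the best well-specified learner; once a learner's realized reward contradicts its claimed bound beyond statistical noise, the test removes it so it can no longer inflate the regret.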