Paper title
Dual Instrumental Method for Confounded Kernelized Bandits
Paper authors
Abstract
The contextual bandit problem is a theoretically justified framework with wide applications in various fields. While previous studies of this problem usually require independence between the noise and the contexts, our work considers a more realistic setting in which the noise is a latent confounder that affects both contexts and rewards. This confounded setting extends to a broader range of applications. However, an unaddressed confounder biases the estimate of the reward function and thus leads to large regret. To deal with the challenges posed by the confounder, we apply dual instrumental variable regression, which can correctly identify the true reward function. We prove that the convergence rate of this method is near-optimal in two widely used types of reproducing kernel Hilbert spaces. Based on these theoretical guarantees, we design computationally efficient and regret-optimal algorithms for confounded bandit problems. Numerical results illustrate the efficacy of the proposed algorithms in the confounded bandit setting.
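To make the bias described in the abstract concrete, the following is a minimal sketch of a toy linear problem, not the paper's kernelized dual instrumental variable method. It assumes a hypothetical data-generating process in which a latent noise term `e` enters both the context `x` and the reward `r`, and an instrument `z` affects the context but is independent of `e`. All names (`simulate`, `ols_slope`, `iv_slope`) and the true slope of 2.0 are illustrative assumptions. Under these assumptions, plain least squares should overestimate the slope, while the simple instrumental variable ratio estimator should recover it.

```python
import numpy as np

def simulate(n, seed=0):
    """Toy confounded data (assumed model, not taken from the paper)."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)   # instrument: drives the context, independent of e
    e = rng.normal(size=n)   # latent confounder acting as the reward noise
    x = z + e                # context depends on the confounder
    r = 2.0 * x + e          # reward: true slope 2.0 plus the same confounder
    return z, x, r

def ols_slope(x, r):
    """Least-squares slope through the origin; biased when x correlates with e."""
    return float(x @ r / (x @ x))

def iv_slope(z, x, r):
    """Simple instrumental variable ratio estimator using instrument z."""
    return float(z @ r / (z @ x))

z, x, r = simulate(20000)
print("OLS slope:", ols_slope(x, r))
print("IV slope: ", iv_slope(z, x, r))
```

In this toy model the population least-squares slope is 2.5 rather than 2.0, because the context and the noise share the confounder. A bandit algorithm built on the biased estimate would rank actions using the wrong reward function, which is the source of the large regret mentioned above.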