论文标题
可拖动的上下文匪徒超出了可靠性
Tractable contextual bandits beyond realizability
论文作者
论文摘要
可处理的上下文匪徒算法通常依赖于可实现的假设 - 即,真实的预期奖励模型属于已知类别,例如线性函数。在这项工作中,我们提出了一种可拖动的匪徒算法,该算法对在每个时期内都不对解决约束回归问题的可靠性假设和计算降低不敏感。如果不实现可靠性,我们的算法可确保在可靠性下基于可靠性的算法所获得的遗憾保证,直到一个添加术语来解释错误指定错误。这个额外的术语与t时Times成比例的函数在班级和真实模型中的最佳模型之间的均方误差的函数,其中t是时间步长的总数。我们的工作阐明了可拖动的上下文土匪的偏见变化权衡。这种权衡并未被假定可实现的算法所捕获,因为在此假设下,班级中存在估计器,该估算值达到了零偏见。
Tractable contextual bandit algorithms often rely on the realizability assumption - i.e., that the true expected reward model belongs to a known class, such as linear functions. In this work, we present a tractable bandit algorithm that is not sensitive to the realizability assumption and computationally reduces to solving a constrained regression problem in every epoch. When realizability does not hold, our algorithm ensures the same guarantees on regret achieved by realizability-based algorithms under realizability, up to an additive term that accounts for the misspecification error. This extra term is proportional to T times a function of the mean squared error between the best model in the class and the true model, where T is the total number of time-steps. Our work sheds light on the bias-variance trade-off for tractable contextual bandits. This trade-off is not captured by algorithms that assume realizability, since under this assumption there exists an estimator in the class that attains zero bias.