Neural Bandit with Arm Group Graph

Paper title

Neural Bandit with Arm Group Graph

Authors

Qi, Yunzhe; Ban, Yikun; He, Jingrui

Abstract

Contextual bandits aim to identify, among a set of arms, the one with the highest reward based on contextual information. Motivated by the fact that arms usually exhibit group behaviors and that groups influence one another, we introduce a new model, Arm Group Graph (AGG), in which nodes represent groups of arms and weighted edges encode the correlations among groups. To leverage the rich information in AGG, we propose a bandit algorithm, AGG-UCB, in which neural networks estimate rewards and graph neural networks (GNNs) learn representations of the correlated arm groups. To address the exploitation-exploration dilemma in bandits, we derive a new upper confidence bound (UCB) for exploration, built on the neural networks used for exploitation. Furthermore, we prove that AGG-UCB achieves a near-optimal regret bound with over-parameterized neural networks, and we provide a convergence analysis of GNNs with fully-connected layers, which may be of independent interest. Finally, we conduct extensive experiments against state-of-the-art baselines on multiple public data sets, showing the effectiveness of the proposed algorithm.
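
The abstract's idea of scoring arms with a reward estimate plus a UCB bonus, while sharing information across correlated arm groups, can be illustrated with a small sketch. This is not the authors' AGG-UCB: as a loudly stated simplification, the neural reward model is replaced by a linear model (so the bonus takes the familiar LinUCB form), and the GNN is reduced to a single propagation step over a symmetrically normalized adjacency matrix. All names here (`normalize_adjacency`, `aggregate`, `GraphLinUCB`, `gamma`, `lam`) are hypothetical, and edge weights are assumed to be non-negative.

```python
import numpy as np

def normalize_adjacency(A):
    """Add self-loops, then apply symmetric degree normalization (assumes weights >= 0)."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def aggregate(x, group, S):
    """Place context x in the row of its group, propagate one step, and flatten."""
    X = np.zeros((S.shape[0], x.shape[0]))
    X[group] = x
    return (S @ X).ravel()

class GraphLinUCB:
    """Linear stand-in for a graph-aware UCB bandit (illustrative only)."""

    def __init__(self, A, dim, gamma=1.0, lam=1.0):
        self.S = normalize_adjacency(A)
        p = A.shape[0] * dim
        self.Z = lam * np.eye(p)   # regularized design matrix
        self.b = np.zeros(p)       # reward-weighted feature sum
        self.gamma = gamma         # exploration strength

    def scores(self, contexts, groups):
        Z_inv = np.linalg.inv(self.Z)
        theta = Z_inv @ self.b     # ridge-regression estimate
        out = []
        for x, g in zip(contexts, groups):
            phi = aggregate(x, g, self.S)
            estimate = phi @ theta                              # exploitation
            bonus = self.gamma * np.sqrt(phi @ Z_inv @ phi)     # exploration
            out.append(estimate + bonus)
        return np.array(out)

    def select(self, contexts, groups):
        return int(np.argmax(self.scores(contexts, groups)))

    def update(self, x, group, reward):
        phi = aggregate(x, group, self.S)
        self.Z += np.outer(phi, phi)
        self.b += reward * phi
```

In a round, one would call `select` on the candidate contexts and their group indices, observe the reward of the chosen arm, and pass it to `update`. Because the aggregated feature of an arm is spread over neighboring groups, feedback from one group also shifts the estimates for groups connected to it, which is the intuition the abstract attributes to the arm group graph.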

Paper title