论文标题
在通信约束下的分布式线性匪徒
Distributed Linear Bandits under Communication Constraints
论文作者
论文摘要
我们考虑分布式的线性土匪,其中$ m $代理商会协作学习,以最大程度地减少所有代理商产生的整体累积遗憾。信息交换是由中央服务器促进的,上行链路和下行链路通信都以固定容量的通道进行,这限制了每次使用通道中可以传输的信息量。我们通过(i)在所需的通信(在碎屑方面)建立信息理论的下限来调查遗憾交流的权衡; (ii)开发一种有效的算法,该算法使用信息理论下限规定的最低通信顺序实现了集中学习提供的最低sublingear遗憾令。对于稀疏的线性匪徒,我们显示了拟议算法的一种变体,通过利用问题的稀疏性,可以更好地遗憾交流。
We consider distributed linear bandits where $M$ agents learn collaboratively to minimize the overall cumulative regret incurred by all agents. Information exchange is facilitated by a central server, and both the uplink and downlink communications are carried over channels with fixed capacity, which limits the amount of information that can be transmitted in each use of the channels. We investigate the regret-communication trade-off by (i) establishing information-theoretic lower bounds on the required communications (in terms of bits) for achieving a sublinear regret order; (ii) developing an efficient algorithm that achieves the minimum sublinear regret order offered by centralized learning using the minimum order of communications dictated by the information-theoretic lower bounds. For sparse linear bandits, we show a variant of the proposed algorithm offers better regret-communication trade-off by leveraging the sparsity of the problem.