论文标题
联合匪徒:一种闲话的方法
Federated Bandit: A Gossiping Approach
论文作者
论文摘要
在本文中,我们研究了\ emph {联合匪徒},这是一个分散的多军强盗问题,其中包含一组$ n $代理,他们只能与由连接的图形$ g $描述的邻居传达其本地数据。每个代理商都会从$ M $候选者中选择ARM进行一系列决定,但他们只能访问本地和潜在的偏见反馈/评估每种动作的真实奖励。只有在本地学习,将导致代理在融合到无重组策略的同时,需要收集分布式数据。受联邦学习的提议的激励,我们旨在寻求一种解决方案,代理商将永远不会与中央实体分享其本地观察结果,并且将被允许仅与邻居共享其自己信息的私人副本。我们首先提出了一种去中心化的强盗算法gossip_ucb,它是经典八卦算法和著名的上置信度(UCB)强盗算法的变体的耦合。我们表明,gossip_ucb成功地适应了当地的匪徒学习到一个全球闲话过程中,以在连接的代理之间共享信息,并以$ o(\ max \ {\ texttt {poly} {poly} {poly}(n,m)的顺序获得保证的遗憾。对于所有$ n $ n $代理,其中$λ_2\ in(0,1)$是预期八卦矩阵的第二大特征值,这是$ g $的函数。然后,我们提出fed_ucb,是gossip_ucb的差异私有版本,其中代理保留其本地数据的$ε$ - 二分化私密性,同时实现$ O(\ max \ {\ frac {\ frac {\ texttt {poly}(poly}(poly}(n,m)}ε\ log^2.5} t,mo) (\ log_ {λ_2^{ - 1}} n + \ log t)\})$遗憾。
In this paper, we study \emph{Federated Bandit}, a decentralized Multi-Armed Bandit problem with a set of $N$ agents, who can only communicate their local data with neighbors described by a connected graph $G$. Each agent makes a sequence of decisions on selecting an arm from $M$ candidates, yet they only have access to local and potentially biased feedback/evaluation of the true reward for each action taken. Learning only locally will lead agents to sub-optimal actions while converging to a no-regret strategy requires a collection of distributed data. Motivated by the proposal of federated learning, we aim for a solution with which agents will never share their local observations with a central entity, and will be allowed to only share a private copy of his/her own information with their neighbors. We first propose a decentralized bandit algorithm Gossip_UCB, which is a coupling of variants of both the classical gossiping algorithm and the celebrated Upper Confidence Bound (UCB) bandit algorithm. We show that Gossip_UCB successfully adapts local bandit learning into a global gossiping process for sharing information among connected agents, and achieves guaranteed regret at the order of $O(\max\{ \texttt{poly}(N,M) \log T, \texttt{poly}(N,M)\log_{λ_2^{-1}} N\})$ for all $N$ agents, where $λ_2\in(0,1)$ is the second largest eigenvalue of the expected gossip matrix, which is a function of $G$. We then propose Fed_UCB, a differentially private version of Gossip_UCB, in which the agents preserve $ε$-differential privacy of their local data while achieving $O(\max \{\frac{\texttt{poly}(N,M)}ε\log^{2.5} T, \texttt{poly}(N,M) (\log_{λ_2^{-1}} N + \log T) \})$ regret.