联合匪徒：一种闲话的方法

论文标题

联合匪徒：一种闲话的方法

Federated Bandit: A Gossiping Approach

论文作者

Zhu, Zhaowei, Zhu, Jingxuan, Liu, Ji, Liu, Yang

论文摘要

在本文中，我们研究了\ emph {联合匪徒}，这是一个分散的多军强盗问题，其中包含一组$ n $代理，他们只能与由连接的图形$ g $描述的邻居传达其本地数据。每个代理商都会从$ M $候选者中选择ARM进行一系列决定，但他们只能访问本地和潜在的偏见反馈/评估每种动作的真实奖励。只有在本地学习，将导致代理在融合到无重组策略的同时，需要收集分布式数据。受联邦学习的提议的激励，我们旨在寻求一种解决方案，代理商将永远不会与中央实体分享其本地观察结果，并且将被允许仅与邻居共享其自己信息的私人副本。我们首先提出了一种去中心化的强盗算法gossip_ucb，它是经典八卦算法和著名的上置信度（UCB）强盗算法的变体的耦合。我们表明，gossip_ucb成功地适应了当地的匪徒学习到一个全球闲话过程中，以在连接的代理之间共享信息，并以$ o（\ max \ {\ texttt {poly} {poly} {poly}（n，m）的顺序获得保证的遗憾。对于所有$ n $ n $代理，其中$λ_2\ in（0,1）$是预期八卦矩阵的第二大特征值，这是$ g $的函数。然后，我们提出fed_ucb，是gossip_ucb的差异私有版本，其中代理保留其本地数据的$ε$ - 二分化私密性，同时实现$ O（\ max \ {\ frac {\ frac {\ texttt {poly}（poly}（poly}（n，m）}ε\ log^2.5} t，mo）（\ log_ {λ_2^{ - 1}} n + \ log t）\}）$遗憾。

In this paper, we study \emph{Federated Bandit}, a decentralized Multi-Armed Bandit problem with a set of $N$ agents, who can only communicate their local data with neighbors described by a connected graph $G$. Each agent makes a sequence of decisions on selecting an arm from $M$ candidates, yet they only have access to local and potentially biased feedback/evaluation of the true reward for each action taken. Learning only locally will lead agents to sub-optimal actions while converging to a no-regret strategy requires a collection of distributed data. Motivated by the proposal of federated learning, we aim for a solution with which agents will never share their local observations with a central entity, and will be allowed to only share a private copy of his/her own information with their neighbors. We first propose a decentralized bandit algorithm Gossip_UCB, which is a coupling of variants of both the classical gossiping algorithm and the celebrated Upper Confidence Bound (UCB) bandit algorithm. We show that Gossip_UCB successfully adapts local bandit learning into a global gossiping process for sharing information among connected agents, and achieves guaranteed regret at the order of $O(\max\{ \texttt{poly}(N,M) \log T, \texttt{poly}(N,M)\log_{λ_2^{-1}} N\})$ for all $N$ agents, where $λ_2\in(0,1)$ is the second largest eigenvalue of the expected gossip matrix, which is a function of $G$. We then propose Fed_UCB, a differentially private version of Gossip_UCB, in which the agents preserve $ε$-differential privacy of their local data while achieving $O(\max \{\frac{\texttt{poly}(N,M)}ε\log^{2.5} T, \texttt{poly}(N,M) (\log_{λ_2^{-1}} N + \log T) \})$ regret.

下载PDF全文

下载文献需遵守相关版权规定

论文标题