论文标题
在无敏感的对抗性多人多军匪徒上,带有碰撞通信
On No-Sensing Adversarial Multi-player Multi-armed Bandits with Collision Communications
论文作者
论文摘要
我们从新的角度研究了臭名昭著的无敏感性多武器多武器匪(MP-MAB)问题。我们没有专注于多个玩家的硬度,而是引入了一个新的硬度,称为攻击性。所有对手都可以根据攻击性进行分类,我们介绍了对手自适应碰撞交流(A2C2),这是一个算法家族,并在玩家之间进行强制碰撞通信。都研究了攻击性 - 意识和不知道的设置,并利用Z通道模型的信息理论工具以及错误校正编码来解决内隐沟通的挑战,而无需在对抗环境中碰撞信息。对于更具挑战性的攻击性问题,我们提出了一种简单的方法,以估计通过新颖的错误检测重复代码和进行同步的随机通信启用的攻击性。理论分析证明,在不知道攻击性的情况下,都可以实现渐近攻击性依赖性的sublinear遗憾。尤其是,渐近遗憾并没有指数级的依赖,这揭示了在这个问题中硬度的两个维度之间的基本权衡。
We study the notoriously difficult no-sensing adversarial multi-player multi-armed bandits (MP-MAB) problem from a new perspective. Instead of focusing on the hardness of multiple players, we introduce a new dimension of hardness, called attackability. All adversaries can be categorized based on the attackability and we introduce Adversary-Adaptive Collision-Communication (A2C2), a family of algorithms with forced-collision communication among players. Both attackability-aware and unaware settings are studied, and information-theoretic tools of the Z-channel model and error-correction coding are utilized to address the challenge of implicit communication without collision information in an adversarial environment. For the more challenging attackability-unaware problem, we propose a simple method to estimate the attackability enabled by a novel error-detection repetition code and randomized communication for synchronization. Theoretical analysis proves that asymptotic attackability-dependent sublinear regret can be achieved, with or without knowing the attackability. In particular, the asymptotic regret does not have an exponential dependence on the number of players, revealing a fundamental tradeoff between the two dimensions of hardness in this problem.