Paper Title
Defending against the Label-flipping Attack in Federated Learning
Paper Authors
Paper Abstract
Federated learning (FL) provides autonomy and privacy by design to participating peers, who cooperatively build a machine learning (ML) model while keeping their private data on their devices. However, that same autonomy opens the door for malicious peers to poison the model by conducting either untargeted or targeted poisoning attacks. The label-flipping (LF) attack is a targeted poisoning attack in which the attackers poison their training data by flipping the labels of some examples from one class (i.e., the source class) to another (i.e., the target class). Unfortunately, this attack is easy to perform, hard to detect, and negatively impacts the performance of the global model. Existing defenses against LF are limited by assumptions on the distribution of the peers' data and/or do not perform well with high-dimensional models. In this paper, we investigate the LF attack behavior in depth and find that the contradicting objectives of attackers and honest peers on the source class examples are reflected in the parameter gradients corresponding to the neurons of the source and target classes in the output layer, making those gradients good discriminative features for attack detection. Accordingly, we propose a novel defense that first dynamically extracts those gradients from the peers' local updates, then clusters the extracted gradients, analyzes the resulting clusters, and filters out potentially bad updates before model aggregation. Extensive empirical analysis on three data sets shows the proposed defense's effectiveness against the LF attack regardless of the data distribution or model dimensionality. Moreover, the proposed defense outperforms several state-of-the-art defenses by offering lower test error, higher overall accuracy, higher source class accuracy, lower attack success rate, and higher stability of the source class accuracy.
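To make the described pipeline concrete, below is a minimal sketch of the extract-cluster-filter idea from the abstract. It is not the authors' implementation: the peer updates are simulated random arrays, the two-cluster KMeans and the "smaller cluster is suspicious" rule are illustrative assumptions, and the class indices (SOURCE_CLASS, TARGET_CLASS) are hypothetical.

```python
# Hedged sketch: filter label-flipping updates by clustering the output-layer
# gradients of the source and target class neurons, then aggregating the rest.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

NUM_PEERS = 20
NUM_ATTACKERS = 6
NUM_CLASSES = 10
HIDDEN_DIM = 64                      # width of the layer feeding the output layer
SOURCE_CLASS, TARGET_CLASS = 3, 8    # hypothetical flip: source -> target

def output_layer_gradients(is_attacker: bool) -> np.ndarray:
    """Simulate a peer's gradient of the output-layer weights (NUM_CLASSES x HIDDEN_DIM)."""
    grads = rng.normal(0.0, 0.1, size=(NUM_CLASSES, HIDDEN_DIM))
    # On source-class examples, honest peers and attackers pull the source/target
    # neurons in opposite directions because the attackers' labels are flipped.
    direction = -1.0 if is_attacker else 1.0
    grads[SOURCE_CLASS] += direction * 0.5
    grads[TARGET_CLASS] -= direction * 0.5
    return grads

# 1) Collect local updates and extract only the source/target-class gradients,
#    which the abstract identifies as the discriminative features.
peer_updates = [output_layer_gradients(i < NUM_ATTACKERS) for i in range(NUM_PEERS)]
features = np.stack(
    [np.concatenate([g[SOURCE_CLASS], g[TARGET_CLASS]]) for g in peer_updates]
)

# 2) Cluster the extracted gradients into two groups.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)

# 3) Analyze the clusters: here the smaller cluster is treated as potentially
#    poisoned (an assumption for illustration only; the paper analyzes clusters
#    in its own way).
suspicious = 0 if np.sum(labels == 0) < np.sum(labels == 1) else 1
kept = [u for u, lab in zip(peer_updates, labels) if lab != suspicious]

# 4) Aggregate only the retained updates (plain FedAvg-style mean).
aggregated = np.mean(kept, axis=0)
print(f"kept {len(kept)}/{NUM_PEERS} updates; aggregated shape = {aggregated.shape}")
```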