Paper Title
Mitigating Backdoor Attacks in Federated Learning
Paper Authors
Paper Abstract
Malicious clients can attack federated learning systems using malicious data, including backdoor samples, during the training phase. The compromised global model performs well on the validation dataset designed for the task, but a small subset of data carrying the backdoor pattern can trigger the model into making wrong predictions. There has been an arms race between attackers who try to conceal their attacks and defenders who try to detect them during the server-side aggregation stage of training. In this work, we propose a new and effective method to mitigate backdoor attacks after the training phase. Specifically, we design a federated pruning method to remove redundant neurons in the network and then adjust the model's extreme weight values. Our experiments on distributed Fashion-MNIST show that our method can reduce the average attack success rate from 99.7% to 1.9%, with a 5.5% loss of test accuracy on the validation dataset. To minimize the influence of pruning on test accuracy, we can fine-tune the model after pruning; the attack success rate then drops to 6.4%, with only a 1.7% loss of test accuracy. Further experiments under Distributed Backdoor Attacks on CIFAR-10 also show promising results: the average attack success rate drops by more than 70% with less than 2% loss of test accuracy on the validation dataset.
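To make the two post-training steps described in the abstract more concrete, below is a minimal PyTorch sketch, not the authors' implementation. It prunes the least-active hidden neurons as a simple proxy for "redundant" neurons and then clips extreme weight magnitudes per layer. The toy network, the prune_ratio and clip_quantile values, and the use of a single local batch of clean data are all illustrative assumptions; in the paper's federated setting, the pruning statistics would be aggregated across clients, and a short fine-tuning pass on clean data would follow to recover test accuracy.

```python
# Illustrative sketch only: prune low-activation ("redundant") neurons, then
# clip extreme weight values. Hyperparameters and the toy model are assumptions.
import torch
import torch.nn as nn

class SmallNet(nn.Module):
    """Toy fully connected network standing in for the trained global model."""
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 10)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

def prune_dormant_neurons(model, clean_data, prune_ratio=0.3):
    """Zero out the hidden neurons with the lowest mean activation on clean data.
    prune_ratio is an assumed hyperparameter, not a value from the paper."""
    with torch.no_grad():
        acts = torch.relu(model.fc1(clean_data))   # (N, 256) hidden activations
        mean_act = acts.mean(dim=0)                # per-neuron average activation
        k = int(prune_ratio * mean_act.numel())
        idx = torch.argsort(mean_act)[:k]          # indices of least-active neurons
        model.fc1.weight[idx] = 0.0                # remove their incoming weights
        model.fc1.bias[idx] = 0.0

def clip_extreme_weights(model, clip_quantile=0.99):
    """Clamp weights whose magnitude exceeds a high per-layer quantile.
    The quantile threshold is an assumption for illustration."""
    with torch.no_grad():
        for p in model.parameters():
            bound = torch.quantile(p.abs().flatten(), clip_quantile).item()
            p.clamp_(-bound, bound)

if __name__ == "__main__":
    model = SmallNet()
    clean_batch = torch.randn(64, 784)             # stand-in for clean validation data
    prune_dormant_neurons(model, clean_batch)
    clip_extreme_weights(model)
    # A brief fine-tuning pass on clean data would follow here in the paper's setting.
```

The design intuition behind sketches like this is that backdoor behavior often hides in neurons that clean data rarely activates and in unusually large weights, so removing dormant neurons and capping extreme weights degrades the backdoor far more than the main task.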