Paper Title
Exacerbating Algorithmic Bias through Fairness Attacks
Paper Authors
Paper Abstract
Algorithmic fairness has attracted significant attention in recent years, with many quantitative measures suggested for characterizing the fairness of different machine learning algorithms. Despite this interest, the robustness of those fairness measures with respect to an intentional adversarial attack has not been properly addressed. Indeed, most adversarial machine learning has focused on the impact of malicious attacks on the accuracy of the system, without any regard to the system's fairness. We propose new types of data poisoning attacks where an adversary intentionally targets the fairness of a system. Specifically, we propose two families of attacks that target fairness measures. In the anchoring attack, we skew the decision boundary by placing poisoned points near specific target points to bias the outcome. In the influence attack on fairness, we aim to maximize the covariance between the sensitive attributes and the decision outcome and affect the fairness of the model. We conduct extensive experiments that indicate the effectiveness of our proposed attacks.
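To make the anchoring attack concrete, here is a minimal sketch of the idea: sample target points from the clean data and generate poisoned copies placed close to them but carrying the opposite label, so that retraining on the poisoned set skews the decision boundary. This assumes NumPy arrays, binary labels in {0, 1}, and a fixed poisoning budget `epsilon`; the function name, parameters, and random target sampling are illustrative stand-ins for the paper's actual target-selection strategies.

```python
import numpy as np

def anchoring_attack(X, y, epsilon=0.1, sigma=0.01, seed=0):
    """Hedged sketch of an anchoring-style poisoning attack.

    Places poisoned points in a small neighborhood of sampled target
    points, but with the FLIPPED binary label, biasing the boundary
    that a model fit on the augmented data will learn.
    """
    rng = np.random.default_rng(seed)
    n_poison = int(epsilon * len(X))                # attacker's poisoning budget
    idx = rng.integers(0, len(X), size=n_poison)    # sampled target points
    # Poisoned points: small Gaussian perturbations of the targets.
    X_poison = X[idx] + rng.normal(0.0, sigma, size=(n_poison, X.shape[1]))
    y_poison = 1 - y[idx]                           # opposite label to each target
    return np.vstack([X, X_poison]), np.concatenate([y, y_poison])
```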
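The influence attack on fairness works by maximizing the covariance between the sensitive attribute and the decision outcome. A hedged sketch of that covariance term for a linear model follows, assuming a binary sensitive attribute `z` and the signed distance `theta^T x` as the decision outcome; the function name and the linear form are assumptions for illustration, and an attacker would add this term (with some weight `lam`) to its poisoning objective.

```python
import numpy as np

def decision_boundary_covariance(theta, X, z):
    """Empirical covariance between the sensitive attribute z and the
    signed distance to a linear decision boundary, Cov(z, theta^T x).
    An attacker targeting fairness pushes this quantity up."""
    signed_dist = X @ theta                     # signed distance to the boundary
    return np.mean((z - z.mean()) * signed_dist)

def attacker_loss(acc_loss, theta, X, z, lam=1.0):
    """Illustrative combined objective: accuracy term plus a weighted
    fairness term, so minimizing it degrades fairness as well."""
    return acc_loss - lam * abs(decision_boundary_covariance(theta, X, z))
```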