Paper Title
When Fairness Meets Privacy: Fair Classification with Semi-Private Sensitive Attributes
Paper Authors
Paper Abstract
Machine learning models have demonstrated promising performance in many areas. However, concerns that they may be biased against specific demographic groups hinder their adoption in high-stakes applications. It is therefore essential to ensure fairness in machine learning models. Most previous efforts require direct access to sensitive attributes to mitigate bias. Nonetheless, collecting sensitive attributes from users at scale is often infeasible, given users' privacy concerns during data collection. Owing to legal compliance and people's growing awareness of privacy, privacy mechanisms such as local differential privacy (LDP) are widely enforced on sensitive information at the data collection stage. A critical problem is therefore how to make fair predictions under privacy. We study a novel and practical problem of fair classification in a semi-private setting, where most of the sensitive attributes are private and only a small number of clean ones are available. To this end, we propose FairSP, a novel framework that achieves Fair prediction under the Semi-Private setting. FairSP first learns to correct the noise-protected sensitive attributes by exploiting the limited clean ones; it then jointly models the corrected and clean data in an adversarial way for debiasing and prediction. Theoretical analysis shows that the proposed model ensures fairness under mild assumptions in the semi-private setting. Extensive experiments on real-world datasets demonstrate the effectiveness of our method in making fair predictions under privacy while maintaining high accuracy.
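To make the semi-private setting described above concrete, the sketch below shows how a binary sensitive attribute might be released under a common LDP mechanism (randomized response) while a small clean subset is retained. This is an illustrative assumption, not the authors' implementation; names such as `epsilon`, `n_clean`, and `randomized_response` are hypothetical.

```python
# Minimal sketch of a semi-private release of a binary sensitive attribute:
# most records are protected by randomized response (an LDP mechanism),
# while a small clean subset remains available. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)

def randomized_response(a, epsilon):
    """Binary randomized response: keep the true value with probability
    e^eps / (e^eps + 1), otherwise flip it. Satisfies epsilon-LDP."""
    p_keep = np.exp(epsilon) / (np.exp(epsilon) + 1.0)
    keep = rng.random(a.shape) < p_keep
    return np.where(keep, a, 1 - a)

# Toy population: 10,000 users with a binary sensitive attribute.
n, n_clean, epsilon = 10_000, 200, 1.0
a_true = rng.integers(0, 2, size=n)

# Semi-private data: a small clean subset plus LDP-noised attributes for the rest.
a_clean = a_true[:n_clean]                                 # limited clean attributes
a_noisy = randomized_response(a_true[n_clean:], epsilon)   # LDP-protected majority

# Sanity check: the empirical flip rate should be close to 1 / (e^eps + 1).
flip_rate = np.mean(a_noisy != a_true[n_clean:])
print(f"expected flip rate: {1 / (np.exp(epsilon) + 1):.3f}, observed: {flip_rate:.3f}")
```

In this setting, a framework like FairSP would use the small clean subset (`a_clean`) to learn to correct the noisy attributes (`a_noisy`) before using them for adversarial debiasing.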