Paper Title
Black-box Model Inversion Attribute Inference Attacks on Classification Models
Paper Authors
Paper Abstract
Increasing use of ML technologies in privacy-sensitive domains such as medical diagnoses, lifestyle predictions, and business decisions highlights the need to better understand whether these ML technologies introduce leakage of sensitive and proprietary training data. In this paper, we focus on one kind of model inversion attack, in which the adversary knows non-sensitive attributes of instances in the training data and aims to infer the value of a sensitive attribute unknown to the adversary, using only oracle access to the target classification model. We devise two novel model inversion attribute inference attacks -- a confidence modeling-based attack and a confidence score-based attack -- and also extend our attacks to the case where some of the other (non-sensitive) attributes are unknown to the adversary. Furthermore, while previous work uses accuracy as the metric to evaluate the effectiveness of attribute inference attacks, we find that accuracy is not informative when the sensitive attribute distribution is unbalanced. We identify two metrics that are better suited for evaluating attribute inference attacks, namely G-mean and the Matthews correlation coefficient (MCC). We evaluate our attacks on two types of machine learning models, decision trees and deep neural networks, trained with two real datasets. Experimental results show that our newly proposed attacks significantly outperform the state-of-the-art attacks. Moreover, we empirically show that specific groups in the training dataset (grouped by attributes, e.g., gender, race) can be more vulnerable to model inversion attacks. We also demonstrate that our attacks' performance is not significantly impacted when some of the other (non-sensitive) attributes are also unknown to the adversary.
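The abstract describes the general attribute inference setting (fill in a guess for the sensitive attribute, query the target model as an oracle, score the guess by the returned confidence) and the imbalance-robust evaluation metrics. The sketch below is a minimal illustration of that setting, not the paper's exact algorithms: `target_model`, `candidate_values`, and the scoring rule (confidence on the record's known true label) are assumptions made for illustration, while the G-mean and MCC definitions follow their standard formulations.

```python
# Hypothetical sketch of a confidence score-based attribute inference attack.
# `target_model` (assumed to expose a predict_proba-style oracle), `candidate_values`,
# and the scoring rule are illustrative assumptions, not the paper's exact method.
import numpy as np
from sklearn.metrics import matthews_corrcoef, recall_score

def infer_sensitive_attribute(target_model, known_attrs, sensitive_idx,
                              candidate_values, true_label):
    """Guess the sensitive attribute: try each candidate value and keep the one
    whose completed record yields the highest confidence on the known true label."""
    best_value, best_conf = None, -np.inf
    for v in candidate_values:
        record = list(known_attrs)
        record[sensitive_idx] = v                        # fill in the guess
        probs = target_model.predict_proba([record])[0]  # oracle access only
        conf = probs[true_label]                         # confidence on true label
        if conf > best_conf:
            best_value, best_conf = v, conf
    return best_value

def g_mean(y_true, y_pred):
    """Geometric mean of per-class recall (sensitivity and specificity for a
    binary sensitive attribute); unlike accuracy, it penalizes majority-class guessing."""
    recalls = recall_score(y_true, y_pred, average=None)
    return float(np.prod(recalls) ** (1.0 / len(recalls)))

# Evaluation: when the sensitive attribute is unbalanced, report G-mean and MCC
# instead of accuracy, e.g.:
#   print(g_mean(y_true, y_pred), matthews_corrcoef(y_true, y_pred))
```

The metric choice matters because a trivial attack that always predicts the majority value of an unbalanced sensitive attribute achieves high accuracy while learning nothing; G-mean and MCC both collapse toward 0 for such an attack, which is why the abstract argues they are more informative.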