Paper Title
On the Privacy Effect of Data Enhancement via the Lens of Memorization
Paper Authors
Paper Abstract
Machine learning poses severe privacy concerns, as it has been shown that the learned models can reveal sensitive information about their training data. Many works have investigated the effect of widely adopted data augmentation and adversarial training techniques, termed data enhancement in the paper, on the privacy leakage of machine learning models. Such privacy effects are often measured by membership inference attacks (MIAs), which aim to identify whether a particular example belongs to the training set or not. We propose to investigate privacy from a new perspective called memorization. Through the lens of memorization, we find that previously deployed MIAs produce misleading results, as they are less likely to identify samples with higher privacy risks as members compared to samples with lower privacy risks. To solve this problem, we deploy a recent attack that can capture the memorization degrees of individual samples for evaluation. Through extensive experiments, we unveil several findings about the connections between three essential properties of machine learning models: privacy, generalization gap, and adversarial robustness. We demonstrate that the generalization gap and privacy leakage are less correlated than previous results suggest. Moreover, there is not necessarily a trade-off between adversarial robustness and privacy, as stronger adversarial robustness does not make the model more susceptible to privacy attacks.
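
As a concrete illustration of what an MIA does, a common baseline simply predicts that low-loss samples are training members. The minimal sketch below assumes a trained PyTorch classifier; the names model, inputs, labels, and threshold are hypothetical placeholders, and this is not the memorization-aware attack evaluated in the paper.

import torch
import torch.nn.functional as F

@torch.no_grad()
def per_example_loss(model, inputs, labels):
    """Per-example cross-entropy loss of a trained classifier on candidate samples."""
    model.eval()
    logits = model(inputs)
    return F.cross_entropy(logits, labels, reduction="none")

def infer_membership(model, inputs, labels, threshold):
    """Flag samples whose loss falls below the threshold as predicted training members."""
    return per_example_loss(model, inputs, labels) < threshold

Such threshold attacks tend to flag easy, well-generalized samples, which is consistent with the paper's observation that they under-report the membership of highly memorized, high-risk samples.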