Paper Title

How Does Data Augmentation Affect Privacy in Machine Learning?

Paper Authors

Da Yu, Huishuai Zhang, Wei Chen, Jian Yin, Tie-Yan Liu

Paper Abstract

It is observed in the literature that data augmentation can significantly mitigate membership inference (MI) attacks. However, in this work, we challenge this observation by proposing new MI attacks that exploit the information of augmented data. The MI attack is widely used to measure a model's information leakage about its training set. We establish the optimal membership inference when the model is trained with augmented data, which inspires us to formulate the MI attack as a set classification problem, i.e., classifying a set of augmented instances instead of a single data point, and to design input-permutation-invariant features. Empirically, we demonstrate that the proposed approach universally outperforms original methods when the model is trained with data augmentation. Even further, we show that the proposed approach can achieve higher MI attack success rates on models trained with some data augmentation than the existing methods on models trained without data augmentation. Notably, we achieve a 70.1% MI attack success rate on CIFAR10 against a wide residual network, while the previous best approach only attains 61.9%. This suggests that the privacy risk of models trained with data augmentation could be largely underestimated.
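
The abstract describes casting the MI attack as a set classification problem over a data point's augmented instances, using input-permutation-invariant features. The snippet below is a minimal illustrative sketch of that idea, not the authors' implementation; it assumes a PyTorch classifier and a list of augmentation callables, and it uses sorted per-instance losses as one possible permutation-invariant feature. The function name and feature choice are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def permutation_invariant_features(model, image, label, augmentations):
    """Build a permutation-invariant feature vector for one example.

    The example is expanded into the set of its augmented instances, and the
    per-instance cross-entropy losses are sorted so the feature does not
    depend on the order in which the augmentations are applied.
    (Illustrative sketch; the feature choice is an assumption.)
    """
    model.eval()
    with torch.no_grad():
        # Stack all augmented views of the same image into one batch.
        batch = torch.stack([aug(image) for aug in augmentations])
        logits = model(batch)
        labels = torch.full((batch.size(0),), label, dtype=torch.long)
        losses = F.cross_entropy(logits, labels, reduction="none")
    # Sorting makes the feature invariant to permutations of the augmented set.
    return torch.sort(losses).values
```

In such a pipeline, an attack classifier (e.g., logistic regression or a small MLP) would then be trained on these set-level features, typically using shadow models or known member/non-member examples, to predict membership of the original data point.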
