Paper Title
Measuring Unintended Memorisation of Unique Private Features in Neural Networks
Paper Authors
Paper Abstract
Neural networks pose a privacy risk to training data due to their propensity to memorise and leak information. Focusing on image classification, we show that neural networks also unintentionally memorise unique features even when they occur only once in training data. An example of a unique feature is a person's name that is accidentally present on a training image. Assuming access to the inputs and outputs of a trained model, the domain of the training data, and knowledge of unique features, we develop a score estimating the model's sensitivity to a unique feature by comparing the KL divergences of the model's output distributions given modified out-of-distribution images. Our results suggest that unique features are memorised by multi-layer perceptrons and convolutional neural networks trained on benchmark datasets, such as MNIST, Fashion-MNIST and CIFAR-10. We find that strategies to prevent overfitting (e.g., early stopping, regularisation, batch normalisation) do not prevent memorisation of unique features. These results imply that neural networks pose a privacy risk to rarely occurring private information. These risks can be more pronounced in healthcare applications if patient information is present in the training data.
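To illustrate the kind of sensitivity score described in the abstract, below is a minimal sketch in PyTorch. It assumes a trained classifier `model`, a batch of out-of-distribution images `ood_images`, and a hypothetical helper `insert_feature` that stamps the unique feature (e.g. a printed name) onto each image; the paper's actual scoring procedure may differ in its details.

    import torch
    import torch.nn.functional as F

    def output_distribution(model, images):
        # Average softmax output distribution over a batch of images.
        with torch.no_grad():
            probs = F.softmax(model(images), dim=1)
        return probs.mean(dim=0)

    def kl_divergence(p, q, eps=1e-12):
        # KL(p || q) between two discrete distributions (clamped for numerical stability).
        p, q = p.clamp_min(eps), q.clamp_min(eps)
        return torch.sum(p * (p / q).log()).item()

    def unique_feature_sensitivity(model, ood_images, insert_feature):
        # Hypothetical score: how much the model's output distribution shifts when
        # the unique feature is stamped onto out-of-distribution images.
        p_without = output_distribution(model, ood_images)
        p_with = output_distribution(model, insert_feature(ood_images))
        return kl_divergence(p_with, p_without)

A large score would indicate that the model's outputs are sensitive to the presence of the unique feature, which, in the spirit of the abstract, would be read as evidence that the feature has been memorised.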