Paper Title
Lossy Compression of Noisy Data for Private and Data-Efficient Learning
Paper Authors
Paper Abstract
Storage-efficient privacy-preserving learning is crucial due to increasing amounts of sensitive user data required for modern learning tasks. We propose a framework for reducing the storage cost of user data while at the same time providing privacy guarantees, without essential loss in the utility of the data for learning. Our method comprises noise injection followed by lossy compression. We show that, when appropriately matching the lossy compression to the distribution of the added noise, the compressed examples converge, in distribution, to that of the noise-free training data as the sample size of the training data (or the dimension of the training data) increases. In this sense, the utility of the data for learning is essentially maintained, while reducing storage and privacy leakage by quantifiable amounts. We present experimental results on the CelebA dataset for gender classification and find that our suggested pipeline delivers in practice on the promise of the theory: the individuals in the images are unrecognizable (or less recognizable, depending on the noise level), and overall storage of the data is substantially reduced, with no essential loss (and in some cases a slight boost) to the classification accuracy. As an added bonus, our experiments suggest that our method yields a substantial boost to robustness in the face of adversarial test data.
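The pipeline described above (noise injection followed by lossy compression matched to the noise distribution) can be sketched with uniform noise and a matched uniform scalar quantizer, in the spirit of dithered quantization. This is a minimal illustrative sketch only, not the paper's actual implementation; the function name `noisy_compress` and the parameter `delta` are assumptions introduced here for illustration:

```python
import numpy as np

def noisy_compress(x, delta, seed=None):
    """Illustrative sketch (not the paper's code): inject uniform noise
    matched to a uniform scalar quantizer of step `delta`, then quantize.
    The stored value is a multiple of `delta`, so it can be coded with
    fewer bits than the raw data, and the added noise obscures the
    original sample."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    # Noise distribution matched to the quantizer cell width.
    noise = rng.uniform(-delta / 2, delta / 2, size=x.shape)
    # Lossy compression step: round the noisy sample to the quantizer grid.
    return delta * np.round((x + noise) / delta)

x = np.linspace(0.0, 1.0, 1000)
y = noisy_compress(x, delta=0.1, seed=0)
# Each stored value deviates from the original by at most delta
# (quantization error <= delta/2 plus noise magnitude <= delta/2).
print(np.max(np.abs(y - x)))
```

With noise matched to the quantizer in this way, the per-sample distortion is bounded by the step size, while the stored values live on a coarse grid that compresses well; the abstract's convergence-in-distribution claim concerns the aggregate behavior of such compressed examples as sample size or dimension grows.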