Title

M$^2$M: A general method to perform various data analysis tasks from a differentially private sketch

Authors

Florimond Houssiau, Vincent Schellekens, Antoine Chatalic, Shreyas Kumar Annamraju, Yves-Alexandre de Montjoye

Abstract

Differential privacy is the standard privacy definition for performing analyses over sensitive data. Yet, its privacy budget bounds the number of tasks an analyst can perform with reasonable accuracy, which makes it challenging to deploy in practice. This can be alleviated by private sketching, where the dataset is compressed into a single noisy sketch vector that can be shared with the analysts and used to perform arbitrarily many analyses. However, the algorithms to perform specific tasks from sketches must be developed on a case-by-case basis, which is a major impediment to their use. In this paper, we introduce the generic moment-to-moment (M$^2$M) method to perform a wide range of data exploration tasks from a single private sketch. Among other things, this method can be used to estimate empirical moments of attributes, the covariance matrix, counting queries (including histograms), and regression models. Our method treats the sketching mechanism as a black-box operation, and can thus be applied to a wide variety of sketches from the literature, widening their range of applications without further engineering or privacy loss, and removing some of the technical barriers to the wider adoption of sketches for data exploration under differential privacy. We validate our method with data exploration tasks on artificial and real-world data, and show that it can be used to reliably estimate statistics and train classification models from private sketches.
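To make the idea of answering many queries from one noisy sketch more concrete, here is a minimal Python sketch built on illustrative assumptions that do not come from the paper: one-dimensional records in [0, 1] with a public dataset size n, a hypothetical sketch made of the averaged cosine features cos(jπx) plus Laplace noise, and a ridge least-squares fit that writes a target function (here x or x²) as a linear combination of those same features. The function names `features`, `private_sketch`, `fit_coefficients`, `solve` and `estimate_moment` are hypothetical. This is one simple way to read "estimating moments from the moments stored in a sketch", not the authors' M$^2$M algorithm. The estimator touches only the released sketch and the feature map, so each query is post-processing and spends no extra privacy budget.

```python
import math
import random


def features(x, num_freqs):
    """Hypothetical sketch features cos(j*pi*x), j = 0..num_freqs, for x in [0, 1]."""
    return [math.cos(j * math.pi * x) for j in range(num_freqs + 1)]


def private_sketch(data, num_freqs, epsilon, rng):
    """Average the features over the dataset and add Laplace noise.

    Assumes every record lies in [0, 1] and that n is public (replace-one
    neighbouring datasets). Each feature lies in [-1, 1], so replacing one
    record moves each averaged coordinate by at most 2/n, giving an L1
    sensitivity of at most 2*m/n for m coordinates. epsilon=None returns the
    noiseless sketch, which is useful only for checking the estimator.
    """
    n, m = len(data), num_freqs + 1
    sketch = [0.0] * m
    for x in data:
        for j, value in enumerate(features(x, num_freqs)):
            sketch[j] += value / n
    if epsilon is None:
        return sketch
    scale = 2.0 * m / (n * epsilon)
    # Laplace(0, scale) noise: difference of two exponentials with mean `scale`.
    return [s + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
            for s in sketch]


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    size = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * size
    for r in reversed(range(size)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, size))
        x[r] = (aug[r][size] - tail) / aug[r][r]
    return x


def fit_coefficients(target, num_freqs, grid_size=200, ridge=1e-8):
    """Ridge least-squares weights a with target(x) ~ sum_j a[j] * features(x)[j] on [0, 1]."""
    grid = [(i + 0.5) / grid_size for i in range(grid_size)]
    rows = [features(g, num_freqs) for g in grid]
    values = [target(g) for g in grid]
    m = num_freqs + 1
    gram = [[sum(r[i] * r[k] for r in rows) + (ridge if i == k else 0.0)
             for k in range(m)] for i in range(m)]
    rhs = [sum(r[i] * v for r, v in zip(rows, values)) for i in range(m)]
    return solve(gram, rhs)


def estimate_moment(sketch, target, num_freqs):
    """Estimate the empirical mean of target(x) as a linear function of the sketch."""
    coeffs = fit_coefficients(target, num_freqs)
    return sum(a * s for a, s in zip(coeffs, sketch))


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.betavariate(2, 5) for _ in range(10_000)]
    num_freqs = 8
    noisy = private_sketch(data, num_freqs, epsilon=1.0, rng=rng)
    # One released sketch answers several moment queries with no extra privacy cost.
    print("mean:", estimate_moment(noisy, lambda x: x, num_freqs), sum(data) / len(data))
    print("E[x^2]:", estimate_moment(noisy, lambda x: x * x, num_freqs),
          sum(x * x for x in data) / len(data))
```

Running the script prints each private estimate next to the exact empirical value. The approach only works for targets the sketch features can approximate well on the data domain. The sketches and estimator studied in the paper may differ from this toy setup.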