Paper Title
A One-Pass Private Sketch for Most Machine Learning Tasks
Paper Authors
Paper Abstract
Differential privacy (DP) is a compelling privacy definition that explains the privacy-utility tradeoff via formal, provable guarantees. Inspired by recent progress toward general-purpose data release algorithms, we propose a private sketch, or small summary of the dataset, that supports a multitude of machine learning tasks including regression, classification, density estimation, near-neighbor search, and more. Our sketch consists of randomized contingency tables that are indexed with locality-sensitive hashing and constructed with an efficient one-pass algorithm. We prove competitive error bounds for DP kernel density estimation. Existing methods for DP kernel density estimation scale poorly, often exponentially slower with an increase in dimensions. In contrast, our sketch can quickly run on large, high-dimensional datasets in a single pass. Exhaustive experiments show that our generic sketch delivers a similar privacy-utility tradeoff when compared to existing DP methods at a fraction of the computation cost. We expect that our sketch will enable differential privacy in distributed, large-scale machine learning settings.