论文标题
私人措施,随机步行和合成数据
Private measures, random walks, and synthetic data
论文作者
论文摘要
差异隐私是一个数学概念,可提供信息理论安全保证。尽管差异隐私已成为确保数据共享隐私的事实上的标准,但已知的机制实现了它的限制。公用事业保证通常仅适用于固定的先验指定的查询集。此外,无法保证更复杂的机器学习任务,例如聚类或分类。在本文中,我们克服了其中一些局限性。使用指标隐私,对差异隐私的有力概括,我们开发了一种多项式时间算法,该算法从数据集创建了私人度量。这种私人措施使我们能够有效地构建私人合成数据,这些数据对于广泛的统计分析工具是准确的。此外,我们证明了私人措施和一般紧凑型度量空间的综合数据的渐近尖锐最低 - 最大结果。我们结构中的一个关键成分是一个新的超规则随机步行,其步骤的联合分布与独立的随机变量一样规则,但却偏离了原点对数慢慢的偏离。
Differential privacy is a mathematical concept that provides an information-theoretic security guarantee. While differential privacy has emerged as a de facto standard for guaranteeing privacy in data sharing, the known mechanisms to achieve it come with some serious limitations. Utility guarantees are usually provided only for a fixed, a priori specified set of queries. Moreover, there are no utility guarantees for more complex - but very common - machine learning tasks such as clustering or classification. In this paper we overcome some of these limitations. Working with metric privacy, a powerful generalization of differential privacy, we develop a polynomial-time algorithm that creates a private measure from a data set. This private measure allows us to efficiently construct private synthetic data that are accurate for a wide range of statistical analysis tools. Moreover, we prove an asymptotically sharp min-max result for private measures and synthetic data for general compact metric spaces. A key ingredient in our construction is a new superregular random walk, whose joint distribution of steps is as regular as that of independent random variables, yet which deviates from the origin logarithmicaly slowly.