Paper Title
Research Challenges in Designing Differentially Private Text Generation Mechanisms
Paper Authors
Paper Abstract
Accurately learning from user data while ensuring quantifiable privacy guarantees provides an opportunity to build better Machine Learning (ML) models while maintaining user trust. Recent literature has demonstrated the applicability of a generalized form of Differential Privacy to provide guarantees over text queries. Such mechanisms add privacy-preserving noise to high-dimensional vector representations of text and return a text-based projection of the noisy vectors. However, these mechanisms are sub-optimal in their trade-off between privacy and utility. This is due to factors such as a fixed global sensitivity, which leads to too much noise being added in dense regions of the embedding space while simultaneously guaranteeing protection for sensitive outliers. In this proposal paper, we describe some challenges in balancing the trade-off between privacy and utility for these differentially private text mechanisms. At a high level, we provide two proposals: (1) a framework called LAC, which defers some of the noise to a privacy amplification step, and (2) a suite of three different techniques for calibrating the noise based on the local region around a word. Our objective in this paper is not to evaluate a single solution but to further the conversation on these challenges and chart pathways for building better mechanisms.
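
To make the mechanism class described in the abstract concrete, the sketch below illustrates a typical noisy-embedding text mechanism from the prior literature: it perturbs a word's embedding with noise whose density is proportional to exp(-epsilon * ||z||), the multivariate generalization of the Laplace mechanism used in metric Differential Privacy, and returns the vocabulary word nearest to the noisy vector. This is a minimal sketch under assumed inputs; the function name dx_privacy_word_mechanism and the vocab/embeddings arguments are hypothetical, and the code does not represent the LAC framework or the local calibration techniques the paper proposes.

    import numpy as np

    def dx_privacy_word_mechanism(word, vocab, embeddings, epsilon, rng=None):
        # Illustrative sketch, not the paper's implementation.
        # vocab: list of words; embeddings: (len(vocab), n) array of
        # word vectors; epsilon: metric-DP privacy parameter.
        rng = rng or np.random.default_rng()
        n = embeddings.shape[1]
        x = embeddings[vocab.index(word)]

        # Sample noise in spherical form: a uniformly random direction
        # scaled by a magnitude drawn from Gamma(shape=n, scale=1/epsilon),
        # which yields a density proportional to exp(-epsilon * ||z||)
        # in n dimensions.
        direction = rng.normal(size=n)
        direction /= np.linalg.norm(direction)
        noisy = x + rng.gamma(shape=n, scale=1.0 / epsilon) * direction

        # Project the noisy vector back to text by returning the word
        # whose embedding is nearest to it.
        distances = np.linalg.norm(embeddings - noisy, axis=1)
        return vocab[int(np.argmin(distances))]

Note that epsilon scales the noise identically everywhere in the embedding space. This is precisely the fixed-global-sensitivity behavior the abstract critiques: words in dense regions receive the same noise magnitude as isolated outliers, degrading utility where less noise would have sufficed.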