Paper Title
Can Foundation Models Help Us Achieve Perfect Secrecy?
Paper Authors
Paper Abstract
A key promise of machine learning is the ability to assist users with personal tasks. Because the personal context required to make accurate predictions is often sensitive, we require systems that protect privacy. A gold standard privacy-preserving system will satisfy perfect secrecy, meaning that interactions with the system provably reveal no private information. However, privacy and quality appear to be in tension in existing systems for personal tasks. Neural models typically require copious amounts of training to perform well, while individual users typically hold a limited scale of data, so federated learning (FL) systems propose to learn from the aggregate data of multiple users. FL does not provide perfect secrecy; rather, practitioners apply statistical notions of privacy -- i.e., the probability of learning private information about a user should be reasonably low. The strength of the privacy guarantee is governed by privacy parameters. Numerous privacy attacks have been demonstrated on FL systems, and it can be challenging to reason about the appropriate privacy parameters for a privacy-sensitive use case. Therefore, our work proposes a simple baseline for FL, which both provides the stronger perfect secrecy guarantee and does not require setting any privacy parameters. We initiate the study of when and where an emerging tool in ML -- the in-context learning abilities of recent pretrained models -- can be an effective baseline alongside FL. We find in-context learning is competitive with strong FL baselines on 6 of 7 popular benchmarks from the privacy literature and a real-world case study, which is disjoint from the pretraining data. We release our code here: https://github.com/simran-arora/focus
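To illustrate the baseline the abstract describes, below is a minimal sketch of local in-context learning for a personal classification task, assuming a pretrained causal language model loaded via Hugging Face transformers. The model name, prompt format, labels, and example messages are illustrative placeholders, not taken from the paper; the point is only that the private demonstrations and the query never leave the user's device, which is the sense in which this baseline can satisfy perfect secrecy without tuning any privacy parameters.

```python
# Minimal sketch: local in-context learning as a privacy-preserving baseline.
# Assumes a pretrained causal LM available locally via Hugging Face transformers;
# the model name, prompt, labels, and examples below are illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder; the paper studies larger pretrained models

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Private, user-held demonstrations stay on the user's device: nothing is sent
# to a server or aggregated across users, unlike in federated learning.
private_examples = [
    ("Reminder: cardiology appointment on Friday", "medical"),
    ("Your credit card statement is ready", "finance"),
]
query = "Lab results from Dr. Lee are available"

# Build an in-context prompt from the private demonstrations plus the new query.
prompt = "Classify each message.\n"
for text, label in private_examples:
    prompt += f"Message: {text}\nLabel: {label}\n"
prompt += f"Message: {query}\nLabel:"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=3,
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id,
)
# Decode only the generated continuation (the predicted label).
prediction = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
).strip()
print(prediction)  # e.g. "medical"
```

Because inference is entirely local, no interaction with an external party occurs, so no privacy budget needs to be set or reasoned about; quality then depends on the pretrained model's in-context learning ability rather than on federated training over other users' data.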