Paper Title

Recycling Scraps: Improving Private Learning by Leveraging Intermediate Checkpoints

Paper Authors

Virat Shejwalkar, Arun Ganesh, Rajiv Mathews, Yarong Mu, Shuang Song, Om Thakkar, Abhradeep Thakurta, Xinyi Zheng

Paper Abstract

In this work, we focus on improving the accuracy-variance trade-off for state-of-the-art differentially private machine learning (DP ML) methods. First, we design a general framework that uses aggregates of intermediate checkpoints \emph{during training} to increase the accuracy of DP ML techniques. Specifically, we demonstrate that training over aggregates can provide significant gains in prediction accuracy over the existing state-of-the-art for the StackOverflow, CIFAR10, and CIFAR100 datasets. For instance, we improve the state-of-the-art DP StackOverflow accuracies to 22.74\% (+2.06\% relative) for $ε=8.2$, and 23.90\% (+2.09\%) for $ε=18.9$. Furthermore, these gains magnify in settings with periodically varying training data distributions. We also demonstrate that our methods achieve relative improvements of 0.54\% and 62.6\% in terms of utility and variance on a proprietary, production-grade pCVR task. Lastly, we initiate an exploration into estimating the uncertainty (variance) that DP noise adds to the predictions of DP ML models. We prove that, under standard assumptions on the loss function, the sample variance from the last few checkpoints provides a good approximation of the variance of the final model of a DP run. Empirically, we show that the last few checkpoints can provide a reasonable lower bound for the variance of a converged DP model. Crucially, all the methods proposed in this paper operate on \emph{a single training run} of the DP ML technique, thus incurring no additional privacy cost.
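The two ideas described in the abstract — releasing an aggregate of intermediate checkpoints from a single training run, and estimating uncertainty from the sample variance of the last few checkpoints — can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the toy quadratic loss, the stand-in DP-SGD step (clip plus Gaussian noise), the uniform averaging rule, and all hyperparameters are hypothetical and not the authors' exact framework.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_grad(w, clip_norm=1.0, noise_mult=1.0):
    """Stand-in for a DP-SGD gradient: clip a surrogate gradient, add Gaussian noise."""
    g = w - 1.0  # surrogate gradient of a toy quadratic loss centered at 1
    g = g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
    return g + noise_mult * clip_norm * rng.normal(size=w.shape)

def dp_train(steps=200, lr=0.1, dim=10):
    """Run a noisy training loop and keep every intermediate checkpoint."""
    w = np.zeros(dim)
    checkpoints = []
    for _ in range(steps):
        w = w - lr * noisy_grad(w)
        checkpoints.append(w.copy())
    return checkpoints

def checkpoint_average(checkpoints, last_k=20):
    """Uniform average of the last k checkpoints (one possible aggregate)."""
    return np.mean(checkpoints[-last_k:], axis=0)

def checkpoint_variance(checkpoints, last_k=20):
    """Per-coordinate sample variance over the last k checkpoints,
    used as a proxy (lower bound) for the variance of the final model."""
    return np.var(checkpoints[-last_k:], axis=0, ddof=1)

ckpts = dp_train()
released_model = checkpoint_average(ckpts)      # aggregate from the same run, no extra privacy cost
variance_estimate = checkpoint_variance(ckpts)  # uncertainty estimate from the same run
print(released_model[:3], variance_estimate[:3])
```

Because both the aggregate and the variance estimate are post-processing of checkpoints produced by one DP training run, they add no privacy cost beyond that run, which is the property the abstract emphasizes.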
