Paper Title


Pre-trained Perceptual Features Improve Differentially Private Image Generation

Authors

Fredrik Harder, Milad Jalali Asadabadi, Danica J. Sutherland, Mijung Park

Abstract


Training even moderately-sized generative models with differentially-private stochastic gradient descent (DP-SGD) is difficult: the required level of noise for reasonable levels of privacy is simply too large. We advocate instead building off a good, relevant representation on an informative public dataset, then learning to model the private data with that representation. In particular, we minimize the maximum mean discrepancy (MMD) between private target data and a generator's distribution, using a kernel based on perceptual features learned from a public dataset. With the MMD, we can simply privatize the data-dependent term once and for all, rather than introducing noise at each step of optimization as in DP-SGD. Our algorithm allows us to generate CIFAR10-level images with $\epsilon \approx 2$ which capture distinctive features in the distribution, far surpassing the current state of the art, which mostly focuses on datasets such as MNIST and FashionMNIST at a large $\epsilon \approx 10$. Our work introduces simple yet powerful foundations for reducing the gap between private and non-private deep generative models. Our code is available at \url{https://github.com/ParkLabML/DP-MEPF}.
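The core idea in the abstract — privatizing the data-dependent term of the MMD "once and for all" — can be illustrated with a minimal NumPy sketch. The names (`dp_mean_embedding`, `mmd_loss`) and the use of the basic Gaussian mechanism with unit-norm feature clipping are illustrative assumptions, not the paper's exact algorithm: the private data enters the loss only through its mean feature embedding, which is perturbed a single time, after which the generator can be optimized against that fixed noisy target with no further privacy cost.

```python
import numpy as np


def dp_mean_embedding(features, epsilon, delta, rng):
    """Release a differentially private mean of per-sample feature vectors.

    Each row is clipped to unit L2 norm, so replacing one sample changes
    the mean by at most 2/n in L2; the classic Gaussian mechanism then
    gives (epsilon, delta)-DP.  This is the only place private data is
    touched -- it happens once, before any generator training.
    """
    n, d = features.shape
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    clipped = features / np.maximum(norms, 1.0)  # enforce ||f_i|| <= 1
    mean = clipped.mean(axis=0)
    sensitivity = 2.0 / n
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return mean + rng.normal(0.0, sigma, size=d)


def mmd_loss(private_embedding, generated_features):
    """Squared distance between the fixed noisy private embedding and the
    generator's current mean features (the generator-dependent term needs
    no noise, since it involves no private data)."""
    norms = np.linalg.norm(generated_features, axis=1, keepdims=True)
    gen_mean = (generated_features / np.maximum(norms, 1.0)).mean(axis=0)
    return float(np.sum((private_embedding - gen_mean) ** 2))


rng = np.random.default_rng(0)
# Stand-ins for perceptual features of private and generated images.
private_feats = rng.normal(size=(1000, 64))
generated_feats = rng.normal(size=(256, 64))

target = dp_mean_embedding(private_feats, epsilon=2.0, delta=1e-5, rng=rng)
loss = mmd_loss(target, generated_feats)
```

In the actual method the features would come from a network pre-trained on public data, and `loss` would be minimized over the generator's parameters; the key point the sketch captures is that `target` is computed and noised exactly once.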
