Title
Learning perturbation sets for robust machine learning
Authors
Abstract
Although much progress has been made towards robust deep learning, a significant gap in robustness remains between real-world perturbations and more narrowly defined sets typically studied in adversarial defenses. In this paper, we aim to bridge this gap by learning perturbation sets from data, in order to characterize real-world effects for robust training and evaluation. Specifically, we use a conditional generator that defines the perturbation set over a constrained region of the latent space. We formulate desirable properties that measure the quality of a learned perturbation set, and theoretically prove that a conditional variational autoencoder naturally satisfies these criteria. Using this framework, our approach can generate a variety of perturbations at different complexities and scales, ranging from baseline spatial transformations, through common image corruptions, to lighting variations. We measure the quality of our learned perturbation sets both quantitatively and qualitatively, finding that our models are capable of producing a diverse set of meaningful perturbations beyond the limited data seen during training. Finally, we leverage our learned perturbation sets to train models which are empirically and certifiably robust to adversarial image corruptions and adversarial lighting variations, while improving generalization on non-adversarial data. All code and configuration files for reproducing the experiments as well as pretrained model weights can be found at https://github.com/locuslab/perturbation_learning.
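To make the core idea concrete, here is a minimal sketch (not the authors' code) of a perturbation set defined by a conditional generator over a norm-constrained latent region, S(x) = { g(z, x) : ||z||₂ ≤ ε }. A toy linear decoder stands in for the trained CVAE decoder; sampling z inside the ε-ball and decoding it yields elements of the perturbation set.

```python
import numpy as np

# Illustrative sketch only: a conditional generator g(z, x) whose outputs,
# over the latent ball ||z||_2 <= eps, form the perturbation set S(x).
# The linear decoder W is a stand-in for a trained CVAE decoder.

rng = np.random.default_rng(0)
latent_dim, input_dim = 4, 16
W = rng.normal(scale=0.1, size=(input_dim, latent_dim))  # toy decoder weights

def generator(z, x):
    """Toy conditional generator: perturb x by the decoded latent code z."""
    return x + W @ z

def sample_perturbation(x, eps):
    """Draw a random element of S(x): sample z uniformly in the eps-ball, decode."""
    z = rng.normal(size=latent_dim)
    z = eps * rng.uniform() ** (1.0 / latent_dim) * z / np.linalg.norm(z)
    return generator(z, x)

x = rng.normal(size=input_dim)
eps = 1.0
perturbed = sample_perturbation(x, eps)
# Because g is linear in z here, every element of S(x) stays within
# spectral_norm(W) * eps of x, so the set is bounded.
print(np.linalg.norm(perturbed - x))
```

Robust training then amounts to maximizing the loss over z in the ε-ball rather than over raw pixel perturbations, which is what lets the learned set capture structured effects like corruptions and lighting changes.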