Paper Title

Assessing Neural Network Robustness via Adversarial Pivotal Tuning

Authors

Peter Ebert Christensen, Vésteinn Snæbjarnarson, Andrea Dittadi, Serge Belongie, Sagie Benaim

Abstract

The robustness of image classifiers is essential to their deployment in the real world. The ability to assess this resilience to manipulations or deviations from the training data is thus crucial. These modifications have traditionally consisted of minimal changes that still manage to fool classifiers, and modern approaches are increasingly robust to them. Semantic manipulations that modify elements of an image in meaningful ways have thus gained traction for this purpose. However, they have primarily been limited to style, color, or attribute changes. While expressive, these manipulations do not make use of the full capabilities of a pretrained generative model. In this work, we aim to bridge this gap. We show how a pretrained image generator can be used to semantically manipulate images in a detailed, diverse, and photorealistic way while still preserving the class of the original image. Inspired by recent GAN-based image inversion methods, we propose a method called Adversarial Pivotal Tuning (APT). Given an image, APT first finds a pivot latent space input that reconstructs the image using a pretrained generator. It then adjusts the generator's weights to create small yet semantic manipulations in order to fool a pretrained classifier. APT preserves the full expressive editing capabilities of the generative model. We demonstrate that APT is capable of a wide range of class-preserving semantic image manipulations that fool a variety of pretrained classifiers. Finally, we show that classifiers that are robust to other benchmarks are not robust to APT manipulations and suggest a method to improve them. Code available at: https://captaine.github.io/apt/
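The two-phase procedure described in the abstract can be sketched as follows. This is a minimal toy illustration of the idea, not the paper's implementation: the `Generator` and `Classifier` modules, the loss weights, the optimizer settings, and the step counts are all illustrative stand-ins (the actual method operates on a pretrained large-scale image generator and classifier).

```python
# Toy sketch of Adversarial Pivotal Tuning (APT), as read from the abstract:
#   Phase 1: invert the image to a "pivot" latent w* under a frozen generator G.
#   Phase 2: freeze w*, fine-tune G's weights so G(w*) stays close to the
#            original image while the classifier's loss on it is pushed up.
# All modules, dimensions, and hyperparameters below are illustrative.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
LATENT_DIM, IMG_DIM = 8, 16

class Generator(torch.nn.Module):
    """Toy stand-in for a pretrained generator (e.g. a StyleGAN)."""
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Linear(LATENT_DIM, IMG_DIM)

    def forward(self, w):
        return torch.tanh(self.net(w))

class Classifier(torch.nn.Module):
    """Toy stand-in for the pretrained classifier under evaluation."""
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Linear(IMG_DIM, 2)

    def forward(self, x):
        return self.net(x)

G, C = Generator(), Classifier()
x_real = G(torch.randn(1, LATENT_DIM)).detach()  # the "image" to manipulate
y_true = torch.tensor([0])                       # its class, to be preserved

# Phase 1: inversion -- optimize a pivot latent that reconstructs x_real.
w = torch.zeros(1, LATENT_DIM, requires_grad=True)
opt_w = torch.optim.Adam([w], lr=0.1)
recon_init = F.mse_loss(G(w), x_real).item()
for _ in range(200):
    opt_w.zero_grad()
    F.mse_loss(G(w), x_real).backward()
    opt_w.step()
w_pivot = w.detach()
recon_pivot = F.mse_loss(G(w_pivot), x_real).item()

# Phase 2: pivotal tuning -- adjust G's weights at the frozen pivot so the
# output changes only slightly yet raises the classifier's loss.
ce_before = F.cross_entropy(C(G(w_pivot)), y_true).item()
opt_g = torch.optim.Adam(G.parameters(), lr=0.01)
for _ in range(200):
    opt_g.zero_grad()
    x_adv = G(w_pivot)
    recon = F.mse_loss(x_adv, x_real)          # stay near the original image
    fool = -F.cross_entropy(C(x_adv), y_true)  # push the classifier off-class
    (recon + 0.1 * fool).backward()
    opt_g.step()
x_adv = G(w_pivot).detach()
ce_after = F.cross_entropy(C(x_adv), y_true).item()
```

The key design point the abstract emphasizes is that the manipulation lives in the generator's weights rather than in pixel space, so the edits remain photorealistic and semantic while the reconstruction term keeps them class-preserving.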
