Paper Title
Provably Robust Adversarial Examples
Paper Authors
Paper Abstract
We introduce the concept of provably robust adversarial examples for deep neural networks: connected input regions constructed from standard adversarial examples which are guaranteed to be robust to a set of real-world perturbations (such as changes in pixel intensity and geometric transformations). We present a novel method called PARADE for generating these regions in a scalable manner, which works by iteratively refining a region initially obtained via sampling until the refined region is certified as adversarial by existing state-of-the-art verifiers. At each step, a novel optimization procedure is applied to maximize the region's volume under the constraint that the convex relaxation of the network's behavior with respect to the region implies a chosen bound on the certification objective. Our experimental evaluation shows the effectiveness of PARADE: it successfully finds large provably robust regions, including ones containing $\approx 10^{573}$ adversarial examples for pixel intensity perturbations and $\approx 10^{599}$ for geometric perturbations. The provability enables our robust examples to be significantly more effective against state-of-the-art defenses based on randomized smoothing than the individual attacks used to construct the regions.
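The core loop the abstract describes (start from a single adversarial example, then grow a surrounding region as long as a verifier certifies that the whole region remains adversarial) can be illustrated with a heavily simplified sketch. This is not the paper's method: all names here are hypothetical, the "network" is a two-input toy function, and the stand-in verifier certifies a box by corner enumeration (sound only because the toy function is monotone in each coordinate), whereas PARADE uses convex relaxations of a real network and maximizes volume via optimization rather than uniform growth.

```python
import numpy as np

def toy_verifier(lower, upper, f, target):
    """Stand-in for a sound verifier: certifies the box [lower, upper]
    as adversarial by checking every corner. Valid for this toy f only
    because f is monotone in each coordinate."""
    dims = len(lower)
    for mask in range(2 ** dims):
        corner = np.array([upper[i] if (mask >> i) & 1 else lower[i]
                           for i in range(dims)])
        if f(corner) != target:
            return False
    return True

def grow_region(x_adv, f, target, step=0.125, max_iters=20):
    """Greedily widen a box around the adversarial example x_adv while
    the verifier keeps certifying the whole box as adversarial."""
    lower = x_adv.copy()
    upper = x_adv.copy()
    for _ in range(max_iters):
        cand_lower, cand_upper = lower - step, upper + step
        if toy_verifier(cand_lower, cand_upper, f, target):
            lower, upper = cand_lower, cand_upper  # keep the larger box
        else:
            break  # growth would leave the certified-adversarial set
    return lower, upper

# Toy "classifier": predicts class 1 whenever the inputs sum above 0.
f = lambda x: int(x.sum() > 0)
x_adv = np.array([1.0, 1.0])          # a point already classified as 1
lower, upper = grow_region(x_adv, f, 1)
print(lower, upper)                   # box certified to stay in class 1
```

The real method differs in one key respect worth noting: instead of growing the region uniformly, each step solves an optimization that maximizes the region's volume subject to the convex relaxation of the network implying the certification bound, which is what makes the enormous region sizes reported in the abstract reachable.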