Paper Title
Regional Image Perturbation Reduces $L_p$ Norms of Adversarial Examples While Maintaining Model-to-model Transferability
Paper Authors
Paper Abstract
Regional adversarial attacks often rely on complicated methods for generating adversarial perturbations, making it hard to compare their efficacy against well-known attacks. In this study, we show that effective regional perturbations can be generated without resorting to complex methods. We develop a very simple regional adversarial perturbation attack method using cross-entropy sign, one of the most commonly used losses in adversarial machine learning. Our experiments on ImageNet with multiple models reveal that, on average, $76\%$ of the generated adversarial examples maintain model-to-model transferability when the perturbation is applied to local image regions. Depending on the selected region, these localized adversarial examples require significantly less $L_p$ norm distortion (for $p \in \{0, 2, \infty\}$) compared to their non-local counterparts. These localized attacks therefore have the potential to undermine defenses that claim robustness under the aforementioned norms.
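The abstract describes a one-step, cross-entropy-sign perturbation restricted to a local image region. The following is a minimal sketch of that idea, assuming a PyTorch classifier with inputs in [0, 1]; the function name, the binary `region_mask`, and the `epsilon` value are illustrative assumptions and are not taken from the paper.

```python
import torch
import torch.nn.functional as F

def regional_sign_attack(model, image, label, region_mask, epsilon=8 / 255):
    """One-step sign attack applied only inside a local image region (sketch).

    image:       (1, 3, H, W) input tensor with values in [0, 1]
    label:       (1,) ground-truth class index
    region_mask: (1, 1, H, W) binary mask selecting the perturbed region
    """
    image = image.clone().detach().requires_grad_(True)

    # Gradient of the cross-entropy loss with respect to the input pixels.
    loss = F.cross_entropy(model(image), label)
    loss.backward()

    # Sign of the gradient, zeroed outside the chosen region.
    perturbation = epsilon * image.grad.sign() * region_mask

    # Apply the localized perturbation and keep pixel values in a valid range.
    return (image + perturbation).clamp(0, 1).detach()
```

Because the mask zeroes the perturbation outside the selected region, the resulting $L_0$ distortion is bounded by the region size, and the $L_2$ and $L_\infty$ budgets are spent only on those pixels; how the region is chosen, and whether the step is iterated, are design choices not specified in this sketch.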