Paper Title
Rethinking Machine Learning Robustness via its Link with the Out-of-Distribution Problem

Paper Authors

Abderrahmen Amich, Birhanu Eshete

Paper Abstract
Despite multiple efforts made towards robust machine learning (ML) models, their vulnerability to adversarial examples remains a challenging problem that calls for rethinking the defense strategy. In this paper, we take a step back and investigate the causes behind ML models' susceptibility to adversarial examples. In particular, we focus on exploring the cause-effect link between adversarial examples and the out-of-distribution (OOD) problem. To that end, we propose an OOD generalization method that stands against both adversary-induced and natural distribution shifts. Through an OOD to in-distribution mapping intuition, our approach translates OOD inputs to the data distribution used to train and test the model. Through extensive experiments on three benchmark image datasets of different scales (MNIST, CIFAR10, and ImageNet) and by leveraging image-to-image translation methods, we confirm that the adversarial examples problem is a special case of the wider OOD generalization problem. Across all datasets, we show that our translation-based approach consistently improves robustness to OOD adversarial inputs and outperforms state-of-the-art defenses by a significant margin, while preserving the exact accuracy on benign (in-distribution) data. Furthermore, our method generalizes on naturally OOD inputs such as darker or sharper images.
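The abstract's core mechanism is an OOD-to-in-distribution mapping: every incoming image, whether benign, adversarially perturbed, or naturally shifted, is first passed through an image-to-image translation model and only then classified. Below is a minimal sketch of that wrapper, assuming a PyTorch setting; `TranslationDefense`, `translator`, and `classifier` are hypothetical names used for illustration, not the authors' released code, and the concrete translation architecture (the paper only says it leverages image-to-image translation methods) is left abstract.

```python
# Minimal sketch (not the authors' implementation) of the OOD-to-in-distribution
# mapping idea: classify f(g(x)), where g translates a possibly-OOD input back
# toward the training distribution and f is the unmodified classifier.
import torch
import torch.nn as nn


class TranslationDefense(nn.Module):
    """Hypothetical wrapper that prepends a translation network g to a classifier f."""

    def __init__(self, translator: nn.Module, classifier: nn.Module):
        super().__init__()
        self.translator = translator    # g: maps OOD inputs toward in-distribution
        self.classifier = classifier    # f: trained only on the clean distribution

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Translate the (possibly adversarial or naturally shifted) input,
        # then classify the translated image with the untouched model.
        x_in_dist = self.translator(x)
        return self.classifier(x_in_dist)
```

Because the defense operates purely on the input, the classifier itself is never retrained; this is consistent with the abstract's claim that benign (in-distribution) accuracy is preserved, which holds whenever the translator acts close to identity on in-distribution data.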
