Alleviating Adversarial Attacks on Variational Autoencoders with MCMC

Paper Title

Alleviating Adversarial Attacks on Variational Autoencoders with MCMC

Authors

Anna Kuzina, Max Welling, Jakub M. Tomczak

Abstract

Variational autoencoders (VAEs) are latent variable models that can generate complex objects and provide meaningful latent representations. Moreover, they could be further used in downstream tasks such as classification. As previous work has shown, one can easily fool VAEs to produce unexpected latent representations and reconstructions for a visually slightly modified input. Here, we examine several objective functions for adversarial attack construction proposed previously and present a solution to alleviate the effect of these attacks. Our method utilizes the Markov Chain Monte Carlo (MCMC) technique in the inference step that we motivate with a theoretical analysis. Thus, we do not incorporate any extra costs during training, and the performance on non-attacked inputs is not decreased. We validate our approach on a variety of datasets (MNIST, Fashion MNIST, Color MNIST, CelebA) and VAE configurations ($β$-VAE, NVAE, $β$-TCVAE), and show that our approach consistently improves the model robustness to adversarial attacks.
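
The idea of applying MCMC at the inference step can be illustrated with a minimal sketch. The abstract does not say which sampler or model is used, so everything below is an assumption made for illustration: a toy linear-Gaussian decoder with weights `W` stands in for a trained VAE decoder, a plain random-walk Metropolis chain stands in for the MCMC technique, and `z_bad` plays the role of a latent code that an attack has pushed away from the posterior. The names `log_joint` and `refine_latent` are hypothetical.

```python
import numpy as np

def log_joint(z, x, W, sigma=0.5):
    """Unnormalized log p(x, z) for an assumed toy linear-Gaussian decoder."""
    log_prior = -0.5 * np.sum(z ** 2)                       # z ~ N(0, I)
    log_lik = -0.5 * np.sum((x - W @ z) ** 2) / sigma ** 2  # x | z ~ N(Wz, sigma^2 I)
    return log_prior + log_lik

def refine_latent(z0, x, W, n_steps=500, step=0.1, seed=0):
    """Run a random-walk Metropolis chain targeting p(z | x), started at z0."""
    rng = np.random.default_rng(seed)
    z = np.array(z0, dtype=float)
    lp = log_joint(z, x, W)
    for _ in range(n_steps):
        proposal = z + step * rng.standard_normal(z.shape)
        lp_new = log_joint(proposal, x, W)
        # Metropolis acceptance rule for a symmetric proposal
        if np.log(rng.uniform()) < lp_new - lp:
            z, lp = proposal, lp_new
    return z

# Toy setup (all values are illustrative assumptions, not from the paper)
rng = np.random.default_rng(1)
W = rng.standard_normal((8, 2))   # stand-in decoder weights
z_true = np.array([1.0, -1.0])
x = W @ z_true                    # observed input
z_bad = np.array([-3.0, 3.0])     # stand-in for an attacked encoder output

z_ref = refine_latent(z_bad, x, W)
err_before = np.linalg.norm(x - W @ z_bad)
err_after = np.linalg.norm(x - W @ z_ref)
print(f"reconstruction error before: {err_before:.3f}, after: {err_after:.3f}")
```

Because the chain only runs at inference time and starts from the encoder's output, a scheme of this kind leaves training unchanged, which is consistent with the abstract's claim of no extra training cost.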
