Paper Title

Pretrained Transformers Do not Always Improve Robustness

Authors

Swaroop Mishra, Bhavdeep Singh Sachdeva, Chitta Baral

Abstract

Pretrained Transformers (PT) have been shown to provide better Out-of-Distribution (OOD) robustness than traditional models such as Bag of Words (BOW), LSTMs, and Convolutional Neural Networks (CNN) powered by Word2Vec and GloVe embeddings. How does this robustness comparison hold in a real-world setting where some part of the dataset can be noisy? Do PT also provide more robust representations than traditional models on exposure to noisy data? We perform a comparative study of 10 models and find empirical evidence that PT provide less robust representations than traditional models when exposed to noisy data. We investigate further and augment PT with an adversarial filtering (AF) mechanism that has been shown to improve OOD generalization. However, an increase in generalization does not necessarily increase robustness, as we find that noisy data fools the AF method powered by PT.
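To make the experimental setting concrete, below is a minimal sketch (not the authors' code) of one way to simulate "exposure to noisy data": corrupt the text of a fraction of training examples, train a traditional Bag-of-Words baseline on the partially noisy split, and evaluate on clean test data. The same corrupted split would then be used to fine-tune a pretrained transformer for comparison. The toy dataset, the noise type (random character deletions), and the noise rate are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of a noisy-data robustness comparison (assumptions noted above).
import random

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def corrupt_text(text, drop_prob=0.15, rng=None):
    """Randomly delete characters to mimic noisy, typo-ridden input."""
    rng = rng or random.Random(0)
    return "".join(ch for ch in text if rng.random() > drop_prob)


def add_noise(texts, noise_rate=0.3, seed=0):
    """Corrupt a random fraction of the training texts."""
    rng = random.Random(seed)
    noisy = list(texts)
    for i in rng.sample(range(len(noisy)), int(noise_rate * len(noisy))):
        noisy[i] = corrupt_text(noisy[i], rng=rng)
    return noisy


# Toy stand-in for a real sentiment dataset (an assumption).
train_texts = ["great movie", "terrible plot", "loved it", "boring and slow"]
train_labels = [1, 0, 1, 0]
test_texts = ["wonderful film", "awful acting"]
test_labels = [1, 0]

# Traditional BOW baseline trained on the partially noisy split;
# a pretrained transformer would be fine-tuned on the same split for comparison.
bow_model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
bow_model.fit(add_noise(train_texts), train_labels)
print("BOW accuracy on clean test set:", bow_model.score(test_texts, test_labels))
```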
