Paper Title

Designing Robust Transformers using Robust Kernel Density Estimation

Paper Authors

Xing Han, Tongzheng Ren, Tan Minh Nguyen, Khai Nguyen, Joydeep Ghosh, Nhat Ho

Paper Abstract

Recent advances in Transformer architectures have driven their empirical success in a variety of tasks across different domains. However, existing works mainly focus on predictive accuracy and computational cost, without considering other practical issues such as robustness to contaminated samples. Recent work by Nguyen et al. (2022) has shown that the self-attention mechanism, which is at the center of the Transformer architecture, can be viewed as a non-parametric estimator based on kernel density estimation (KDE). This motivates us to leverage a set of robust kernel density estimation methods to alleviate the issue of data contamination. Specifically, we introduce a series of self-attention mechanisms that can be incorporated into different Transformer architectures, and we discuss the special properties of each method. We then perform extensive empirical studies on language modeling and image classification tasks. Our methods demonstrate robust performance in multiple scenarios while maintaining competitive results on clean datasets.
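
The KDE view mentioned in the abstract can be made concrete: softmax attention computes h_i = sum_j softmax(q_i·k_j / sqrt(D)) v_j, which is exactly a Nadaraya-Watson estimator sum_j κ(q_i, k_j) v_j / sum_j κ(q_i, k_j) with the exponential kernel κ(q, k) = exp(q·k / sqrt(D)); for norm-constrained queries and keys this coincides with a Gaussian kernel. The sketch below illustrates this reading of attention and one generic way to robustify it via median-of-means aggregation. The function names and the median-of-means variant are illustrative assumptions on my part, not necessarily the exact estimators the paper adopts.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def kde_attention(Q, K, V):
    """Standard softmax attention, read as a Nadaraya-Watson
    (KDE-based) estimator: each output row is a kernel-weighted
    average of the values, with exp(q.k / sqrt(D)) playing the
    role of the kernel before normalization."""
    D = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(D), axis=-1)  # rows sum to 1
    return weights @ V

def mom_kde_attention(Q, K, V, n_blocks=4, seed=None):
    """Hypothetical median-of-means variant (a sketch, not the
    paper's exact method): split key-value pairs into random
    blocks, compute a KDE-attention estimate per block, and take
    the coordinate-wise median across blocks so a few contaminated
    pairs cannot dominate the output."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(K.shape[0])
    blocks = np.array_split(perm, n_blocks)
    estimates = np.stack([kde_attention(Q, K[b], V[b]) for b in blocks])
    return np.median(estimates, axis=0)  # robust aggregation

# Toy usage with a single contaminated value vector.
rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 8))
K = rng.normal(size=(16, 8))
V = rng.normal(size=(16, 8))
V[0] += 100.0  # simulate one corrupted sample
print(kde_attention(Q, K, V)[0, :3])      # can be pulled far off
print(mom_kde_attention(Q, K, V)[0, :3])  # median damps the outlier
```

A single corrupted value vector can shift the plain KDE-attention output by an arbitrary amount, whereas the coordinate-wise median bounds the influence of any one block; this bounded-influence property is the basic rationale behind robust KDE aggregation.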
