Paper Title
Know What You Don't Need: Single-Shot Meta-Pruning for Attention Heads
Paper Authors
Paper Abstract
Deep pre-trained Transformer models have achieved state-of-the-art results on a variety of natural language processing (NLP) tasks. Because they learn rich language knowledge with millions of parameters, these models are usually over-parameterized and significantly increase the computational overhead in applications. It is intuitive to address this issue with model compression. In this work, we propose a method called Single-Shot Meta-Pruning to compress deep pre-trained Transformers before fine-tuning. Specifically, we focus on adaptively pruning unnecessary attention heads for different downstream tasks. To measure the informativeness of attention heads, we train our Single-Shot Meta-Pruner (SMP) with a meta-learning paradigm, aiming to maintain the distribution of text representations after pruning. Compared with existing compression methods for pre-trained models, our method can reduce the overhead of both fine-tuning and inference. Experimental results show that our pruner can selectively prune 50% of attention heads with little impact on downstream-task performance, and can even yield better text representations. The source code will be released in the future.
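To make the pruning setting concrete, the following is a minimal sketch (not the authors' released code) of how a per-head pruning decision could be applied to a pre-trained Transformer before fine-tuning, using the Hugging Face `transformers` library. The per-head informativeness scores are stubbed with random values here; in the paper they would come from the learned Single-Shot Meta-Pruner, which is not reproduced in this sketch. The function name `prune_least_informative_heads` and the 50% global pruning ratio are illustrative assumptions.

```python
# Illustrative sketch: prune the least informative attention heads of a
# pre-trained BERT model before fine-tuning, assuming per-head scores are
# provided by an external pruner (in the paper, the Single-Shot Meta-Pruner).
import torch
from transformers import BertModel


def prune_least_informative_heads(model: BertModel,
                                  head_scores: torch.Tensor,
                                  prune_ratio: float = 0.5) -> BertModel:
    """Remove the `prune_ratio` fraction of heads with the lowest scores.

    head_scores: tensor of shape (num_layers, num_heads); higher means
    more informative. Scores are assumed to be produced elsewhere.
    """
    num_layers, num_heads = head_scores.shape
    num_to_prune = int(prune_ratio * num_layers * num_heads)
    # Rank all heads globally by score and mark the lowest-scoring ones.
    flat_order = torch.argsort(head_scores.flatten())
    heads_to_prune: dict[int, list[int]] = {}
    for idx in flat_order[:num_to_prune].tolist():
        layer, head = divmod(idx, num_heads)
        heads_to_prune.setdefault(layer, []).append(head)
    # `prune_heads` physically removes the selected heads' parameters,
    # which is what reduces both fine-tuning and inference overhead.
    model.prune_heads(heads_to_prune)
    return model


if __name__ == "__main__":
    model = BertModel.from_pretrained("bert-base-uncased")
    cfg = model.config
    # Placeholder scores; a real pruner would estimate head informativeness
    # so that pruning preserves the distribution of text representations.
    scores = torch.rand(cfg.num_hidden_layers, cfg.num_attention_heads)
    model = prune_least_informative_heads(model, scores, prune_ratio=0.5)
```

Because the heads are removed before fine-tuning rather than after, the smaller model is what gets fine-tuned, which is the source of the fine-tuning savings claimed in the abstract.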