Paper Title
IsoBN: Fine-Tuning BERT with Isotropic Batch Normalization
Paper Authors
Paper Abstract
Fine-tuning pre-trained language models (PTLMs), such as BERT and its better variant RoBERTa, has been a common practice for advancing performance in natural language understanding (NLU) tasks. Recent advances in representation learning show that isotropic (i.e., unit-variance and uncorrelated) embeddings can significantly improve performance on downstream tasks, with faster convergence and better generalization. The isotropy of the pre-trained embeddings in PTLMs, however, is relatively under-explored. In this paper, we analyze the isotropy of the pre-trained [CLS] embeddings of PTLMs with straightforward visualization and point out two major issues: high variance in their standard deviations, and high correlation between different dimensions. We also propose a new network regularization method, isotropic batch normalization (IsoBN), to address these issues and learn more isotropic representations during fine-tuning by dynamically penalizing dominating principal components. This simple yet effective fine-tuning method yields an absolute improvement of about 1.0 point on average across seven NLU tasks.
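To make the two issues concrete, below is a minimal PyTorch sketch that, for a batch of [CLS] embeddings, measures and penalizes (a) the spread of per-dimension standard deviations and (b) the off-diagonal correlations between dimensions. This is an illustrative sketch under assumptions, not the authors' IsoBN implementation: the function name isotropy_penalty, the exact penalty terms, and the loss weighting are hypothetical and only meant to convey the notion of encouraging isotropic (unit-variance, uncorrelated) representations.

```python
# Hypothetical sketch (not the authors' released IsoBN code): penalize
# non-isotropy of a batch of [CLS] embeddings during fine-tuning.
import torch

def isotropy_penalty(cls_embeddings: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """cls_embeddings: (batch_size, hidden_dim) [CLS] vectors from a PTLM."""
    # Center the batch so covariance and correlation are well defined.
    centered = cls_embeddings - cls_embeddings.mean(dim=0, keepdim=True)
    std = centered.std(dim=0) + eps                      # per-dimension std
    cov = centered.t() @ centered / (centered.size(0) - 1)
    corr = cov / (std.unsqueeze(0) * std.unsqueeze(1))   # correlation matrix

    # Issue (a): per-dimension stds that deviate from the average std.
    std_term = ((std / std.mean()) - 1.0).pow(2).mean()
    # Issue (b): squared off-diagonal correlations (non-uncorrelatedness).
    off_diag = corr - torch.diag(torch.diagonal(corr))
    corr_term = off_diag.pow(2).sum() / (corr.numel() - corr.size(0))
    return std_term + corr_term

# Usage sketch: add the penalty (weight assumed) to the task loss.
# loss = task_loss + 0.1 * isotropy_penalty(cls_embeddings)
```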