Paper Title
IsoBN: Fine-Tuning BERT with Isotropic Batch Normalization
Paper Authors
Paper Abstract
Fine-tuning pre-trained language models (PTLMs), such as BERT and its better variant RoBERTa, has been a common practice for advancing performance in natural language understanding (NLU) tasks. Recent advances in representation learning show that isotropic (i.e., unit-variance and uncorrelated) embeddings can significantly improve performance on downstream tasks, with faster convergence and better generalization. The isotropy of the pre-trained embeddings in PTLMs, however, is relatively under-explored. In this paper, we analyze the isotropy of the pre-trained [CLS] embeddings of PTLMs with straightforward visualization and point out two major issues: high variance in their standard deviations, and high correlation between different dimensions. We also propose a new network regularization method, isotropic batch normalization (IsoBN), to address these issues and learn more isotropic representations during fine-tuning by dynamically penalizing dominating principal components. This simple yet effective fine-tuning method yields an absolute improvement of about 1.0 point on average across seven NLU tasks.
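To make the two issues concrete, below is a minimal PyTorch sketch that, for a batch of [CLS] embeddings, measures and penalizes (a) the spread of per-dimension standard deviations and (b) the off-diagonal correlations between dimensions. This is an illustrative sketch under assumptions, not the authors' IsoBN implementation: the function name isotropy_penalty, the exact penalty terms, and the loss weighting are hypothetical and only meant to convey the notion of encouraging isotropic (unit-variance, uncorrelated) representations.

```python
# Hypothetical sketch (not the authors' released IsoBN code): penalize
# non-isotropy of a batch of [CLS] embeddings during fine-tuning.
import torch

def isotropy_penalty(cls_embeddings: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """cls_embeddings: (batch_size, hidden_dim) [CLS] vectors from a PTLM."""
    # Center the batch so covariance and correlation are well defined.
    centered = cls_embeddings - cls_embeddings.mean(dim=0, keepdim=True)
    std = centered.std(dim=0) + eps                      # per-dimension std
    cov = centered.t() @ centered / (centered.size(0) - 1)
    corr = cov / (std.unsqueeze(0) * std.unsqueeze(1))   # correlation matrix

    # Issue (a): per-dimension stds that deviate from the average std.
    std_term = ((std / std.mean()) - 1.0).pow(2).mean()
    # Issue (b): squared off-diagonal correlations (non-uncorrelatedness).
    off_diag = corr - torch.diag(torch.diagonal(corr))
    corr_term = off_diag.pow(2).sum() / (corr.numel() - corr.size(0))
    return std_term + corr_term

# Usage sketch: add the penalty (weight assumed) to the task loss.
# loss = task_loss + 0.1 * isotropy_penalty(cls_embeddings)
```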