Paper Title
Pretraining Text Encoders with Adversarial Mixture of Training Signal Generators
Paper Authors
Paper Abstract
We present a new framework AMOS that pretrains text encoders with an Adversarial learning curriculum via a Mixture Of Signals from multiple auxiliary generators. Following ELECTRA-style pretraining, the main encoder is trained as a discriminator to detect replaced tokens generated by auxiliary masked language models (MLMs). Different from ELECTRA which trains one MLM as the generator, we jointly train multiple MLMs of different sizes to provide training signals at various levels of difficulty. To push the discriminator to learn better with challenging replaced tokens, we learn mixture weights over the auxiliary MLMs' outputs to maximize the discriminator loss by backpropagating the gradient from the discriminator via Gumbel-Softmax. For better pretraining efficiency, we propose a way to assemble multiple MLMs into one unified auxiliary model. AMOS outperforms ELECTRA and recent state-of-the-art pretrained models by about 1 point on the GLUE benchmark for BERT base-sized models.
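To make the mixture-of-generators idea concrete, below is a minimal PyTorch sketch, not the authors' implementation, of how token distributions from several auxiliary MLMs could be combined with learnable mixture weights through Gumbel-Softmax so that the discriminator loss can backpropagate into those weights. Names such as `mix_generator_signals` and `mixture_logits` are illustrative assumptions.

```python
# Minimal sketch, assuming PyTorch; NOT the authors' released code.
# Several auxiliary MLM "generators" propose token distributions at masked
# positions; a Gumbel-Softmax over learnable mixture weights selects among
# them, keeping the selection differentiable so the discriminator loss can
# be backpropagated into the mixture weights (and maximized adversarially).

import torch
import torch.nn.functional as F


def mix_generator_signals(gen_logits_list, mixture_logits, tau=1.0):
    """Mix per-generator token distributions with Gumbel-Softmax weights.

    gen_logits_list: list of [batch, seq, vocab] logits, one per auxiliary MLM.
    mixture_logits:  [num_generators] learnable logits over the generators.
    Returns mixed log-probabilities of shape [batch, seq, vocab].
    """
    # Soft, differentiable selection over generators.
    sel = F.gumbel_softmax(mixture_logits, tau=tau, hard=False)        # [G]
    probs = torch.stack(
        [F.softmax(l, dim=-1) for l in gen_logits_list]                # [G, B, S, V]
    )
    mixed = torch.einsum("g,gbsv->bsv", sel, probs)                    # convex mixture
    return torch.log(mixed + 1e-9)


# Toy usage with random tensors standing in for real MLM outputs.
B, S, V, G = 2, 8, 100, 3
gen_logits = [torch.randn(B, S, V) for _ in range(G)]
mixture_logits = torch.zeros(G, requires_grad=True)

mixed_logp = mix_generator_signals(gen_logits, mixture_logits)

# Replacement tokens for the discriminator's input. Discrete sampling is not
# itself differentiable; the gradient reaches the mixture weights via the
# Gumbel-Softmax selection, not through the sampled token ids.
replacements = torch.distributions.Categorical(logits=mixed_logp).sample()

# The discriminator would then label each position as original vs. replaced,
# and the mixture weights would be updated to *maximize* that loss
# (gradient ascent on mixture_logits), forming the adversarial curriculum.
```

The key design point this sketch illustrates is that the soft Gumbel-Softmax selection keeps `mixture_logits` on the gradient path from the discriminator loss, which is what allows the mixture over generators of different sizes to be tuned adversarially; details such as how the unified auxiliary model shares parameters across the MLMs are omitted here.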