Paper Title
Mini-Model Adaptation: Efficiently Extending Pretrained Models to New Languages via Aligned Shallow Training
Paper Authors
Paper Abstract
Prior work shows that it is possible to expand pretrained Masked Language Models (MLMs) to new languages by learning a new set of embeddings, while keeping the transformer body frozen. Despite learning a small subset of parameters, this approach is not compute-efficient, as training the new embeddings requires a full forward and backward pass over the entire model. We propose mini-model adaptation, a compute-efficient alternative that builds a shallow mini-model from a fraction of a large model's parameters. New language-specific embeddings can then be efficiently trained over the mini-model and plugged into the aligned large model for rapid cross-lingual transfer. We explore two approaches to learn mini-models: MiniJoint, which jointly pretrains the primary model and the mini-model using a single transformer with a secondary MLM head at a middle layer; and MiniPost, where we start from a regular pretrained model, build a mini-model by extracting and freezing a few layers, and learn a small number of parameters on top. Experiments on XNLI, MLQA and PAWS-X show that mini-model adaptation matches the performance of the standard approach using 2.3x less compute on average.
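Below is a minimal, illustrative sketch (not the authors' released code) of the MiniJoint idea described in the abstract: a masked language model whose transformer has a secondary MLM head attached at a middle layer, so that the first few blocks plus that head form a shallow mini-model aligned with the full model. All class, argument, and variable names here (e.g. MiniJointMLM, mid_layer) are hypothetical, and details such as masking and hyperparameters are simplified.

```python
# Sketch of a MiniJoint-style MLM with a secondary head at a middle layer.
# Assumptions: PyTorch >= 1.9, toy hyperparameters, no positional embeddings
# or proper masking strategy (kept minimal for clarity).
import torch
import torch.nn as nn

class MiniJointMLM(nn.Module):
    def __init__(self, vocab_size=32000, d_model=768, n_heads=12,
                 n_layers=12, mid_layer=4):
        super().__init__()
        self.embeddings = nn.Embedding(vocab_size, d_model)
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            for _ in range(n_layers))
        self.mid_layer = mid_layer
        # Two MLM heads: one at the middle layer (mini-model) and one on top (primary model).
        self.mid_head = nn.Linear(d_model, vocab_size)
        self.top_head = nn.Linear(d_model, vocab_size)

    def forward(self, input_ids):
        h = self.embeddings(input_ids)
        mid_logits = None
        for i, layer in enumerate(self.layers):
            h = layer(h)
            if i + 1 == self.mid_layer:
                mid_logits = self.mid_head(h)   # secondary MLM prediction (mini-model)
        top_logits = self.top_head(h)           # primary MLM prediction (full model)
        return mid_logits, top_logits

# Joint pretraining objective: sum the MLM losses of both heads so the shallow
# prefix stays aligned with the full model.
model = MiniJointMLM()
loss_fn = nn.CrossEntropyLoss(ignore_index=-100)
input_ids = torch.randint(0, 32000, (2, 16))
labels = input_ids.clone()  # in practice, only masked positions carry labels
mid_logits, top_logits = model(input_ids)
loss = (loss_fn(mid_logits.view(-1, 32000), labels.view(-1))
        + loss_fn(top_logits.view(-1, 32000), labels.view(-1)))
loss.backward()
```

In this setup, adapting to a new language would only require forward and backward passes through the embeddings, the first `mid_layer` blocks, and `mid_head`, which is where the claimed compute savings would come from; the newly learned embeddings can then be plugged into the full model at inference time.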