Paper Title
AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model
Paper Authors
Paper Abstract
In this work, we demonstrate that multilingual large-scale sequence-to-sequence (seq2seq) models, pre-trained on a mixture of denoising and Causal Language Modeling (CLM) tasks, are more efficient few-shot learners than decoder-only models on various tasks. In particular, we train a 20 billion parameter multilingual seq2seq model called Alexa Teacher Model (AlexaTM 20B) and show that it achieves state-of-the-art (SOTA) performance on 1-shot summarization tasks, outperforming the much larger 540B PaLM decoder model. AlexaTM 20B also achieves SOTA in 1-shot machine translation, especially for low-resource languages, across almost all language pairs supported by the model (Arabic, English, French, German, Hindi, Italian, Japanese, Marathi, Portuguese, Spanish, Tamil, and Telugu) on the Flores-101 dataset. We also show that, in the zero-shot setting, AlexaTM 20B outperforms GPT-3 (175B) on the SuperGLUE and SQuADv2 datasets and provides SOTA performance on multilingual tasks such as XNLI, XCOPA, Paws-X, and XWinograd. Overall, our results present a compelling case for seq2seq models as a powerful alternative to decoder-only models for Large-scale Language Model (LLM) training.
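To make the pretraining mixture concrete, the sketch below shows one way to construct (encoder input, decoder target) pairs for the two objectives described in the abstract: denoising (reconstruct the original text from a corrupted copy) and CLM (continue a given prefix). The mixing ratio, the token-dropping scheme, and the `[CLM]` marker token are illustrative assumptions, not details taken from the AlexaTM 20B paper.

```python
import random

# Minimal sketch of a denoising + CLM pretraining mixture for a seq2seq model.
# All constants below (CLM_FRACTION, drop_prob, prefix_frac, "[CLM]") are
# assumptions for illustration only.

CLM_FRACTION = 0.2  # assumed fraction of CLM examples in the mixture


def make_denoising_example(tokens, drop_prob=0.15):
    """Corrupt the encoder input by randomly dropping tokens;
    the decoder target is the original, uncorrupted text."""
    corrupted = [t for t in tokens if random.random() > drop_prob]
    return " ".join(corrupted), " ".join(tokens)


def make_clm_example(tokens, prefix_frac=0.5):
    """Give the encoder a prefix (tagged with an assumed [CLM] marker);
    the decoder target is the remainder of the text."""
    split = max(1, int(len(tokens) * prefix_frac))
    encoder_input = "[CLM] " + " ".join(tokens[:split])
    decoder_target = " ".join(tokens[split:])
    return encoder_input, decoder_target


def make_pretraining_example(text):
    """Sample one (encoder_input, decoder_target) pair from the mixture."""
    tokens = text.split()
    if random.random() < CLM_FRACTION:
        return make_clm_example(tokens)
    return make_denoising_example(tokens)


if __name__ == "__main__":
    sample = "seq2seq models pre-trained on denoising and CLM are strong few-shot learners"
    for _ in range(3):
        enc, dec = make_pretraining_example(sample)
        print("encoder:", enc)
        print("decoder:", dec)
```

Training on both objectives is what lets a single seq2seq model serve two roles: the denoising task supports conditional tasks such as summarization and translation, while the CLM task supports the open-ended, prompt-and-continue style of in-context few-shot learning highlighted in the results above.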