Paper Title

AraBART: a Pretrained Arabic Sequence-to-Sequence Model for Abstractive Summarization

Authors

Moussa Kamal Eddine, Nadi Tomeh, Nizar Habash, Joseph Le Roux, Michalis Vazirgiannis

Abstract

Like most natural language understanding and generation tasks, state-of-the-art models for summarization are transformer-based sequence-to-sequence architectures that are pretrained on large corpora. While most existing models have focused on English, Arabic has remained understudied. In this paper, we propose AraBART, the first Arabic model in which the encoder and the decoder are pretrained end-to-end, based on BART. We show that AraBART achieves the best performance on multiple abstractive summarization datasets, outperforming strong baselines including a pretrained Arabic BERT-based model and the multilingual mBART and mT5 models.
