Paper Title

AraLegal-BERT: A pretrained language model for Arabic Legal text

Paper Authors

AL-Qurishi, Muhammad, AlQaseemi, Sarah, Soussi, Riad

Paper Abstract

The effectiveness of the BERT model on multiple linguistic tasks has been well documented. Its potential for narrow, specific domains such as law, however, has not been fully explored. In this paper, we examine how BERT can be used in the Arabic legal domain and customize this language model for several downstream tasks, using several domain-relevant training and testing datasets to train BERT from scratch. We introduce AraLegal-BERT, a bidirectional encoder Transformer-based model that has been thoroughly tested and carefully optimized with the goal of amplifying the impact of NLP-driven solutions for jurisprudence, legal documents, and legal practice. We fine-tuned AraLegal-BERT and evaluated it against three BERT variants for Arabic on three natural language understanding (NLU) tasks. The results show that the base version of AraLegal-BERT achieves better accuracy on legal text than the general-purpose and original BERT models.
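The abstract describes fine-tuning AraLegal-BERT on downstream NLU tasks and comparing it against other Arabic BERT variants. As a rough illustration only, the sketch below shows what such a fine-tuning run could look like with the Hugging Face transformers library on a text-classification task; the checkpoint ID, data files, and hyperparameters are placeholders of ours, not details taken from the paper.

```python
# Minimal fine-tuning sketch for a BERT-style checkpoint on a classification
# task. "aralegal-bert-base" is a HYPOTHETICAL model ID -- the paper does not
# state a published checkpoint name -- and train.csv/test.csv stand in for
# any labeled Arabic legal dataset with "text" and "label" columns.
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

MODEL_NAME = "aralegal-bert-base"  # placeholder checkpoint ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Load an illustrative labeled dataset from local CSV files.
dataset = load_dataset("csv", data_files={"train": "train.csv", "test": "test.csv"})

def tokenize(batch):
    # Truncate/pad to BERT's usual 512-token limit.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=512)

dataset = dataset.map(tokenize, batched=True)

# Hyperparameters here are generic BERT fine-tuning defaults, not the paper's.
args = TrainingArguments(
    output_dir="aralegal-finetuned",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
)
trainer.train()
print(trainer.evaluate())
```

The same pattern would apply to the paper's other NLU tasks by swapping the task head (e.g., AutoModelForTokenClassification for sequence labeling) while keeping the tokenizer and base encoder fixed.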
