Paper Title

CrossNER: Evaluating Cross-Domain Named Entity Recognition

Authors

Zihan Liu, Yan Xu, Tiezheng Yu, Wenliang Dai, Ziwei Ji, Samuel Cahyawijaya, Andrea Madotto, Pascale Fung

Abstract

Cross-domain named entity recognition (NER) models are able to cope with the scarcity issue of NER samples in target domains. However, most of the existing NER benchmarks lack domain-specialized entity types or do not focus on a certain domain, leading to a less effective cross-domain evaluation. To address these obstacles, we introduce a cross-domain NER dataset (CrossNER), a fully-labeled collection of NER data spanning over five diverse domains with specialized entity categories for different domains. Additionally, we also provide a domain-related corpus since using it to continue pre-training language models (domain-adaptive pre-training) is effective for the domain adaptation. We then conduct comprehensive experiments to explore the effectiveness of leveraging different levels of the domain corpus and pre-training strategies to do domain-adaptive pre-training for the cross-domain task. Results show that focusing on the fractional corpus containing domain-specialized entities and utilizing a more challenging pre-training strategy in domain-adaptive pre-training are beneficial for the NER domain adaptation, and our proposed method can consistently outperform existing cross-domain NER baselines. Nevertheless, experiments also illustrate the challenge of this cross-domain NER task. We hope that our dataset and baselines will catalyze research in the NER domain adaptation area. The code and data are available at https://github.com/zliucr/CrossNER.
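The two ideas the abstract highlights, restricting the pre-training corpus to text that mentions domain-specialized entities and using a harder masking objective, can be illustrated with a minimal sketch. The abstract does not spell out either procedure, so everything here is an assumption: the helper names `select_entity_sentences` and `mask_spans` are hypothetical, entity matching is plain substring search, and contiguous span masking stands in for the "more challenging pre-training strategy". It is not the authors' implementation; see the linked repository for that.

```python
import random


def select_entity_sentences(sentences, entity_terms):
    """Keep only sentences that mention at least one known domain entity.

    Assumption: a case-insensitive substring match is a good enough proxy
    for "contains a domain-specialized entity".
    """
    terms = [t.lower() for t in entity_terms]
    return [s for s in sentences if any(t in s.lower() for t in terms)]


def mask_spans(tokens, mask_ratio=0.15, max_span_len=3, seed=0,
               mask_token="[MASK]"):
    """Hide contiguous token spans until the masking budget is used up.

    Masking whole spans instead of isolated tokens forces a model to
    predict several adjacent tokens from context alone (assumed here to
    be what makes the objective harder).
    """
    if not tokens:
        return []
    rng = random.Random(seed)
    masked = list(tokens)
    budget = max(1, round(len(tokens) * mask_ratio))
    covered = set()
    while len(covered) < budget:
        # Never draw a span longer than the remaining budget.
        span_len = min(rng.randint(1, max_span_len), budget - len(covered))
        start = rng.randrange(0, len(tokens) - span_len + 1)
        for i in range(start, start + span_len):
            covered.add(i)
            masked[i] = mask_token
    return masked


corpus = [
    "Haskell is a purely functional programming language.",
    "The meeting was moved to Thursday.",
]
domain_subset = select_entity_sentences(corpus, ["programming language"])
masked_tokens = mask_spans(domain_subset[0].split(), mask_ratio=0.3)
```

In a real pipeline the filtered sentences would feed a masked language model's continued pre-training, and the masking would operate on subword IDs rather than whitespace tokens.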
