Paper Title

Hierarchical Transformer Model for Scientific Named Entity Recognition

Authors

Urchade Zaratiana, Pierre Holat, Nadi Tomeh, Thierry Charnois

Abstract

The task of Named Entity Recognition (NER) is an important component of many natural language processing systems, such as relation extraction and knowledge graph construction. In this work, we present a simple and effective approach for Named Entity Recognition. The main idea of our approach is to encode the input subword sequence with a pre-trained transformer such as BERT, and then, instead of directly classifying the word labels, another layer of transformer is added to the subword representation to better encode the word-level interaction. We evaluate our approach on three benchmark datasets for scientific NER, particularly in the computer science and biomedical domains. Experimental results show that our model outperforms the current state-of-the-art on SciERC and TDM datasets without requiring external resources or specific data augmentation. Code is available at \url{https://github.com/urchade/HNER}.
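The architecture described in the abstract — a pretrained subword encoder followed by an extra transformer layer over word-level representations — can be sketched roughly as below. This is a minimal illustration, not the authors' implementation: the pretrained encoder is stubbed with an embedding layer, and all names and sizes (`HierarchicalNER`, `dim=64`, the first-subword pooling) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class HierarchicalNER(nn.Module):
    """Sketch of the hierarchical idea: subword encoding, then a
    word-level transformer layer, then per-word label classification."""

    def __init__(self, vocab_size=1000, dim=64, num_labels=5):
        super().__init__()
        # Stand-in for a pretrained subword encoder such as BERT.
        self.subword_encoder = nn.Embedding(vocab_size, dim)
        # Additional transformer layer over word representations,
        # added to better encode word-level interactions.
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.word_transformer = nn.TransformerEncoder(layer, num_layers=1)
        self.classifier = nn.Linear(dim, num_labels)

    def forward(self, subword_ids, first_subword_index):
        # subword_ids: (batch, num_subwords)
        # first_subword_index: position of each word's first subword
        h = self.subword_encoder(subword_ids)       # subword representations
        words = h[:, first_subword_index, :]        # pool one vector per word
        words = self.word_transformer(words)        # word-level interactions
        return self.classifier(words)               # per-word label logits

model = HierarchicalNER()
# 2 sentences of 10 subwords, 3 words starting at subword positions 0, 3, 7.
logits = model(torch.randint(0, 1000, (2, 10)), torch.tensor([0, 3, 7]))
print(logits.shape)  # one logit vector per word: (batch, words, labels)
```

Pooling the first subword of each word before the extra transformer layer is one common choice; the paper's actual pooling and layer configuration may differ.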
