Type-supervised sequence labeling based on the heterogeneous star graph for named entity recognition

Paper title

Type-supervised sequence labeling based on the heterogeneous star graph for named entity recognition

Authors

Xueru Wen, Changjiang Zhou, Haotian Tang, Luguang Liang, Yu Jiang, Hong Qi

Abstract

Named entity recognition (NER) is a fundamental task in natural language processing that identifies the span and category of entities in unstructured text. Traditional sequence labeling ignores nested entities, i.e., entities contained within other entity mentions. Many approaches attempt to address this case, but most rely on complex structures or incur high computational complexity. This paper investigates representation learning on a heterogeneous star graph containing text nodes and type nodes. In addition, we revise the graph attention mechanism into a hybrid form to address its unreasonableness in specific topologies. After updating the nodes in the graph, the model performs type-supervised sequence labeling. The annotation scheme extends single-layer sequence labeling and can cope with the vast majority of nested entities. Extensive experiments on public NER datasets demonstrate the effectiveness of our model in extracting both flat and nested entities. The method achieves state-of-the-art performance on both flat and nested datasets. The significant improvement in accuracy reflects the superiority of the multi-layer labeling strategy.
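To make the idea of extending single-layer labeling concrete, here is a minimal sketch of one possible multi-layer scheme: a separate BIO tag sequence per entity type. This is an illustrative assumption, not the authors' exact annotation scheme, which the abstract does not specify. The function name `per_type_bio`, the span format, and the example sentence are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical span format: (start, end_exclusive, entity_type).
Span = Tuple[int, int, str]


def per_type_bio(
    tokens: List[str], spans: List[Span], types: List[str]
) -> Dict[str, List[str]]:
    """Build one BIO tag sequence (layer) per entity type.

    Assumption for illustration only: nested entities of different
    types land in different layers, so they never collide.
    """
    layers = {t: ["O"] * len(tokens) for t in types}
    for start, end, etype in spans:
        row = layers[etype]
        # Two overlapping entities of the same type cannot share a layer.
        if any(tag != "O" for tag in row[start:end]):
            raise ValueError(f"overlapping {etype} entities at {start}:{end}")
        row[start] = "B"
        for i in range(start + 1, end):
            row[i] = "I"
    return layers


tokens = ["University", "of", "Washington", "opened"]
# "Washington" (LOC) is nested inside "University of Washington" (ORG).
spans = [(0, 3, "ORG"), (2, 3, "LOC")]
layers = per_type_bio(tokens, spans, ["ORG", "LOC"])
for etype, tags in layers.items():
    print(etype, tags)
```

Under this assumed scheme, a nested entity is representable whenever it differs in type from the entity that contains it. Same-type nesting is rejected, which is one way a layered scheme could cover most, but not all, nested entities.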
