Paper Title

Improving Performance of Automatic Keyword Extraction (AKE) Methods Using PoS-Tagging and Enhanced Semantic-Awareness

Authors

Altuncu, Enes; Nurse, Jason R. C.; Xu, Yang; Guo, Jie; Li, Shujun

Abstract

Automatic keyword extraction (AKE) has gained importance with the increasing amount of digital textual data that modern computing systems process. It has various applications in information retrieval (IR) and natural language processing (NLP), including text summarisation, topic analysis and document indexing. This paper proposes a simple but effective post-processing-based universal approach to improve the performance of any AKE method, via an enhanced level of semantic-awareness supported by PoS-tagging. To demonstrate the performance of the proposed approach, we considered word types retrieved from a PoS-tagging step and two representative sources of semantic information: specialised terms defined in one or more context-dependent thesauri, and named entities in Wikipedia. These three steps can be added to the end of any AKE method as part of a post-processor, which simply re-evaluates all candidate keywords following some context-specific and semantic-aware criteria. For five state-of-the-art (SOTA) AKE methods, our experimental results with 17 selected datasets showed that the proposed approach improved their performance both consistently (up to 100% in terms of improved cases) and significantly (between 10.2% and 53.8%, with an average of 25.8%, in terms of F1-score and across all five methods), especially when all three enhancement steps are used. Our results have profound implications considering the ease of applying our proposed approach to any AKE method and of further extending it.
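
To make the post-processing idea concrete, the following is a minimal Python sketch of how three such steps could be chained after an arbitrary AKE method. It is not the authors' implementation: the abstract only states that candidates are re-evaluated against context-specific and semantic-aware criteria, so the concrete rules used here are assumptions. The function name `post_process`, the `ALLOWED_TAGS` set, the multiplicative `boost`, and the convention that a higher score means a better keyword are all hypothetical. PoS tags are passed in as a plain dictionary so the sketch stays self-contained; in practice they would come from a PoS tagger.

```python
from typing import Dict, List, Set, Tuple

Candidate = Tuple[str, float]  # (keyword phrase, score); higher = better (assumed)

# Assumption: only nouns and adjectives (Penn Treebank-style tags) are
# acceptable word types for a keyword.
ALLOWED_TAGS: Set[str] = {"NN", "NNS", "NNP", "NNPS", "JJ"}


def post_process(
    candidates: List[Candidate],
    pos_tags: Dict[str, str],
    thesaurus_terms: Set[str],
    named_entities: Set[str],
    boost: float = 2.0,
) -> List[Candidate]:
    """Re-evaluate AKE candidates with three hypothetical steps."""
    rescored: List[Candidate] = []
    for phrase, score in candidates:
        key = phrase.lower()
        words = key.split()
        # Step 1 (PoS filter): drop phrases containing a word whose tag is
        # unknown or not in the allowed set.
        if not words or any(pos_tags.get(w) not in ALLOWED_TAGS for w in words):
            continue
        # Step 2 (thesaurus): boost phrases that are specialised terms.
        if key in thesaurus_terms:
            score *= boost
        # Step 3 (named entities): boost phrases that are known entities.
        if key in named_entities:
            score *= boost
        rescored.append((phrase, score))
    # Re-rank by the adjusted score, best first.
    return sorted(rescored, key=lambda c: c[1], reverse=True)


# Toy input standing in for the output of any AKE method.
candidates = [
    ("quickly", 0.9),
    ("keyword extraction", 0.5),
    ("Wikipedia", 0.4),
    ("text", 0.3),
]
pos_tags = {
    "quickly": "RB",
    "keyword": "NN",
    "extraction": "NN",
    "wikipedia": "NNP",
    "text": "NN",
}
ranked = post_process(
    candidates,
    pos_tags,
    thesaurus_terms={"keyword extraction"},
    named_entities={"wikipedia"},
)
print(ranked)
```

In this toy run the adverb "quickly" is filtered out despite its high initial score, while the thesaurus term and the named entity move ahead of the plain noun "text". Because the post-processor only consumes a scored candidate list, it can sit behind any extractor without changing the extractor itself, which is the property the abstract emphasises.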
