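Using the Full-text Content of Academic Articles to Identify and Evaluate Algorithm Entities in the Domain of Natural Language Processing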

Paper Title

Using the Full-text Content of Academic Articles to Identify and Evaluate Algorithm Entities in the Domain of Natural Language Processing

Authors

Wang, Yuzhuo; Zhang, Chengzhi

Abstract

In the era of big data, the advancement, improvement, and application of algorithms in academic research have played an important role in promoting the development of different disciplines. Academic papers in various disciplines, especially computer science, contain a large number of algorithms. Identifying the algorithms mentioned in the full-text content of papers can reveal the popular or classical algorithms in a specific field and help scholars gain a comprehensive understanding of the algorithms and even the field. To this end, this article takes the field of natural language processing (NLP) as an example and identifies the algorithms mentioned in academic papers in the field. A dictionary of algorithms is constructed by manually annotating the contents of papers, and sentences containing algorithms in the dictionary are extracted through dictionary-based matching. The number of articles mentioning an algorithm is used as an indicator to analyze the influence of that algorithm. Our results reveal the algorithm with the highest influence in NLP papers and show that classification algorithms represent the largest proportion among the high-impact algorithms. In addition, the evolution of the influence of algorithms reflects the changes in research tasks and topics in the field, and the influence of different algorithms shows different trends over time. As a preliminary exploration, this paper analyzes the impact of algorithms mentioned in academic text, and the results can be used as training data for the large-scale automatic extraction of algorithms in the future. The methodology in this paper is domain-independent and can be applied to other domains.
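The two steps the abstract describes, dictionary-based sentence matching and counting the number of articles that mention each algorithm, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the regex-based sentence splitter, the case-insensitive whole-word matching, and the toy dictionary and articles are all assumptions made for the example.

```python
import re
from collections import Counter


def build_matcher(dictionary):
    """Compile one case-insensitive pattern for all algorithm names.

    Longer names are listed first so that a multi-word name wins
    over any shorter name that is a prefix of it.
    """
    names = sorted(dictionary, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b"
    return re.compile(pattern, re.IGNORECASE)


def split_sentences(text):
    """Naive sentence splitter (an assumption; a real pipeline would
    likely use a proper sentence tokenizer)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_mentions(text, dictionary):
    """Return (sentence, [algorithm names]) for each sentence that
    contains at least one dictionary entry."""
    matcher = build_matcher(dictionary)
    canonical = {n.lower(): n for n in dictionary}
    hits = []
    for sentence in split_sentences(text):
        found = {canonical[m.group(0).lower()] for m in matcher.finditer(sentence)}
        if found:
            hits.append((sentence, sorted(found)))
    return hits


def article_counts(articles, dictionary):
    """Count, for each algorithm, the number of articles mentioning it.

    Each article contributes at most 1 per algorithm, no matter how
    many sentences mention that algorithm.
    """
    counts = Counter()
    for text in articles.values():
        mentioned = set()
        for _, names in extract_mentions(text, dictionary):
            mentioned.update(names)
        counts.update(mentioned)
    return counts


# Hypothetical toy data, for illustration only.
dictionary = ["support vector machine", "naive Bayes", "hidden Markov model"]
articles = {
    "a1": "We train a support vector machine. The support vector machine beats naive Bayes.",
    "a2": "A hidden Markov model is used for tagging. Results are good.",
    "a3": "We compare naive Bayes with a support vector machine.",
}

for name, n in article_counts(articles, dictionary).most_common():
    print(f"{name}: mentioned in {n} article(s)")
```

Counting articles rather than individual mentions keeps a single paper that names an algorithm many times from dominating the indicator, which matches the article-level influence measure described in the abstract.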

Paper Title