Paper title
Combining keyphrase extraction and lexical diversity to characterize ideas in publication titles
Paper authors
Paper abstract
Beyond bibliometrics, there is interest in characterizing how the number of ideas in scientific papers evolves. A common approach to investigating this involves analyzing the titles of publications to detect vocabulary changes over time. On the premise that phrases, or more specifically keyphrases, represent concepts, lexical diversity metrics are applied to phrased versions of the titles. Changes in lexical diversity are thus treated as indicators of shifts in, and possibly expansion of, research. Optimizing the detection of keyphrases is therefore an important aspect of this process. We propose using multiple phrase detection models, rather than just one, with the goal of producing a more comprehensive set of keyphrases from the source corpora. Another potential advantage of this approach is that the union and difference of these sets may provide automated techniques for identifying and omitting non-specific phrases. We compare the performance of several phrase detection models, analyze the keyphrase sets output by each, and calculate the lexical diversity of corpora variants incorporating keyphrases from each model, using four common lexical diversity metrics.
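The pipeline described in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the two keyphrase sets stand in for the output of two hypothetical phrase detection models, the helper names `phrase_title` and `type_token_ratio` are invented, and the type-token ratio is used only as one familiar lexical diversity measure, since the abstract does not name the four metrics it uses.

```python
def phrase_title(title, keyphrases):
    """Tokenize a title and merge each known multi-word keyphrase into one token.

    Greedy longest-match over whitespace tokens; this is an illustrative
    stand-in, not the phrasing procedure used in the paper.
    """
    tokens = title.lower().split()
    phrase_tuples = {tuple(p.lower().split()) for p in keyphrases}
    max_len = max((len(p) for p in phrase_tuples), default=1)
    out, i = [], 0
    while i < len(tokens):
        # Try the longest candidate span first, down to two tokens.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + n]) in phrase_tuples:
                out.append("_".join(tokens[i:i + n]))
                i += n
                break
        else:
            # No multi-word keyphrase starts here; keep the single token.
            out.append(tokens[i])
            i += 1
    return out


def type_token_ratio(token_lists):
    """Distinct tokens divided by total tokens across all phrased titles."""
    tokens = [t for toks in token_lists for t in toks]
    return len(set(tokens)) / len(tokens) if tokens else 0.0


# Hypothetical keyphrase sets from two phrase detection models.
model_a = {"neural network", "deep learning", "case study"}
model_b = {"deep learning", "gene expression", "case study"}

combined = model_a | model_b   # union: a more comprehensive keyphrase set
only_a = model_a - model_b     # difference: phrases found by model A alone
shared = model_a & model_b     # intersection: phrases both models agree on

titles = [
    "Deep learning for gene expression",
    "A case study of deep learning",
]

# Lexical diversity of one corpus variant per keyphrase set.
for name, phrases in [("model_a", model_a), ("model_b", model_b), ("combined", combined)]:
    phrased = [phrase_title(t, phrases) for t in titles]
    print(name, round(type_token_ratio(phrased), 3))
```

Whether a phrase in the shared set or in a difference set counts as non-specific is a modeling decision that this sketch leaves open; it only shows where the set operations and the diversity calculation would sit.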