Paper Title

Introducing Inter-Relatedness between Wikipedia Articles in Explicit Semantic Analysis

Authors

Elango, Naveen; K, Pawan Prasad

Abstract

Explicit Semantic Analysis (ESA) is a technique for representing a piece of text as a vector in a space of concepts, such as the articles found in Wikipedia. We propose a methodology that incorporates knowledge of the inter-relatedness between Wikipedia articles into the vectors obtained from ESA, using a technique called retrofitting, to improve the performance of subsequent tasks that use ESA to form vector embeddings. Specifically, we represent this knowledge as an undirected graph whose nodes are articles and whose edges are relations between pairs of articles. We also emphasize how the ESA step can be seen as a predominantly bottom-up approach, which derives vector representations from a corpus, and how incorporating top-down knowledge, namely the relations between articles, can further improve it. We test our hypothesis on several smaller subsets of the Wikipedia corpus and show that the proposed methodology leads to decent improvements in performance measures, including Spearman's rank correlation coefficient, in most cases.
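To make the retrofitting idea concrete, the following is a minimal sketch of the commonly used iterative retrofitting update applied to vectors attached to the nodes of an undirected graph. It is an illustration under stated assumptions, not the authors' implementation: the function name `retrofit`, the parameter `alpha`, the uniform neighbor weights (1 / degree), and the toy article names and vectors are all hypothetical and do not come from the paper.

```python
import numpy as np


def retrofit(vectors, edges, alpha=1.0, iterations=10):
    """Pull each vector toward its graph neighbors while staying near its original value.

    vectors: dict mapping a node name to a 1-D array (assumed to come from ESA).
    edges:   iterable of (a, b) pairs describing an undirected graph.
    alpha:   weight on the original vector (hypothetical default).
    """
    # Build a symmetric adjacency structure from the undirected edge list.
    neighbors = {name: set() for name in vectors}
    for a, b in edges:
        if a in vectors and b in vectors and a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)

    original = {n: np.asarray(v, dtype=float) for n, v in vectors.items()}
    updated = {n: v.copy() for n, v in original.items()}

    for _ in range(iterations):
        for name, nbrs in neighbors.items():
            if not nbrs:
                continue  # isolated nodes keep their original vector
            # Uniform neighbor weights of 1/degree, so the weights sum to 1.
            neighbor_mean = sum(updated[n] for n in nbrs) / len(nbrs)
            updated[name] = (neighbor_mean + alpha * original[name]) / (1.0 + alpha)
    return updated


# Toy usage with made-up article names and vectors.
toy_vectors = {"Cat": [1.0, 0.0], "Dog": [0.0, 1.0], "Car": [1.0, 1.0]}
toy_edges = [("Cat", "Dog")]
retrofitted = retrofit(toy_vectors, toy_edges)
print(retrofitted)
```

In this sketch, linked nodes drift toward each other while `alpha` controls how strongly each vector stays anchored to its corpus-derived starting point, which mirrors the bottom-up plus top-down combination described in the abstract.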
