Paper Title

Topological Data Analysis for Word Sense Disambiguation

Authors

Michael Rawson, Samuel Dooley, Mithun Bharadwaj, Rishabh Choudhary

Abstract

We develop and test a novel unsupervised algorithm for word sense induction and disambiguation that uses topological data analysis. Typical approaches to the problem involve clustering based on simple, low-level features of distance in word embeddings. Our approach relies on advanced mathematical concepts from topology, which provide a richer conceptualization of clusters for the word sense induction task. We use a persistent homology barcode algorithm on the SemCor dataset and demonstrate that our approach gives low relative error on word sense induction. This shows the promise of topological algorithms for natural language processing, and we advocate for future work in this area.
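
To make the idea of a persistent homology barcode concrete, here is a minimal sketch of the 0-dimensional case only. It is not the authors' algorithm, whose details the abstract does not give. It assumes each occurrence of an ambiguous word is represented by a context embedding, tracks how connected components of a Vietoris-Rips filtration merge as the distance scale grows, and reads the number of long-lived bars as a rough estimate of the number of senses. The function names, the toy 2-D points, and the threshold of 1.0 are all hypothetical choices made for illustration.

```python
import math
from itertools import combinations


def h0_barcode(points):
    """Return the 0-dimensional persistence barcode of a Vietoris-Rips filtration.

    Every point is born as its own component at scale 0. A bar dies at the
    distance where its component merges into another; one bar never dies.
    """
    parent = list(range(len(points)))

    def find(i):
        # Find the component representative, with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process all pairs in order of increasing Euclidean distance.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    bars = []
    for dist, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj          # two components merge at this scale
            bars.append((0.0, dist))
    if points:
        bars.append((0.0, math.inf))  # the component that survives forever
    return bars


def count_persistent_components(bars, threshold):
    """Count bars still alive at the given scale (death > threshold)."""
    return sum(1 for _, death in bars if death > threshold)


# Toy 2-D "context embeddings" for one ambiguous word (made-up data):
# two tight groups, standing in for two senses.
contexts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
            (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
bars = h0_barcode(contexts)
n_senses = count_persistent_components(bars, threshold=1.0)
print(n_senses)
```

On this toy input the short bars correspond to points merging within a group, while the two long bars correspond to the two groups themselves. Real embeddings are high-dimensional and noisy, so the threshold would need to be chosen from the data, and a full treatment would typically use a dedicated persistent homology library rather than this hand-rolled union-find.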
