Paper Title
Cross-lingual Word Sense Disambiguation using mBERT Embeddings with Syntactic Dependencies
Paper Authors
Paper Abstract
Cross-lingual word sense disambiguation (WSD) tackles the challenge of disambiguating ambiguous words across languages given their context. The pre-trained BERT embedding model has been proven effective at extracting contextual information about words, and its embeddings have been incorporated as features into many state-of-the-art WSD systems. To investigate how syntactic information can be added to BERT embeddings to produce word embeddings that incorporate both semantics and syntax, this project proposes concatenated embeddings built by producing dependency parse trees and encoding the relative relationships between words into the input embeddings. Two methods are also proposed to reduce the size of the concatenated embeddings. The experimental results show that the high dimensionality of the syntax-incorporated embeddings constitutes an obstacle for the classification task, which needs to be further addressed in future studies.
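The core idea of the concatenated embeddings can be illustrated with a minimal sketch: append an encoding of a token's dependency relation (from a dependency parse) to its contextual embedding vector. The relation inventory, dimensions, and one-hot encoding below are illustrative assumptions, not the paper's actual encoding scheme or relation set.

```python
import numpy as np

# Hypothetical inventory of dependency relations (Universal Dependencies
# style); the paper's actual relation set and encoding may differ.
DEP_RELATIONS = ["nsubj", "obj", "amod", "advmod", "root"]

def dep_one_hot(relation: str) -> np.ndarray:
    """Encode a token's dependency relation to its head as a one-hot vector."""
    vec = np.zeros(len(DEP_RELATIONS))
    vec[DEP_RELATIONS.index(relation)] = 1.0
    return vec

def concat_embedding(bert_vec: np.ndarray, relation: str) -> np.ndarray:
    """Concatenate a contextual embedding with the syntactic one-hot code."""
    return np.concatenate([bert_vec, dep_one_hot(relation)])

# Stand-in for a 768-dimensional mBERT token embedding.
token_vec = np.random.rand(768)
combined = concat_embedding(token_vec, "nsubj")
print(combined.shape)  # (773,): 768 contextual dims + 5 relation dims
```

Even this toy encoding shows why dimensionality becomes a concern: every syntactic feature appended to the embedding grows the input to the downstream WSD classifier, which motivates the size-reduction methods the abstract mentions.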