Paper Title

On the Effects of Knowledge-Augmented Data in Word Embeddings

Authors

Ramirez-Echavarria, Diego; Bikakis, Antonis; Dickens, Luke; Miller, Rob; Vlachidis, Andreas

Abstract

This paper investigates techniques for knowledge injection into word embeddings learned from large corpora of unannotated data. These representations are trained with word co-occurrence statistics and do not commonly exploit syntactic and semantic information from linguistic knowledge bases, which potentially limits their transferability to domains with differing language distributions or usages. We propose a novel approach for linguistic knowledge injection through data augmentation to learn word embeddings that enforce semantic relationships from the data, and systematically evaluate the impact it has on the resulting representations. We show that our knowledge augmentation approach improves the intrinsic characteristics of the learned embeddings while not significantly altering their results on a downstream text classification task.
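
The abstract does not spell out the augmentation procedure, so the following is only a minimal sketch of one possible reading, not the authors' method: sentences are duplicated with a word swapped for a related word from a knowledge base, so that the related word inherits similar co-occurrence contexts before any embedding is trained. The `SYNONYMS` table, the `augment` and `cooccurrence_counts` helpers, and the toy corpus are all hypothetical names and data introduced for illustration.

```python
from collections import Counter

# Hypothetical toy knowledge base (invented entries, not from the paper):
# each word maps to semantically related words, e.g. synonyms.
SYNONYMS = {
    "car": ["automobile"],
    "quick": ["fast"],
}


def augment(sentences, synonyms):
    """Return the original sentences plus copies with one word swapped
    for a related word from the knowledge base."""
    augmented = [list(s) for s in sentences]
    for sentence in sentences:
        for i, word in enumerate(sentence):
            for alt in synonyms.get(word, []):
                augmented.append(sentence[:i] + [alt] + sentence[i + 1:])
    return augmented


def cooccurrence_counts(sentences, window=2):
    """Count symmetric word co-occurrences within a fixed-size window."""
    counts = Counter()
    for sentence in sentences:
        for i, word in enumerate(sentence):
            lo = max(0, i - window)
            hi = min(len(sentence), i + window + 1)
            for j in range(lo, hi):
                if i != j:
                    counts[(word, sentence[j])] += 1
    return counts


# Assumed toy corpus of pre-tokenized sentences.
corpus = [["the", "quick", "car", "stopped"]]
augmented_corpus = augment(corpus, SYNONYMS)
counts = cooccurrence_counts(augmented_corpus)
```

In this sketch, "automobile" never appears in the original corpus, yet after augmentation it co-occurs with the same neighbours as "car". Any co-occurrence-based embedding trained on the augmented data would therefore tend to place the two words closer together, which is the general intuition behind injecting knowledge through the data rather than through the training objective.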
