Paper Title

EigenNoise: A Contrastive Prior to Warm-Start Representations

Authors

Hunter Scott Heidenreich, Jake Ryland Williams

Abstract

In this work, we present a naive initialization scheme for word vectors based on a dense, independent co-occurrence model and provide preliminary results that suggest it is competitive and warrants further investigation. Specifically, we demonstrate through information-theoretic minimum description length (MDL) probing that our model, EigenNoise, can approach the performance of empirically trained GloVe despite the lack of any pre-training data (in the case of EigenNoise). We present these preliminary results with interest to set the stage for further investigations into how this competitive initialization works without pre-training data, as well as to invite the exploration of more intelligent initialization schemes informed by the theory of harmonic linguistic structure. Our application of this theory likewise contributes a novel (and effective) interpretation of recent discoveries which have elucidated the underlying distributional information that linguistic representations capture from data and contrast distributions.
