Paper Title
A Natural Bias for Language Generation Models
Paper Authors
Paper Abstract
After just a few hundred training updates, a standard probabilistic model for language generation has likely not yet learnt many semantic or syntactic rules of natural language, making it difficult to estimate the probability distribution over next tokens. Yet around this point, these models have identified a simple, loss-minimising behaviour: to output the unigram distribution of the target training corpus. The use of such a heuristic raises the question: Can we initialise our models with this behaviour and save precious compute resources and model capacity? Here we show that we can effectively endow standard neural language generation models with a separate module that reflects unigram frequency statistics as prior knowledge, simply by initialising the bias term in a model's final linear layer with the log-unigram distribution. We use neural machine translation as a test bed for this simple technique and observe that it: (i) improves learning efficiency; (ii) achieves better overall performance; and perhaps most importantly (iii) appears to disentangle strong frequency effects by encouraging the model to specialise in non-frequency-related aspects of language.
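The core intervention described above is a one-line change at initialisation time: copy the log-unigram distribution of the target training corpus into the bias of the model's final projection layer. Below is a minimal sketch of that idea, assuming a PyTorch-style model whose output projection is an `nn.Linear` with a bias term; the function name, the flat list of target-side token ids, and the add-one smoothing constant are illustrative assumptions, not details taken from the paper.

```python
from collections import Counter

import torch
import torch.nn as nn


def init_output_bias_with_log_unigram(output_layer: nn.Linear,
                                      target_token_ids,
                                      vocab_size: int,
                                      smoothing: float = 1.0) -> None:
    """Set the output layer's bias to the log-unigram distribution.

    target_token_ids: a flat iterable of target-side token ids from the
    training corpus. Smoothing keeps the log finite for unseen tokens.
    """
    counts = torch.full((vocab_size,), smoothing)
    for token_id, count in Counter(target_token_ids).items():
        counts[token_id] += count
    log_unigram = torch.log(counts / counts.sum())  # log p(token)
    with torch.no_grad():
        output_layer.bias.copy_(log_unigram)


# Illustrative usage: before training, a freshly initialised projection
# layer immediately outputs (approximately) the unigram distribution.
vocab_size, d_model = 32000, 512
projection = nn.Linear(d_model, vocab_size, bias=True)
example_token_ids = [5, 5, 17, 42, 5, 17]  # stand-in for the real corpus
init_output_bias_with_log_unigram(projection, example_token_ids, vocab_size)
```

Because the softmax of a zero hidden state plus this bias recovers the smoothed unigram distribution, the model starts training from the loss-minimising heuristic it would otherwise have to spend its first few hundred updates discovering.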