Title

On the Dynamics of Training Attention Models

Authors

Haoye Lu, Yongyi Mao, Amiya Nayak

Abstract

The attention mechanism has been widely used in deep neural networks as a model component. By now, it has become a critical building block in many state-of-the-art natural language models. Despite its great empirical success, the working mechanism of attention has not been investigated at a sufficient theoretical depth to date. In this paper, we set up a simple text classification task and study the dynamics of training a simple attention-based classification model using gradient descent. In this setting, we show that, for each discriminative word that the model should attend to, a persisting identity exists relating its embedding and the inner product of its key and the query. This allows us to prove that, when the attention output is classified by a linear classifier, training must converge to attending to the discriminative words. Experiments are performed to validate our theoretical analysis and to provide further insights.
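
To make the setup the abstract describes concrete, here is a minimal PyTorch sketch of an attention-based classifier whose attention output is fed to a linear classifier and trained by gradient descent. This is not the authors' code: the single trainable query vector, the linear key map, and the use of the word embeddings themselves as values are assumptions made for illustration, chosen so that the attention scores are inner products of keys and the query as in the abstract.

# Hypothetical sketch of the model class described in the abstract;
# architecture details are assumptions, not the authors' implementation.
import torch
import torch.nn as nn

class SimpleAttentionClassifier(nn.Module):
    def __init__(self, vocab_size: int, dim: int, num_classes: int = 2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)      # word embeddings
        self.key = nn.Linear(dim, dim, bias=False)      # key map (assumed linear)
        self.query = nn.Parameter(torch.randn(dim))     # single trainable query
        self.classifier = nn.Linear(dim, num_classes)   # linear classifier on top

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len) integer word ids
        x = self.embed(tokens)                          # (batch, seq, dim)
        scores = self.key(x) @ self.query               # inner product of each key with the query
        weights = torch.softmax(scores, dim=-1)         # attention over words
        context = (weights.unsqueeze(-1) * x).sum(1)    # attention output (embeddings as values)
        return self.classifier(context)

# One gradient-descent step on toy data.
model = SimpleAttentionClassifier(vocab_size=100, dim=16)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
tokens = torch.randint(0, 100, (8, 10))
labels = torch.randint(0, 2, (8,))
loss = nn.functional.cross_entropy(model(tokens), labels)
opt.zero_grad()
loss.backward()
opt.step()

Under the paper's result, training such a model on a text classification task where certain words are discriminative should drive the softmax weights to concentrate on those words.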
