Paper Title

Refined Gate: A Simple and Effective Gating Mechanism for Recurrent Units

Paper Authors

Zhanzhan Cheng, Yunlu Xu, Mingjian Cheng, Yu Qiao, Shiliang Pu, Yi Niu, Fei Wu

Paper Abstract

Recurrent neural networks (RNNs) have been widely studied in sequence learning tasks, and the mainstream models (e.g., LSTM and GRU) rely on gating mechanisms that control how information flows between hidden states. However, the vanilla gates in RNNs (e.g., the input gate in LSTM) suffer from gate undertraining, which can be caused by various factors such as saturating activation functions, the gate layout (e.g., the number of gates and the gating functions), or even a suboptimal memory state. These factors may prevent the gates from learning their switching roles and thus lead to weak performance. In this paper, we propose a new gating mechanism for general gated recurrent neural networks to handle this issue. Specifically, the proposed gates directly short-connect the extracted input features to the outputs of the vanilla gates; the results are denoted as refined gates. The refining mechanism enhances gradient back-propagation and extends the gating activation scope, which can guide the RNN toward possibly deeper minima. We verify the proposed gating mechanism on three popular types of gated RNNs, including LSTM, GRU and MGU. Extensive experiments on 3 synthetic tasks, 3 language modeling tasks and 5 scene text recognition benchmarks demonstrate the effectiveness of our method.
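For a concrete picture of the idea described in the abstract, below is a minimal PyTorch sketch of an LSTM cell whose sigmoid gates receive an additive shortcut from the input features. It is an illustrative interpretation of the "short connect" refinement, not the paper's exact formulation; the `RefinedGateLSTMCell` class name, the `refine` projection, and the tanh-bounded shortcut term are all assumptions made for this example.

```python
import torch
import torch.nn as nn


class RefinedGateLSTMCell(nn.Module):
    """Sketch of a 'refined gate' LSTM cell: each sigmoid gate gets a
    direct shortcut from the input features, giving gradients a path
    that bypasses the saturating sigmoid and extending the gate range
    beyond (0, 1). Assumption-based example, not the paper's exact
    formulation."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.hidden_size = hidden_size
        # Vanilla LSTM affine maps for the four pre-activations
        # (input gate, forget gate, cell candidate, output gate).
        self.ih = nn.Linear(input_size, 4 * hidden_size)
        self.hh = nn.Linear(hidden_size, 4 * hidden_size, bias=False)
        # Hypothetical shortcut projection of the input used to refine
        # the three sigmoid gates.
        self.refine = nn.Linear(input_size, 3 * hidden_size)

    def forward(self, x, state):
        h, c = state
        i, f, g, o = (self.ih(x) + self.hh(h)).chunk(4, dim=-1)
        ri, rf, ro = self.refine(x).chunk(3, dim=-1)
        # Vanilla sigmoid gate plus a bounded shortcut from the input
        # features; the extra additive term is the "refinement".
        i = torch.sigmoid(i) + torch.tanh(ri)
        f = torch.sigmoid(f) + torch.tanh(rf)
        o = torch.sigmoid(o) + torch.tanh(ro)
        c = f * c + i * torch.tanh(g)
        h = o * torch.tanh(c)
        return h, (h, c)


if __name__ == "__main__":
    cell = RefinedGateLSTMCell(input_size=32, hidden_size=64)
    x = torch.randn(8, 32)        # batch of 8 input vectors
    h0 = torch.zeros(8, 64)
    c0 = torch.zeros(8, 64)
    out, (h1, c1) = cell(x, (h0, c0))
    print(out.shape)              # torch.Size([8, 64])
```

The same shortcut pattern can in principle be applied to the gates of GRU and MGU cells, which is how the abstract describes validating the mechanism across the three gated RNN variants.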
