Paper Title

Long Range Language Modeling via Gated State Spaces

Paper Authors

Harsh Mehta, Ankit Gupta, Ashok Cutkosky, Behnam Neyshabur

Paper Abstract

State space models have been shown to be effective at modeling long range dependencies, especially on sequence classification tasks. In this work we focus on autoregressive sequence modeling over English books, GitHub source code and ArXiv mathematics articles. Based on recent developments around the effectiveness of gated activation functions, we propose a new layer named Gated State Space (GSS) and show that it trains significantly faster than the diagonal version of S4 (i.e., DSS) on TPUs, is fairly competitive with several well-tuned Transformer-based baselines, and exhibits zero-shot generalization to longer inputs while being straightforward to implement. Finally, we show that leveraging self-attention to model local dependencies improves the performance of GSS even further.
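Since the abstract only outlines the architecture, below is a minimal JAX sketch of a GSS-style block as the abstract describes it: a diagonal state space operation (as in DSS) applied in a reduced-dimension branch, wrapped in a multiplicative GELU gate, with a residual connection. The names (gss_block, dss_kernel, d_ff, d_ssm, etc.), the exact wiring, the parameter-free normalization, and the single kernel shared across channels are illustrative assumptions, not the authors' reference implementation.

```python
import jax
import jax.numpy as jnp

def dss_kernel(lam_re, lam_im, w, length, step=1.0):
    # Diagonal SSM convolution kernel: K[l] = Re(sum_n w_n * exp(lam_n * step * l)).
    lam = -jnp.exp(lam_re) + 1j * lam_im          # parameterize Re(lam) < 0 for stability
    pos = jnp.arange(length)
    return jnp.einsum('n,nl->l', w, jnp.exp(lam[:, None] * step * pos)).real

def causal_fft_conv(u, k):
    # Causal convolution y[l] = sum_{j<=l} k[j] * u[l-j], in O(L log L) via FFT.
    n = 2 * u.shape[0]                            # zero-pad so circular conv is linear
    y = jnp.fft.irfft(jnp.fft.rfft(u, n=n) * jnp.fft.rfft(k, n=n), n=n)
    return y[: u.shape[0]]

def gss_block(p, x):
    # One hypothetical GSS block; x has shape (L, d_model).
    h = (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + 1e-6)  # LayerNorm stand-in
    v = jax.nn.gelu(h @ p['W_v'])                 # (L, d_ff)  multiplicative gate branch
    u = jax.nn.gelu(h @ p['W_u'])                 # (L, d_ssm) reduced-dim SSM input
    k = dss_kernel(p['lam_re'], p['lam_im'], p['w'], x.shape[0])
    # One kernel shared across channels for brevity; DSS learns a kernel per channel.
    y = jax.vmap(causal_fft_conv, in_axes=(1, None), out_axes=1)(u, k)
    uc = y @ p['W_c']                             # (L, d_ff)  contextualized features
    return (uc * v) @ p['W_o'] + x                # gate, project back to d_model, residual

# Toy usage with arbitrary sizes (all hypothetical):
key = jax.random.PRNGKey(0)
L, d, f, m, n = 512, 256, 1024, 64, 32
ks = jax.random.split(key, 8)
p = {
    'W_v': jax.random.normal(ks[0], (d, f)) / d**0.5,
    'W_u': jax.random.normal(ks[1], (d, m)) / d**0.5,
    'W_c': jax.random.normal(ks[2], (m, f)) / m**0.5,
    'W_o': jax.random.normal(ks[3], (f, d)) / f**0.5,
    'lam_re': jax.random.normal(ks[4], (n,)),
    'lam_im': jax.random.normal(ks[5], (n,)),
    'w': jax.random.normal(ks[6], (n,)).astype(jnp.complex64),
}
out = gss_block(p, jax.random.normal(ks[7], (L, d)))  # shape (L, d)
```

Note how this sketch connects to the abstract's claims: the sequence mixing is an FFT convolution rather than quadratic attention, and because the kernel is recomputed from (lam, w) for whatever length is requested, the same parameters can be applied zero-shot to sequences longer than those seen in training.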
