Paper Title

Long Range Language Modeling via Gated State Spaces

Paper Authors

Harsh Mehta, Ankit Gupta, Ashok Cutkosky, Behnam Neyshabur

Paper Abstract

State space models have been shown to be effective at modeling long range dependencies, especially on sequence classification tasks. In this work we focus on autoregressive sequence modeling over English books, GitHub source code and ArXiv mathematics articles. Based on recent developments around the effectiveness of gated activation functions, we propose a new layer named Gated State Space (GSS) and show that it trains significantly faster than the diagonal version of S4 (i.e., DSS) on TPUs, is fairly competitive with several well-tuned Transformer-based baselines, and exhibits zero-shot generalization to longer inputs while being straightforward to implement. Finally, we show that leveraging self-attention to model local dependencies improves the performance of GSS even further.
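Since the abstract only outlines the architecture, below is a minimal JAX sketch of a GSS-style block as the abstract describes it: a diagonal state space operation (as in DSS) applied in a reduced-dimension branch, wrapped in a multiplicative GELU gate, with a residual connection. The names (gss_block, dss_kernel, d_ff, d_ssm, etc.), the exact wiring, the parameter-free normalization, and the single kernel shared across channels are illustrative assumptions, not the authors' reference implementation.

```python
import jax
import jax.numpy as jnp

def dss_kernel(lam_re, lam_im, w, length, step=1.0):
    # Diagonal SSM convolution kernel: K[l] = Re(sum_n w_n * exp(lam_n * step * l)).
    lam = -jnp.exp(lam_re) + 1j * lam_im          # parameterize Re(lam) < 0 for stability
    pos = jnp.arange(length)
    return jnp.einsum('n,nl->l', w, jnp.exp(lam[:, None] * step * pos)).real

def causal_fft_conv(u, k):
    # Causal convolution y[l] = sum_{j<=l} k[j] * u[l-j], in O(L log L) via FFT.
    n = 2 * u.shape[0]                            # zero-pad so circular conv is linear
    y = jnp.fft.irfft(jnp.fft.rfft(u, n=n) * jnp.fft.rfft(k, n=n), n=n)
    return y[: u.shape[0]]

def gss_block(p, x):
    # One hypothetical GSS block; x has shape (L, d_model).
    h = (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + 1e-6)  # LayerNorm stand-in
    v = jax.nn.gelu(h @ p['W_v'])                 # (L, d_ff)  multiplicative gate branch
    u = jax.nn.gelu(h @ p['W_u'])                 # (L, d_ssm) reduced-dim SSM input
    k = dss_kernel(p['lam_re'], p['lam_im'], p['w'], x.shape[0])
    # One kernel shared across channels for brevity; DSS learns a kernel per channel.
    y = jax.vmap(causal_fft_conv, in_axes=(1, None), out_axes=1)(u, k)
    uc = y @ p['W_c']                             # (L, d_ff)  contextualized features
    return (uc * v) @ p['W_o'] + x                # gate, project back to d_model, residual

# Toy usage with arbitrary sizes (all hypothetical):
key = jax.random.PRNGKey(0)
L, d, f, m, n = 512, 256, 1024, 64, 32
ks = jax.random.split(key, 8)
p = {
    'W_v': jax.random.normal(ks[0], (d, f)) / d**0.5,
    'W_u': jax.random.normal(ks[1], (d, m)) / d**0.5,
    'W_c': jax.random.normal(ks[2], (m, f)) / m**0.5,
    'W_o': jax.random.normal(ks[3], (f, d)) / f**0.5,
    'lam_re': jax.random.normal(ks[4], (n,)),
    'lam_im': jax.random.normal(ks[5], (n,)),
    'w': jax.random.normal(ks[6], (n,)).astype(jnp.complex64),
}
out = gss_block(p, jax.random.normal(ks[7], (L, d)))  # shape (L, d)
```

Note how this sketch connects to the abstract's claims: the sequence mixing is an FFT convolution rather than quadratic attention, and because the kernel is recomputed from (lam, w) for whatever length is requested, the same parameters can be applied zero-shot to sequences longer than those seen in training.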
