Paper Title

How Can Self-Attention Networks Recognize Dyck-n Languages?

Paper Authors

Javid Ebrahimi, Dhruv Gelda, Wei Zhang

Paper Abstract

We focus on the recognition of Dyck-n ($\mathcal{D}_n$) languages with self-attention (SA) networks, which has been deemed to be a difficult task for these networks. We compare the performance of two variants of SA, one with a starting symbol (SA$^+$) and one without (SA$^-$). Our results show that SA$^+$ is able to generalize to longer sequences and deeper dependencies. For $\mathcal{D}_2$, we find that SA$^-$ completely breaks down on long sequences whereas the accuracy of SA$^+$ is 58.82$\%$. We find attention maps learned by SA$^+$ to be amenable to interpretation and compatible with a stack-based language recognizer. Surprisingly, the performance of SA networks is at par with LSTMs, which provides evidence on the ability of SA to learn hierarchies without recursion.
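For context, a Dyck-n language consists of strings over n types of brackets that are balanced and properly nested, and the "stack-based language recognizer" mentioned in the abstract is the classical pushdown procedure for checking such strings. The sketch below is not code from the paper; the function name and the two-bracket vocabulary are illustrative assumptions for the $\mathcal{D}_2$ case.

```python
# Minimal stack-based recognizer for Dyck-n (illustrative sketch, not the
# paper's model): a closing bracket must match the most recently opened,
# still-unclosed bracket, and no bracket may remain open at the end.

def is_dyck(sequence: str, pairs: dict[str, str] | None = None) -> bool:
    """Return True iff `sequence` is a well-formed Dyck word over `pairs`."""
    if pairs is None:
        pairs = {"(": ")", "[": "]"}   # two bracket types, as in the D_2 setting
    closers = set(pairs.values())
    stack: list[str] = []
    for symbol in sequence:
        if symbol in pairs:            # opening bracket: remember its expected closer
            stack.append(pairs[symbol])
        elif symbol in closers:        # closing bracket: must match the most recent opener
            if not stack or stack.pop() != symbol:
                return False
        else:
            return False               # symbol outside the bracket vocabulary
    return not stack                   # every opened bracket must be closed


print(is_dyck("([()[]])"))  # True:  balanced and properly nested
print(is_dyck("([)]"))      # False: crossing brackets
print(is_dyck("(()"))       # False: unclosed bracket
```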
