Paper Title

Token Transformer: Can class token help window-based transformer build better long-range interactions?

Authors

Jiawei Mao, Yuanqi Chang, Xuesong Yin

Abstract

Compared with the vanilla transformer, the window-based transformer offers a better trade-off between accuracy and efficiency. Although window-based transformers have made great progress, their long-range modeling capability is limited by the size of the local window and by the window connection scheme. To address this problem, we propose a novel Token Transformer (TT). The core mechanism of TT is the addition of a class (CLS) token that summarizes the information in each local window. We refer to this type of token interaction as CLS Attention. These CLS tokens interact spatially with the tokens in each window to enable long-range modeling. To preserve the hierarchical design of the window-based transformer, we design a Feature Inheritance Module (FIM) in each stage of TT that passes the local window information from the previous stage to the CLS tokens of the next stage. In addition, we design a Spatial-Channel Feedforward Network (SCFFN) in TT, which mixes CLS tokens and embedded tokens in both the spatial and channel domains without additional parameters. Extensive experiments show that TT achieves competitive results with few parameters in image classification and downstream tasks.
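The abstract does not say how the CLS tokens are initialized or in what order they exchange information with the window tokens. The sketch below is a minimal pure-Python illustration of the general idea of one CLS token per window acting as a long-range channel. It is not the authors' implementation. Several choices are assumptions made only for illustration:

- identity Q/K/V projections with a single head;
- each CLS token is initialized as the mean of its window's tokens;
- the block runs three steps: local attention, then CLS-to-CLS attention, then window tokens reading back from their CLS token.

The function names `partition_windows` and `cls_window_block` are hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys):
    """Single-head scaled dot-product attention.

    Assumption for brevity: identity Q/K/V projections (no learned weights),
    so the keys also serve as the values.
    """
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * k[j] for w, k in zip(weights, keys)) for j in range(d)])
    return out


def partition_windows(grid, win):
    """Split an H x W grid of C-dim tokens into non-overlapping win x win windows."""
    h, w = len(grid), len(grid[0])
    assert h % win == 0 and w % win == 0, "grid must tile evenly into windows"
    return [
        [grid[i][j] for i in range(r, r + win) for j in range(c, c + win)]
        for r in range(0, h, win)
        for c in range(0, w, win)
    ]


def cls_window_block(grid, win):
    """Hypothetical CLS-token window attention block (illustrative only)."""
    windows = partition_windows(grid, win)
    dim = len(windows[0][0])
    # Assumed CLS initialization: the mean of each window's tokens.
    cls = [[sum(t[j] for t in tokens) / len(tokens) for j in range(dim)]
           for tokens in windows]

    # 1) Local attention over [CLS; window tokens] inside each window.
    local = [attend([c] + tokens, [c] + tokens) for c, tokens in zip(cls, windows)]
    cls = [out[0] for out in local]
    windows = [out[1:] for out in local]

    # 2) CLS tokens attend to one another: the only cross-window path in this sketch.
    cls = attend(cls, cls)

    # 3) Each window's tokens attend to [updated CLS; window tokens], so
    #    information gathered from distant windows reaches every token.
    windows = [attend(tokens, [c] + tokens) for c, tokens in zip(cls, windows)]
    return cls, windows


random.seed(0)
H = W = 4
C = 3
grid = [[[random.uniform(-1, 1) for _ in range(C)] for _ in range(W)] for _ in range(H)]
cls_tokens, window_tokens = cls_window_block(grid, win=2)
print(len(cls_tokens), len(window_tokens), len(window_tokens[0]), len(window_tokens[0][0]))
# prints: 4 4 4 3
```

In this sketch, step 2 is what separates the block from plain non-overlapping window attention. If step 2 is removed, no token can see beyond its own 2 x 2 window. With step 2 in place, each window still pays only a small local attention cost. The extra global cost is attention over the (few) CLS tokens rather than over all tokens.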