Paper Title
Understanding Long Programming Languages with Structure-Aware Sparse Attention
Paper Authors
Paper Abstract
Programming-based Pre-trained Language Models (PPLMs) such as CodeBERT have achieved great success in many downstream code-related tasks. Since the memory and computational complexity of self-attention in the Transformer grow quadratically with the sequence length, PPLMs typically limit the code length to 512 tokens. However, code in real-world applications, such as in code search, is generally long and cannot be processed efficiently by existing PPLMs. To solve this problem, in this paper, we present SASA, a Structure-Aware Sparse Attention mechanism, which reduces the complexity and improves performance on long-code understanding tasks. The key components of SASA are top-$k$ sparse attention and Abstract Syntax Tree (AST)-based structure-aware attention. With top-$k$ sparse attention, the most crucial attention relations can be obtained at a lower computational cost. Since the code structure represents the logic of the code statements and complements the code's sequence characteristics, we further introduce AST structures into the attention mechanism. Extensive experiments on CodeXGLUE tasks show that SASA achieves better performance than the competing baselines.
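To make the two attention components concrete, below is a minimal sketch of combining top-$k$ sparse attention with an AST-derived mask. It is not the paper's implementation: SASA uses block-sparse kernels to actually reduce cost, whereas this toy version computes the full score matrix and only masks it afterwards. The function name `sparse_structure_attention` and the random `ast_mask` are hypothetical stand-ins for illustration.

```python
import torch
import torch.nn.functional as F

def sparse_structure_attention(q, k, v, ast_mask, top_k=32):
    """Toy top-k sparse attention with an AST-derived mask (single head).

    q, k, v:  (seq_len, d) query/key/value matrices.
    ast_mask: (seq_len, seq_len) boolean matrix; True where two code tokens
              are related in the Abstract Syntax Tree.
    top_k:    number of highest-scoring keys each query may attend to.
    """
    d = q.size(-1)
    scores = q @ k.T / d ** 0.5                      # (seq_len, seq_len)

    # Keep only the top-k scores per query row (top-k sparse attention).
    topk_vals, topk_idx = scores.topk(top_k, dim=-1)
    keep = torch.full_like(scores, float("-inf"))
    keep.scatter_(-1, topk_idx, topk_vals)

    # Additionally allow positions connected in the AST (structure-aware attention).
    keep = torch.where(ast_mask, scores, keep)

    attn = F.softmax(keep, dim=-1)
    return attn @ v                                  # (seq_len, d)

# Usage: random tensors stand in for real code-token representations.
seq_len, d = 128, 64
q, k, v = (torch.randn(seq_len, d) for _ in range(3))
ast_mask = torch.rand(seq_len, seq_len) < 0.05       # hypothetical AST adjacency
out = sparse_structure_attention(q, k, v, ast_mask, top_k=16)
print(out.shape)                                     # torch.Size([128, 64])
```

The point of the sketch is the masking logic: each query attends to its $k$ strongest keys plus any token it is linked to in the AST, so structural relations survive even when their raw attention scores are not among the top $k$.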