Paper Title
GypSum: Learning Hybrid Representations for Code Summarization
Paper Authors
Paper Abstract
Code summarization with deep learning has been widely studied in recent years. Current deep learning models for code summarization generally follow the principles of neural machine translation and adopt the encoder-decoder framework, where the encoder learns semantic representations from source code and the decoder transforms the learned representations into human-readable text that describes the functionality of code snippets. Although they achieve new state-of-the-art performance, we notice that current models often either generate less fluent summaries or fail to capture the core functionality, since they usually focus on a single type of code representation. As such, we propose GypSum, a new deep learning model that learns hybrid representations using graph attention neural networks and a pre-trained programming and natural language model. We introduce particular edges related to the control flow of a code snippet into the abstract syntax tree for graph construction, and design two encoders to learn from the graph and the token sequence of source code, respectively. We modify the encoder-decoder sublayer in the Transformer's decoder to fuse the representations and propose a dual-copy mechanism to facilitate summary generation. Experimental results demonstrate the superior performance of GypSum over existing code summarization models.
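The abstract's two central ideas, fusing representations from two encoders inside the decoder and a dual-copy mechanism over both sources, can be sketched at a high level in numpy. This is a minimal illustration, not the paper's actual architecture: the fixed fusion gate, the copy gates, the vocabulary-id mappings, and all shapes are assumptions introduced here for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(query, memory):
    """Scaled dot-product attention of one decoder query over an encoder memory."""
    d = query.shape[-1]
    scores = memory @ query / np.sqrt(d)       # (n_slots,)
    weights = softmax(scores)                  # attention distribution over slots
    return weights @ memory, weights           # context vector, attention weights

rng = np.random.default_rng(0)
d = 8
graph_mem = rng.normal(size=(5, d))   # graph-encoder states (AST + control-flow edges)
token_mem = rng.normal(size=(7, d))   # token-encoder states (source-code tokens)
query = rng.normal(size=d)            # one decoder position

# Fused encoder-decoder sublayer: attend to both memories, then mix the two
# context vectors (a fixed 0.5 gate here; the model would predict this).
ctx_g, attn_g = attend(query, graph_mem)
ctx_t, attn_t = attend(query, token_mem)
gate = 0.5
fused = gate * ctx_g + (1 - gate) * ctx_t

# Dual-copy: mix a generation distribution over the vocabulary with copy
# distributions induced by BOTH attention maps, using gates that sum to one.
vocab_size = 10
p_vocab = softmax(rng.normal(size=vocab_size))
graph_to_vocab = rng.integers(0, vocab_size, size=5)  # hypothetical vocab ids per slot
token_to_vocab = rng.integers(0, vocab_size, size=7)

p_gen, p_copy_g, p_copy_t = 0.6, 0.2, 0.2  # would be predicted by the model
final = p_gen * p_vocab
np.add.at(final, graph_to_vocab, p_copy_g * attn_g)  # scatter-add copy mass
np.add.at(final, token_to_vocab, p_copy_t * attn_t)
```

Because the three gates sum to one and each component is a probability distribution, `final` remains a valid distribution over the vocabulary, which is what lets copying coexist with ordinary generation at every decoding step.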