Paper Title

CSRS: Code Search with Relevance Matching and Semantic Matching

Authors

Cheng, Yi; Kuang, Li

Abstract

Developers often search for and reuse existing code snippets during software development. Code search aims to retrieve relevant code snippets from a codebase given a natural language query entered by the developer. So far, researchers have proposed information retrieval (IR) based methods and deep learning (DL) based methods. IR-based methods focus on keyword matching, that is, ranking code snippets by their lexical relevance to the query, while DL-based methods focus on capturing semantic correlations. However, existing methods do not capture both matching signals simultaneously. Therefore, in this paper, we propose CSRS, a code search model with relevance matching and semantic matching. CSRS comprises (1) an embedding module containing convolution kernels of different sizes, which extract n-gram embeddings of queries and code, (2) a relevance matching module that measures lexical matching signals, and (3) a co-attention-based semantic matching module that captures semantic correlation. We train and evaluate CSRS on datasets of 18.22M and 10k code snippets, respectively. The experimental results demonstrate that CSRS achieves an MRR of 0.614, outperforming two state-of-the-art models, DeepCS and CARLCS-CNN, by 33.77% and 18.53%, respectively. In addition, we conduct several experiments to demonstrate the effectiveness of each component of CSRS.
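The abstract reports results as MRR (mean reciprocal rank). As a minimal sketch of how this metric is commonly computed, and not the authors' evaluation code, the function below averages the reciprocal of the rank at which the first correct snippet appears for each query. The function name `mean_reciprocal_rank` and the convention of passing `None` for a query whose correct snippet was not retrieved are assumptions made for illustration.

```python
from typing import Optional, Sequence


def mean_reciprocal_rank(first_hit_ranks: Sequence[Optional[int]]) -> float:
    """Average of 1/rank over all queries.

    first_hit_ranks holds, per query, the 1-based rank of the first correct
    code snippet in the returned list, or None if it was not retrieved
    (hypothetical convention; such a query contributes 0 to the sum).
    """
    if not first_hit_ranks:
        raise ValueError("at least one query is required")
    total = 0.0
    for rank in first_hit_ranks:
        if rank is None:
            continue  # a miss contributes a reciprocal rank of 0
        if rank < 1:
            raise ValueError("ranks are 1-based")
        total += 1.0 / rank
    return total / len(first_hit_ranks)


# Four queries: hits at ranks 1, 2 and 4, plus one miss.
# (1 + 0.5 + 0 + 0.25) / 4 = 0.4375
example_mrr = mean_reciprocal_rank([1, 2, None, 4])
```

Under this definition, an MRR of 0.614 corresponds to the correct snippet appearing, in a reciprocal-averaged sense, between the first and second positions of the result list.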
