CodeQueries: A Dataset of Semantic Queries over Code

Paper Title

CodeQueries: A Dataset of Semantic Queries over Code

Authors

Surya Prakash Sahu, Madhurima Mandal, Shikhar Bharadwaj, Aditya Kanade, Petros Maniatis, Shirish Shevade

Abstract

Developers often have questions about semantic aspects of the code they are working on, e.g., "Is there a class whose parent classes declare a conflicting attribute?". Answering such questions requires understanding code semantics, such as attributes and the inheritance relations between classes. An answer to such a question should identify the code spans constituting the answer (e.g., the declaration of the subclass) as well as supporting facts (e.g., the definitions of the conflicting attributes). Existing work on question answering over code has considered yes/no questions or method-level context. We contribute a labeled dataset, called CodeQueries, of semantic queries over Python code. Compared to existing datasets, in CodeQueries the queries are about code semantics, the context is file-level, and the answers are code spans. We curate the dataset based on queries supported by a widely used static analysis tool, CodeQL, and include both positive and negative examples, as well as queries requiring single-hop and multi-hop reasoning. To assess the value of our dataset, we evaluate baseline neural approaches. We study a large language model (GPT3.5-Turbo) in zero-shot and few-shot settings on a subset of CodeQueries. We also evaluate a BERT-style model (CuBERT) with fine-tuning. We find that these models achieve limited success on CodeQueries. CodeQueries is thus a challenging dataset for testing the ability of neural models to understand code semantics in the extractive question-answering setting.
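
To make the example query from the abstract concrete, the following is a minimal sketch of what an answer span and its supporting facts might look like for "Is there a class whose parent classes declare a conflicting attribute?". It is an illustration only: the sample classes, the `find_conflicts` helper, and the output dictionary layout are all hypothetical, and the check is a simplified `ast`-based approximation (single file, direct bases only), not the CodeQL query or the dataset's actual format.

```python
import ast

# Hypothetical file-level context: Car inherits `status` from two bases.
SOURCE = '''
class Engine:
    status = "idle"

class Radio:
    status = 0

class Car(Engine, Radio):
    pass

class Bike(Engine):
    pass
'''


def class_attributes(node):
    """Map each class-level attribute name to the line where it is first assigned."""
    attrs = {}
    for stmt in node.body:
        if isinstance(stmt, ast.Assign):
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    attrs.setdefault(target.id, stmt.lineno)
    return attrs


def find_conflicts(source):
    """Return answer spans (subclasses) with supporting facts (conflicting definitions).

    Simplifying assumptions: only top-level classes in one file are considered,
    and only direct base classes referenced by plain name are resolved.
    """
    tree = ast.parse(source)
    classes = {n.name: n for n in tree.body if isinstance(n, ast.ClassDef)}
    results = []
    for cls in classes.values():
        own = class_attributes(cls)
        bases = [b.id for b in cls.bases
                 if isinstance(b, ast.Name) and b.id in classes]
        seen = {}  # attribute name -> (base class name, line of definition)
        for base in bases:
            for attr, line in class_attributes(classes[base]).items():
                if attr in own:
                    continue  # the subclass overrides it, so there is no ambiguity
                if attr in seen:
                    results.append({
                        "answer": (cls.name, cls.lineno),
                        "attribute": attr,
                        "supporting_facts": [seen[attr], (base, line)],
                    })
                else:
                    seen[attr] = (base, line)
    return results


if __name__ == "__main__":
    for hit in find_conflicts(SOURCE):
        print(hit)
```

Here `Car` would be a positive example (the answer span is its declaration, and the two `status` assignments are the supporting facts), while `Bike` would be a negative example for the same query.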
