Paper Title
Adversarial Training for Code Retrieval with Question-Description Relevance Regularization
Authors
Abstract
Code retrieval is a key task that aims to match natural language with programming languages. In this work, we propose adversarial learning for code retrieval that is regularized by question-description relevance. First, we adapt a simple adversarial learning technique to generate difficult code snippets given the input question, which can help the learning of code retrieval that faces bi-modal and data-scarce challenges. Second, we propose to leverage question-description relevance to regularize adversarial learning, such that a generated code snippet contributes more to the code retrieval training loss only if its paired natural language description is predicted to be less relevant to the user-given question. Experiments on large-scale code retrieval datasets of two programming languages show that our adversarial learning method improves the performance of state-of-the-art models. Moreover, using an additional duplicate question prediction model to regularize adversarial learning further improves the performance, and this is more effective than using the duplicated questions in strong multi-task learning baselines.
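The abstract does not state the training loss, so the following is only a minimal sketch of one way the relevance regularization could be read. It assumes a cosine-similarity hinge ranking loss and a weight of (1 - relevance); the function names, the margin value, and the weighting form are all illustrative assumptions, not the authors' formulation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def weighted_ranking_loss(question, positive_code, generated_code,
                          relevance, margin=0.5):
    """Hinge ranking loss for one generated (adversarial) code snippet.

    All names here are hypothetical. `relevance` stands in for the score,
    in [0, 1], that a duplicate-question style model might assign to the
    pair (user question, description paired with the generated snippet).
    The (1 - relevance) weight is an assumed form: the less relevant the
    description, the more the generated snippet contributes to the loss.
    """
    if not 0.0 <= relevance <= 1.0:
        raise ValueError("relevance must lie in [0, 1]")
    # Standard margin-based ranking term: the true snippet should score
    # higher than the generated one by at least `margin`.
    hinge = max(0.0, margin
                - cosine(question, positive_code)
                + cosine(question, generated_code))
    return (1.0 - relevance) * hinge


# Toy embeddings; a real system would take these from trained encoders.
q = [1.0, 0.0]
pos = [1.0, 0.0]
gen = [0.8, 0.6]
for r in (0.0, 0.5, 1.0):
    print(r, weighted_ranking_loss(q, pos, gen, relevance=r))
```

Under this assumed weighting, a generated snippet whose description is judged fully relevant to the question (likely a valid answer rather than a true negative) is ignored, while one judged irrelevant is penalized at full strength.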