The Best of Both Worlds: Combining Learned Embeddings with Engineered Features for Accurate Prediction of Correct Patches

Paper title

The Best of Both Worlds: Combining Learned Embeddings with Engineered Features for Accurate Prediction of Correct Patches

Authors

Tian, Haoye, Liu, Kui, Li, Yinghua, Kaboré, Abdoul Kader, Koyuncu, Anil, Habib, Andrew, Li, Li, Wen, Junhao, Klein, Jacques, Bissyandé, Tegawendé F.

Abstract

A large body of literature on automated program repair develops approaches in which patches are automatically generated and then validated against an oracle (e.g., a test suite). Because such an oracle can be imperfect, the generated patches, although validated by the oracle, may actually be incorrect. Our empirical work investigates different representation learning approaches for code changes to derive embeddings that are amenable to similarity computations for patch correctness identification, and assesses the possibility of accurately classifying correct patches by combining learned embeddings with engineered features. Experimental results demonstrate the potential of learned embeddings to empower Leopard (a patch correctness prediction framework implemented in this work) with learning algorithms in reasoning about patch correctness: a machine learning predictor with BERT transformer-based learned embeddings associated with XGBoost achieves an AUC value of about 0.803 in the prediction of patch correctness on a new dataset of 2,147 labeled patches that we collected for the experiments. Our investigations show that deep learned embeddings can lead to complementary or better performance when compared against the state-of-the-art, PATCH-SIM, which relies on dynamic information. By combining deep learned embeddings and engineered features, Panther (the upgraded version of Leopard implemented in this work) outperforms Leopard with higher scores in terms of AUC, +Recall and -Recall, and can accurately identify more (in)correct patches that cannot be predicted by classifiers that use only learned embeddings or only engineered features. Finally, we use an explainable ML technique, SHAP, to empirically interpret how the learned embeddings and engineered features contribute to the patch correctness prediction.
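
To make the evaluation terms in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that "combining" means concatenating an embedding vector with an engineered feature vector, that +Recall is the fraction of correct patches predicted as correct, and that -Recall is the fraction of incorrect patches filtered out. The function names, the 0.5 threshold, and the toy scores are hypothetical.

```python
from typing import List, Sequence, Tuple


def combine_features(embedding: Sequence[float],
                     engineered: Sequence[float]) -> List[float]:
    """Concatenate a learned embedding with engineered features (assumed scheme)."""
    return list(embedding) + list(engineered)


def recalls(labels: Sequence[int], scores: Sequence[float],
            threshold: float = 0.5) -> Tuple[float, float]:
    """Return (+Recall, -Recall); label 1 = correct patch, 0 = incorrect patch.

    Assumes at least one patch of each class is present.
    """
    pairs = list(zip(labels, scores))
    true_pos = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    true_neg = sum(1 for y, s in pairs if y == 0 and s < threshold)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    return true_pos / n_pos, true_neg / n_neg


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random correct patch outscores a random
    incorrect one, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): three correct patches, two incorrect ones.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.2]
plus_recall, minus_recall = recalls(labels, scores)
area = auc(labels, scores)
```

On this toy data, two of the three correct patches score at or above the threshold and one of the two incorrect patches scores below it. In practice the scores would come from a trained classifier such as the XGBoost model mentioned in the abstract.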
