Title
The Untold Impact of Learning Approaches on Software Fault-Proneness Predictions
Authors
Abstract
Software fault-proneness prediction is an active research area in which many factors affecting prediction performance have been extensively studied. However, except for one initial work, the impact of the learning approach (i.e., the specifics of the data used for training and of the target variable being predicted) on prediction performance has not been studied. This paper explores the effects of two learning approaches, useAllPredictAll and usePrePredictPost, on the performance of software fault-proneness prediction, both within-release and across-releases. The empirical results are based on data extracted from 64 releases of twelve open-source projects. Results show that the learning approach has a substantial, and typically unacknowledged, impact on classification performance. Specifically, useAllPredictAll leads to significantly better performance than the usePrePredictPost learning approach, both within-release and across-releases. Furthermore, this paper shows that, for within-release predictions, this difference in classification performance is due to the different levels of class imbalance in the two learning approaches. When class imbalance is addressed, the performance difference between the learning approaches is eliminated. Our findings imply that the learning approach should always be explicitly identified and its impact on software fault-proneness prediction considered. The paper concludes with a discussion of the potential consequences of our results for both research and practice.
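The abstract attributes the within-release performance gap to different levels of class imbalance. The following is a hypothetical illustration, not the authors' method. It assumes two made-up labelings of the same ten files, where one target definition marks fewer files as fault-prone than the other, and it measures the majority-to-minority imbalance ratio for each. It then applies random oversampling, which is only one common way of addressing imbalance, because the abstract does not say which technique the study used. The helper names `imbalance_ratio` and `random_oversample` and all data are illustrative assumptions.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Return the majority-class count divided by the minority-class count."""
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need both classes to compute an imbalance ratio")
    return max(counts.values()) / min(counts.values())


def random_oversample(features, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until both classes are equal in size.

    Random oversampling is an assumed stand-in here, not the study's technique.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    minority = min(counts, key=counts.get)
    deficit = max(counts.values()) - counts[minority]
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    extra = [rng.choice(minority_idx) for _ in range(deficit)]
    new_x = list(features) + [features[i] for i in extra]
    new_y = list(labels) + [labels[i] for i in extra]
    return new_x, new_y


# Hypothetical per-file labels for one release (1 = fault-prone, 0 = not).
# Both lists describe the same ten files under two made-up target definitions;
# the second definition marks fewer files as fault-prone.
labels_a = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
labels_b = [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
file_ids = list(range(10))  # placeholder features: one id per file

print(imbalance_ratio(labels_a))  # 1.5
print(imbalance_ratio(labels_b))  # 9.0

x_bal, y_bal = random_oversample(file_ids, labels_b)
print(imbalance_ratio(y_bal))  # 1.0
```

In practice, oversampling is applied only to the training split, so that duplicated rows do not leak into the data used for evaluation.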