Paper title
"Project smells" -- Experiences in Analysing the Software Quality of ML Projects with mllint
Paper authors
Paper abstract
Machine Learning (ML) projects incur novel challenges in their development and productionisation compared with traditional software applications, though established principles and best practices for ensuring a project's software quality still apply. While using static analysis to catch code smells has been shown to improve software quality attributes, it is only a small piece of the software quality puzzle, especially for ML projects, given their additional challenges and the lower degree of Software Engineering (SE) experience among the data scientists who develop them. We introduce the novel concept of project smells, which consider deficits in project management as a more holistic perspective on software quality in ML projects. An open-source static analysis tool, mllint, was also implemented to help detect and mitigate these. Our research evaluates this novel concept of project smells in the industrial context of ING, a global bank and large software- and data-intensive organisation. We also investigate the perceived importance of these project smells for proof-of-concept versus production-ready ML projects, as well as the perceived obstacles to and benefits of using static analysis tools such as mllint. Our findings indicate a need for context-aware static analysis tools that fit the needs of the project at its current stage of development while requiring minimal configuration effort from the user.