Paper Title

Bug Analysis in Jupyter Notebook Projects: An Empirical Study

Authors

de Santana, Taijara Loiola; Neto, Paulo Anselmo da Mota Silveira; de Almeida, Eduardo Santana; Ahmed, Iftekhar

Abstract

Computational notebooks, such as Jupyter, have been widely adopted by data scientists to write code for analyzing and visualizing data. Despite their growing adoption and popularity, there has been no thorough study to understand Jupyter development challenges from the practitioners' point of view. This paper presents a systematic study of bugs and challenges that Jupyter practitioners face through a large-scale empirical investigation. We mined 14,740 commits from 105 GitHub open-source projects with Jupyter notebook code. Next, we analyzed 30,416 Stack Overflow posts which gave us insights into bugs that practitioners face when developing Jupyter notebook projects. Finally, we conducted nineteen interviews with data scientists to uncover more details about Jupyter bugs and to gain insights into Jupyter developers' challenges. We propose a bug taxonomy for Jupyter projects based on our results. We also highlight bug categories, their root causes, and the challenges that Jupyter practitioners face.
