An Empirical Study of Library Usage and Dependency in Deep Learning Frameworks

Paper Title

An Empirical Study of Library Usage and Dependency in Deep Learning Frameworks

Authors

El aoun, Mohamed Raed; Tidjon, Lionel Nganyewou; Rombaut, Ben; Khomh, Foutse; Hassan, Ahmed E.

Abstract

Recent advances in deep learning (DL) have led to the release of several DL software libraries, such as PyTorch, Caffe, and TensorFlow, to assist machine learning (ML) practitioners in developing and deploying state-of-the-art deep neural networks (DNNs). However, practitioners are not always able to cope properly with limitations of these libraries, for example in testing or data processing. In this paper, we present a qualitative and quantitative analysis of the most frequent combinations of DL libraries and of the distribution of DL library dependencies across the ML workflow, and we formulate a set of recommendations for (i) hardware builders, toward more optimized accelerators, and (ii) library builders, toward more refined future releases. Our study is based on 1,484 open-source DL projects with 46,110 contributors, with projects selected based on their reputation. First, we find an increasing trend in the usage of deep learning libraries. Second, we highlight several usage patterns of deep learning libraries. In addition, we identify dependencies between DL libraries and their most frequent combinations: PyTorch with Scikit-learn and Keras with TensorFlow are the most frequent, appearing in 18% and 14% of the projects, respectively. Developers use two or three DL libraries in the same project and tend to use multiple different DL libraries within the same file and even the same function. Developers also show recurring patterns when using the various deep learning libraries, and they prefer simple functions with fewer arguments and straightforward goals. Finally, we present the implications of our findings for researchers, library maintainers, and hardware vendors.
