Paper Title
Machine Learning Systems are Bloated and Vulnerable
Paper Authors
Paper Abstract
Today's software is bloated with both code and features that are not used by most users. This bloat is prevalent across the entire software stack, from operating systems and applications to containers. Containers are lightweight virtualization technologies used to package code and dependencies, providing portable, reproducible and isolated environments. For their ease of use, data scientists often utilize machine learning containers to simplify their workflow. However, this convenience comes at a cost: containers are often bloated with unnecessary code and dependencies, resulting in very large sizes. In this paper, we analyze and quantify bloat in machine learning containers. We develop MMLB, a framework for analyzing bloat in software systems, focusing on machine learning containers. MMLB measures the amount of bloat at both the container and package levels, quantifying the sources of bloat. In addition, MMLB integrates with vulnerability analysis tools and performs package dependency analysis to evaluate the impact of bloat on container vulnerabilities. Through experimentation with 15 machine learning containers from TensorFlow, PyTorch, and Nvidia, we show that bloat accounts for up to 80% of machine learning container sizes, increasing container provisioning times by up to 370% and exacerbating vulnerabilities by up to 99%.
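The abstract reports bloat as a share of container size, measured at both the container and the package level. The sketch below illustrates one way such percentages could be computed. It is not MMLB's implementation: the function name `bloat_report`, its three inputs (file sizes, a file-to-package mapping, and the set of files observed in use), and the definition of bloat as "bytes in files never used, divided by total bytes" are all assumptions made for illustration.

```python
from collections import defaultdict


def bloat_report(file_sizes, file_package, used_files):
    """Return (container_bloat, per_package_bloat) as fractions in [0, 1].

    Hypothetical inputs, not taken from the paper:
      file_sizes   - dict mapping file path -> size in bytes
      file_package - dict mapping file path -> owning package name
      used_files   - set of paths observed in use while running the workload
    """
    pkg_total = defaultdict(int)   # bytes per package
    pkg_unused = defaultdict(int)  # unused bytes per package

    for path, size in file_sizes.items():
        # Files with no known owner are grouped under a placeholder name.
        pkg = file_package.get(path, "<unowned>")
        pkg_total[pkg] += size
        if path not in used_files:
            pkg_unused[pkg] += size

    total = sum(pkg_total.values())
    unused = sum(pkg_unused.values())

    # Assumed definition: bloat = unused bytes / total bytes.
    container_bloat = unused / total if total else 0.0
    per_package_bloat = {
        pkg: pkg_unused[pkg] / size for pkg, size in pkg_total.items() if size
    }
    return container_bloat, per_package_bloat


# Toy data, invented for the example.
sizes = {"/usr/lib/libA.so": 600, "/usr/lib/libB.so": 200, "/opt/app/model.py": 200}
owners = {"/usr/lib/libA.so": "pkg-a", "/usr/lib/libB.so": "pkg-b", "/opt/app/model.py": "app"}
used = {"/usr/lib/libB.so", "/opt/app/model.py"}

container, per_package = bloat_report(sizes, owners, used)
print(f"container bloat: {container:.0%}")
for name, fraction in sorted(per_package.items()):
    print(f"  {name}: {fraction:.0%}")
```

In this toy case the unused library accounts for 600 of 1000 bytes, so the container-level figure is 60% while the package-level view attributes all of it to a single package. That difference is why reporting both levels is useful when looking for the sources of bloat.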