Paper Title
Towards Seamless Management of AI Models in High-Performance Computing
Paper Authors
Paper Abstract
With the increasing prevalence of artificial intelligence (AI) in diverse science and engineering communities, AI models are emerging on an unprecedented scale across domains. However, given the complexity and diversity of software and hardware environments, reusing AI artifacts (models and datasets) is extremely challenging, especially in AI-driven science applications. Building an ecosystem to run and reuse AI applications and datasets efficiently at scale is becoming increasingly essential for the diverse science, engineering, and high-performance computing (HPC) communities. In this paper, we introduce HPCFair, an HPC-AI ecosystem that supports the Findable, Accessible, Interoperable, and Reproducible (FAIR) principles. HPCFair hosts a collection of AI models and datasets and allows authenticated users to download and upload AI artifacts. Most importantly, the proposed framework provides user-friendly APIs that let users easily run inference jobs and customize AI artifacts for their tasks as needed. Our results show that, with the HPCFair API, users can easily apply AI artifacts to their tasks with minimal effort, irrespective of their technical expertise in AI.