Paper Title

SAIH: A Scalable Evaluation Methodology for Understanding AI Performance Trend on HPC Systems

Authors

Du, Jiangsu; Li, Dongsheng; Wen, Yingpeng; Jiang, Jiazhi; Huang, Dan; Liao, Xiangke; Lu, Yutong

Abstract

Novel artificial intelligence (AI) technology has accelerated research in scientific fields such as cosmology, physics, and bioinformatics, and has inevitably become a significant category of workload on high-performance computing (HPC) systems. Existing AI benchmarks tend to customize well-recognized AI applications so as to evaluate the AI performance of HPC systems under a predefined problem size, in terms of datasets and AI models. Because their problem sizes do not scale, static AI benchmarks may be inadequate for understanding the performance trend of evolving AI applications on HPC systems, in particular scientific AI applications on large-scale systems. In this paper, we propose a scalable evaluation methodology (SAIH) for analyzing the AI performance trend of HPC systems by scaling the problem sizes of customized AI applications. To enable scalability, SAIH builds a set of novel mechanisms for augmenting problem sizes. As the data and model scale continuously, we can investigate the trend and range of AI performance on HPC systems and further diagnose system bottlenecks. To verify our methodology, we augment a cosmological AI application to evaluate a real HPC system equipped with GPUs as a case study of SAIH.
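The general idea of observing a performance trend as the problem size grows can be illustrated with a minimal sketch. This is not SAIH's mechanism: the abstract does not describe how SAIH augments data or models, so `toy_workload` and `sweep` below are hypothetical names, and a plain matrix-vector product stands in for a real AI application.

```python
import time


def toy_workload(n):
    """Hypothetical stand-in for one step of an AI application.

    Computes an n x n matrix-vector product in pure Python and returns
    the approximate number of floating-point operations performed.
    """
    matrix = [[1.0] * n for _ in range(n)]
    vector = [1.0] * n
    result = [sum(a * b for a, b in zip(row, vector)) for row in matrix]
    assert len(result) == n
    return 2 * n * n  # roughly one multiply and one add per matrix element


def sweep(workload, sizes):
    """Run the workload at each problem size and record achieved throughput."""
    records = []
    for n in sizes:
        start = time.perf_counter()
        flops = workload(n)
        elapsed = time.perf_counter() - start
        records.append({
            "size": n,
            "flops": flops,
            "seconds": elapsed,
            # Guard against a zero reading on very coarse timers.
            "flops_per_second": flops / elapsed if elapsed > 0 else float("inf"),
        })
    return records


if __name__ == "__main__":
    # Assumed problem sizes, chosen only for illustration.
    for record in sweep(toy_workload, [64, 128, 256, 512]):
        print(record["size"], f"{record['flops_per_second']:.3e} FLOP/s")
```

Plotting `flops_per_second` against `size` would show where throughput stops improving, which is the kind of trend a fixed-size benchmark cannot expose.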
