Paper Title
Best of Both Worlds: High Performance Interactive and Batch Launching
Paper Authors
Paper Abstract
Rapid launch of thousands of jobs is essential for effective interactive supercomputing, big data analysis, and AI algorithm development. Achieving thousands of launches per second has required hardware to be available to receive these jobs. This paper presents a novel preemptive approach to implementing spot jobs on MIT SuperCloud systems, allowing resources to be fully utilized for long-running batch jobs while still providing fast launch for interactive jobs. The new approach separates the job preemption and scheduling operations, and it schedules a job with preemption 100 times faster than the standard scheduler-provided automatic preemption capability. The results demonstrate that the new approach can schedule interactive jobs preemptively with performance comparable to that achieved when the required computing resources are idle and available. The spot job capability can therefore be deployed without disrupting the interactive user experience while increasing overall system utilization.