Paper Title

How I Learned to Stop Worrying About User-Visible Endpoints and Love MPI

Paper Authors

Rohit Zambre, Aparna Chandramowlishwaran, Pavan Balaji

Paper Abstract

MPI+threads is gaining prominence as an alternative to the traditional MPI everywhere model in order to better handle the disproportionate increase in the number of cores compared with other on-node resources. However, the communication performance of MPI+threads can be 100x slower than that of MPI everywhere. Both MPI users and developers are to blame for this slowdown. Typically, MPI users do not expose logical communication parallelism. Consequently, MPI libraries use conservative approaches, such as a global critical section, to maintain MPI's ordering constraints for MPI+threads, thus serializing access to parallel network resources and hurting performance. To enhance MPI+threads' communication performance, researchers have proposed MPI Endpoints as a user-visible extension to MPI-3.1. MPI Endpoints allows a single process to create multiple MPI ranks within a communicator. This could allow each thread to have a dedicated communication path to the network and improve performance. The onus of mapping threads to endpoints, however, would then be on domain scientists. We play the role of devil's advocate and question the need for user-visible endpoints. We certainly agree that dedicated communication channels are critical. To what extent, however, can we hide these channels inside the MPI library without modifying the MPI standard and thus unburden the user? More important, what functionality would we lose through such abstraction? This paper answers these questions through a new MPI-3.1 implementation that uses virtual communication interfaces (VCIs). VCIs abstract underlying network contexts. When users expose parallelism through existing MPI mechanisms, the MPI library maps that parallelism to the VCIs, relieving domain scientists from endpoints. We identify cases where VCIs perform as well as user-visible endpoints, as well as cases where such abstraction hurts performance.
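
The abstract's central idea is that users can expose logical communication parallelism through existing MPI-3.1 mechanisms instead of a new Endpoints API. Below is a minimal C sketch of one commonly cited such mechanism: giving each thread its own duplicated communicator, so that a VCI-aware library (such as the paper's MPICH-based implementation) is free to map each communicator to a separate network context. This is an illustrative sketch, not code from the paper; the NTHREADS constant, the ring-exchange pattern, and the assumption that the library maps disjoint communicators to distinct VCIs are all ours. Compile with an MPI C compiler and OpenMP, e.g. mpicc -fopenmp.

#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define NTHREADS 4  /* illustrative thread count, not from the paper */

int main(int argc, char **argv)
{
    int provided, rank, size;
    MPI_Comm comms[NTHREADS];

    /* MPI+threads: request full multithreading support. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE)
        MPI_Abort(MPI_COMM_WORLD, 1);

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* One communicator per thread. Disjoint communicators carry no
     * mutual message-ordering constraints, so a VCI-aware library may
     * drive each over its own network context rather than funneling all
     * traffic through a global critical section. MPI_Comm_dup is
     * collective, so it is called before spawning threads. */
    for (int i = 0; i < NTHREADS; i++)
        MPI_Comm_dup(MPI_COMM_WORLD, &comms[i]);

    #pragma omp parallel num_threads(NTHREADS)
    {
        int tid  = omp_get_thread_num();
        int next = (rank + 1) % size;
        int prev = (rank - 1 + size) % size;
        int val  = rank * NTHREADS + tid, got;

        /* Each thread exchanges data on its own communicator only,
         * exposing the logical communication parallelism to MPI. */
        MPI_Sendrecv(&val, 1, MPI_INT, next, tid,
                     &got, 1, MPI_INT, prev, tid,
                     comms[tid], MPI_STATUS_IGNORE);
        printf("rank %d thread %d received %d\n", rank, tid, got);
    }

    for (int i = 0; i < NTHREADS; i++)
        MPI_Comm_free(&comms[i]);
    MPI_Finalize();
    return 0;
}

Whether this pattern actually yields dedicated fast paths depends entirely on the MPI library: with a single global critical section the per-thread communicators buy nothing, whereas a VCI-based implementation can map them to independent network resources without any change to the MPI standard, which is precisely the trade-off the paper evaluates.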
