From Merging Frameworks to Merging Stars: Experiences using HPX, Kokkos and SIMD Types

Paper Title

From Merging Frameworks to Merging Stars: Experiences using HPX, Kokkos and SIMD Types

Authors

Gregor Daiß, Srinivas Yadav Singanaboina, Patrick Diehl, Hartmut Kaiser, Dirk Pflüger

Abstract

Octo-Tiger, a large-scale 3D AMR code for the merger of stars, uses a combination of HPX, Kokkos and explicit SIMD types, aiming to achieve performance-portability for a broad range of heterogeneous hardware. However, on A64FX CPUs, we encountered several missing pieces, hindering performance by causing problems with the SIMD vectorization. Therefore, we add std::experimental::simd as an option to use in Octo-Tiger's Kokkos kernels alongside Kokkos SIMD, and further add a new SVE (Scalable Vector Extensions) SIMD backend. Additionally, we amend missing SIMD implementations in the Kokkos kernels within Octo-Tiger's hydro solver. We test our changes by running Octo-Tiger on three different CPUs: An A64FX, an Intel Icelake and an AMD EPYC CPU, evaluating SIMD speedup and node-level performance. We get a good SIMD speedup on the A64FX CPU, as well as noticeable speedups on the other two CPU platforms. However, we also experience a scaling issue on the EPYC CPU.
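To illustrate what "explicit SIMD types" means in the abstract, the following is a minimal sketch of a loop written against std::experimental::simd. It is not taken from Octo-Tiger: the function name `axpy_simd` and the AXPY operation are hypothetical stand-ins chosen for illustration, and the sketch assumes a standard library that ships `<experimental/simd>` (for example libstdc++ from GCC 11 onward, compiled as C++17 or later).

```cpp
#include <experimental/simd>
#include <cassert>
#include <cstddef>
#include <vector>

namespace stdx = std::experimental;

// Hypothetical example kernel (not from Octo-Tiger): y[i] = a * x[i] + y[i].
void axpy_simd(double a, const std::vector<double>& x, std::vector<double>& y) {
    assert(x.size() == y.size());

    // native_simd picks the widest vector type the target supports.
    using simd_t = stdx::native_simd<double>;
    const std::size_t width = simd_t::size();
    const std::size_t n = x.size();

    std::size_t i = 0;
    // Main loop: process `width` elements per iteration.
    for (; i + width <= n; i += width) {
        simd_t xv(&x[i], stdx::element_aligned);  // load from memory
        simd_t yv(&y[i], stdx::element_aligned);
        yv = a * xv + yv;                         // scalar `a` is broadcast
        yv.copy_to(&y[i], stdx::element_aligned); // store back to memory
    }
    // Scalar remainder loop for the elements that do not fill a full vector.
    for (; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The same source compiles to different vector widths depending on the target, which is the property the abstract relies on when it swaps SIMD backends per platform.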
