HCGrid: A Convolution-based Gridding Framework for Radio Astronomy in Hybrid Computing Environments

Paper Title

HCGrid: A Convolution-based Gridding Framework for Radio Astronomy in Hybrid Computing Environments

Authors

Wang, Hao; Yu, Ce; Zhang, Bo; Xiao, Jian; Luo, Qi

Abstract

Gridding, the operation of mapping non-uniform data samples onto a uniformly distributed grid, is one of the key steps in the radio astronomical data reduction process. One of the main bottlenecks of gridding is poor computing performance, and a typical solution to this performance issue is an implementation on multi-core CPU platforms. Although such a method can usually achieve good results, in many cases the performance of gridding is still restricted to an extent by the limitations of the CPU, since the main workload of gridding is a combination of a large number of single-instruction, multiple-data-stream operations, which are better suited to GPU than to CPU implementations. To meet the challenge of massive data gridding for modern large single-dish radio telescopes, e.g., the Five-hundred-meter Aperture Spherical radio Telescope (FAST), and inspired by existing multi-core CPU gridding algorithms such as Cygrid, here we present an easy-to-install, high-performance, and open-source convolutional gridding framework, HCGrid, for CPU-GPU heterogeneous platforms. It optimises data search by employing multi-threading on the CPU, and accelerates the convolution process by utilising the massive parallelisation of the GPU. In order to make HCGrid a more adaptive solution, we also propose strategies for thread organisation and coarsening, as well as optimal parameter settings under various GPU architectures. A thorough analysis of computing time and performance gain with several GPU parallel optimisation strategies shows that it can lead to excellent performance in hybrid computing environments.
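As a reading aid, the following minimal Python sketch illustrates what convolution-based gridding means in general: each cell of a regular grid receives a kernel-weighted average of the nearby scattered samples. It is not the HCGrid implementation. The function name `grid_samples`, the Gaussian kernel, the flat (Cartesian) coordinates, and the parameters `sigma` and `support` are all assumptions made for illustration, and the brute-force loop over every sample stands in for the neighbour search that the abstract assigns to the CPU.

```python
import math


def grid_samples(xs, ys, values, grid_x, grid_y, sigma, support):
    """Grid scattered samples onto a regular grid by Gaussian convolution.

    Hypothetical illustration only: flat coordinates, Gaussian kernel,
    and a brute-force scan of all samples for every grid cell.
    """
    # Cells with no sample inside the support radius stay NaN.
    out = [[float("nan")] * len(grid_x) for _ in grid_y]
    for j, gy in enumerate(grid_y):
        for i, gx in enumerate(grid_x):
            weight_sum = 0.0
            value_sum = 0.0
            for x, y, v in zip(xs, ys, values):
                d2 = (x - gx) ** 2 + (y - gy) ** 2
                if d2 <= support ** 2:
                    # Gaussian weight falls off with distance from the cell.
                    w = math.exp(-0.5 * d2 / sigma ** 2)
                    weight_sum += w
                    value_sum += w * v
            if weight_sum > 0.0:
                out[j][i] = value_sum / weight_sum
    return out


# Usage: three samples gridded onto a 3 x 3 grid.
grid = grid_samples(
    xs=[0.1, 0.9, 0.5],
    ys=[0.2, 0.8, 0.5],
    values=[1.0, 3.0, 2.0],
    grid_x=[0.0, 0.5, 1.0],
    grid_y=[0.0, 0.5, 1.0],
    sigma=0.3,
    support=1.0,
)
```

In this sketch every output cell is computed independently of the others, applying the same arithmetic to different data. That is the kind of workload the abstract describes as better suited to GPU execution.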
