Paper Title
Compression-Based Optimizations for Out-of-Core GPU Stencil Computation
Paper Authors
Paper Abstract
An out-of-core stencil computation code handles data whose size exceeds the capacity of GPU memory. Such a code therefore has to stream data to and from the GPU frequently, and data movement between the CPU and GPU usually limits performance. In this work, compression-based optimizations are proposed. First, an on-the-fly compression technique is applied to an out-of-core stencil code, reducing CPU-GPU memory copies. Second, a single-working-buffer technique is used to reduce GPU memory consumption. Experimental results show that the stencil code using the proposed techniques achieved a 1.1x speedup and reduced GPU memory consumption by 33.0\% on an NVIDIA Tesla V100 GPU.
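To make the streaming pattern concrete, the following sketch (an assumption for illustration, not the authors' implementation) shows a 1-D 3-point stencil processed out-of-core through a single GPU working buffer; the comments mark where the on-the-fly (de)compression steps described in the abstract would be inserted. All sizes, kernel names, and buffer layouts are hypothetical.

```cuda
// Minimal sketch: out-of-core streaming through a single GPU working buffer,
// assuming a 1-D 3-point stencil. Sizes and names are illustrative only.
#include <cstdio>
#include <cuda_runtime.h>

#define N      (1L << 24)   // total domain size (larger than the working buffer)
#define CHUNK  (1L << 21)   // elements resident on the GPU at a time
#define HALO   1            // one halo element per side for a 3-point stencil

__global__ void stencil3(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= HALO && i < n - HALO)
        out[i] = 0.25f * in[i - 1] + 0.5f * in[i] + 0.25f * in[i + 1];
}

int main() {
    float *h_in, *h_out, *d_in, *d_out;
    cudaMallocHost(&h_in,  N * sizeof(float));   // pinned host buffers for async copies
    cudaMallocHost(&h_out, N * sizeof(float));
    for (long i = 0; i < N; ++i) h_in[i] = (float)i;

    // Single working buffer: only one chunk (plus halos) occupies GPU memory.
    size_t buf = (CHUNK + 2 * HALO) * sizeof(float);
    cudaMalloc(&d_in,  buf);
    cudaMalloc(&d_out, buf);

    cudaStream_t s;
    cudaStreamCreate(&s);

    for (long off = 0; off < N; off += CHUNK) {
        long lo  = off - HALO < 0 ? 0 : off - HALO;
        long hi  = off + CHUNK + HALO > N ? N : off + CHUNK + HALO;
        long len = hi - lo;
        long cnt = off + CHUNK > N ? N - off : CHUNK;

        cudaMemsetAsync(d_out, 0, buf, s);  // domain boundaries stay zero in this sketch

        // Host-to-device copy of one chunk. With on-the-fly compression, a
        // compressed chunk would be transferred here and expanded by a GPU
        // decompression kernel before the stencil runs.
        cudaMemcpyAsync(d_in, h_in + lo, len * sizeof(float),
                        cudaMemcpyHostToDevice, s);

        stencil3<<<(int)((len + 255) / 256), 256, 0, s>>>(d_in, d_out, (int)len);

        // Device-to-host copy of the chunk interior. The reverse direction
        // would compress on the GPU and decompress on the host.
        cudaMemcpyAsync(h_out + off, d_out + (off - lo), cnt * sizeof(float),
                        cudaMemcpyDeviceToHost, s);
    }
    cudaStreamSynchronize(s);

    printf("h_out[12345] = %f\n", h_out[12345]);
    cudaFree(d_in); cudaFree(d_out);
    cudaFreeHost(h_in); cudaFreeHost(h_out);
    cudaStreamDestroy(s);
    return 0;
}
```

In this sketch, reusing one device-resident chunk buffer instead of holding the whole domain is what bounds GPU memory consumption, while compressing the per-chunk transfers would shrink the H2D and D2H traffic that otherwise limits performance, as the abstract claims.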