Paper Title
SASA: A Scalable and Automatic Stencil Acceleration Framework for Optimized Hybrid Spatial and Temporal Parallelism on HBM-based FPGAs
Paper Authors
Abstract
Stencil computation is one of the fundamental computing patterns in many application domains, such as scientific computing and image processing. While there are promising studies that accelerate stencils on FPGAs, no automated acceleration framework exists that systematically explores both spatial and temporal parallelism for iterative stencils, which can be either computation-bound or memory-bound. In this paper, we present SASA, a scalable and automatic stencil acceleration framework for modern HBM-based FPGAs. SASA takes a high-level stencil DSL and an FPGA platform as inputs, automatically selects the best spatial and temporal parallelism configuration based on our accurate analytical model, and generates an optimized FPGA design with that configuration in TAPA high-level synthesis C++, together with its corresponding host code. Compared to SODA, the state-of-the-art automatic stencil acceleration framework, which exploits only temporal parallelism, SASA achieves an average speedup of 3.74x and a speedup of up to 15.73x on the HBM-based Xilinx Alveo U280 FPGA board for a wide range of stencil kernels.
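To make the terms in the abstract concrete, the following is a minimal software sketch of an iterative stencil, assuming a 5-point Jacobi kernel on a 2D grid with fixed boundary cells. It is not SASA's DSL or its generated TAPA code; the function names (`jacobi2d_step`, `run_iterations`, `run_spatially_tiled`) are hypothetical and chosen only for illustration. Chaining steps so that one iteration feeds the next loosely mirrors temporal parallelism, while splitting the grid into row bands with one halo row on each side loosely mirrors spatial parallelism.

```python
def jacobi2d_step(grid):
    """Apply one 5-point Jacobi update; boundary cells are copied unchanged."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            out[i][j] = (grid[i][j] + grid[i - 1][j] + grid[i + 1][j]
                         + grid[i][j - 1] + grid[i][j + 1]) / 5.0
    return out


def run_iterations(grid, iterations):
    """Chain stencil steps: each iteration consumes the previous output.
    On an FPGA, such stages could be cascaded (temporal parallelism)."""
    for _ in range(iterations):
        grid = jacobi2d_step(grid)
    return grid


def run_spatially_tiled(grid, iterations, tiles):
    """Emulate spatial parallelism: split interior rows into bands and update
    each band independently, reading one halo row above and below it.
    Assumes the grid is at least 3x3 and tiles >= 1."""
    rows = len(grid)
    band = -(-(rows - 2) // tiles)  # ceiling division over interior rows
    for _ in range(iterations):
        out = [row[:] for row in grid]
        for start in range(1, rows - 1, band):
            end = min(start + band, rows - 1)
            sub = grid[start - 1:end + 1]      # band plus its two halo rows
            updated = jacobi2d_step(sub)
            out[start:end] = updated[1:-1]     # keep only the band's own rows
        grid = out
    return grid


if __name__ == "__main__":
    demo = [[float(i * 6 + j) for j in range(6)] for i in range(6)]
    same = run_spatially_tiled(demo, 3, 2) == run_iterations(demo, 3)
    print("tiled result matches untiled result:", same)
```

In this sketch the bands exchange halo rows after every iteration, so the tiled and untiled runs perform the same arithmetic per cell. How a hardware design balances the number of bands against the number of cascaded stages is the kind of trade-off the abstract attributes to SASA's analytical model.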