Paper Title

TopSort: A High-Performance Two-Phase Sorting Accelerator Optimized on HBM-based FPGAs

Authors

Weikang Qiao, Licheng Guo, Zhenman Fang, Mau-Chung Frank Chang, Jason Cong

Abstract

The emergence of high-bandwidth memory (HBM) brings new opportunities to boost the performance of sorting acceleration on FPGAs, which was conventionally bounded by the available off-chip memory bandwidth. However, it is nontrivial for designers to fully utilize this immense bandwidth. First, existing sorter designs cannot be scaled directly at the rate at which available off-chip bandwidth increases, because the required on-chip resource usage grows much faster and would in turn bound sorting performance. Second, designers need an in-depth understanding of HBM characteristics to utilize the HBM bandwidth effectively. To tackle these challenges, we present TopSort, a novel two-phase sorting solution optimized for HBM-based FPGAs. In the first phase, 16 merge trees work in parallel to fully utilize 32 HBM channels. In the second phase, TopSort reuses the logic from phase one to form a wider merge tree that merges the partially sorted results from phase one. TopSort also adopts HBM-specific optimizations to reduce resource overhead and improve bandwidth utilization. TopSort can sort up to 4 GB of data using all 32 HBM channels, with an overall sorting performance of 15.6 GB/s. TopSort is 6.7x and 2.2x faster than state-of-the-art CPU and FPGA sorters, respectively.
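
The two-phase flow described in the abstract can be illustrated with a small software model. This is a hedged sketch, not TopSort's hardware design: `merge_tree`, `split`, `phase_one`, `phase_two`, the `leaf_run` parameter, and the even split of data across trees are hypothetical, and Python's `sorted` and `heapq.merge` stand in for FPGA presorter and merger units. The only figures taken from the abstract are the 16 phase-one merge trees and the 32 HBM channels.

```python
import heapq
from typing import List

NUM_TREES = 16     # phase-one merge trees (from the abstract)
HBM_CHANNELS = 32  # HBM channels used (from the abstract)


def merge_tree(runs: List[List[int]]) -> List[int]:
    """Model a binary merge tree: merge sorted runs pairwise, level by level."""
    if not runs:
        return []
    while len(runs) > 1:
        next_level = []
        for i in range(0, len(runs), 2):
            if i + 1 < len(runs):
                next_level.append(list(heapq.merge(runs[i], runs[i + 1])))
            else:
                next_level.append(runs[i])  # an odd run passes through to the next level
        runs = next_level
    return runs[0]


def split(data: List[int], parts: int) -> List[List[int]]:
    """Hypothetical even split of the input across the phase-one trees."""
    size = -(-len(data) // parts)  # ceiling division
    return [data[i * size:(i + 1) * size] for i in range(parts)]


def phase_one(data: List[int], num_trees: int = NUM_TREES, leaf_run: int = 4) -> List[List[int]]:
    """Each tree sorts its own partition independently (in parallel on the FPGA)."""
    partials = []
    for part in split(data, num_trees):
        # Short presorted runs feed the tree's leaves; sorted() stands in for a presorter.
        runs = [sorted(part[i:i + leaf_run]) for i in range(0, len(part), leaf_run)]
        partials.append(merge_tree(runs))
    return partials


def phase_two(partials: List[List[int]]) -> List[int]:
    """Reuse the same merging logic as one wider tree over the phase-one outputs."""
    return merge_tree(partials)


def two_phase_sort(data: List[int]) -> List[int]:
    return phase_two(phase_one(data))


if __name__ == "__main__":
    # Assuming channels are divided evenly among trees: 32 / 16 = 2 per tree.
    print("channels per phase-one tree:", HBM_CHANNELS // NUM_TREES)
    sample = [(i * 7919) % 1000 for i in range(50)]
    print(two_phase_sort(sample) == sorted(sample))  # True
```

Calling the same `merge_tree` in both phases mirrors the abstract's point that phase two reuses phase-one logic instead of adding a separate merger. As a rough check on the headline figure, sorting the maximum 4 GB at an overall 15.6 GB/s would take about 4 / 15.6 ≈ 0.26 s.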