RAPIDx: High-performance ReRAM Processing in-Memory Accelerator for Sequence Alignment

Paper title

RAPIDx: High-performance ReRAM Processing in-Memory Accelerator for Sequence Alignment

Authors

Weihong Xu, Saransh Gupta, Niema Moshiri, Tajana Rosing

Abstract

Genome sequence alignment is the core of many biological applications. The advancement of sequencing technologies produces a tremendous amount of data, making sequence alignment a critical bottleneck in bioinformatics analysis. The existing hardware accelerators for alignment suffer from limited on-chip memory, costly data movement, and poorly optimized alignment algorithms. They cannot afford to concurrently process the massive amount of data generated by sequencing machines. In this paper, we propose a ReRAM-based accelerator, RAPIDx, using processing in-memory (PIM) for sequence alignment. RAPIDx achieves superior efficiency and performance via software-hardware co-design. First, we propose an adaptive banded parallelism alignment algorithm suitable for PIM architecture. Compared to the original dynamic programming-based alignment, the proposed algorithm significantly reduces the required complexity, data bit width, and memory footprint at the cost of negligible accuracy degradation. Then we propose an efficient PIM architecture that implements the proposed algorithm. The data flow in RAPIDx achieves four-level parallelism, and we design an in-situ alignment computation flow in ReRAM, delivering $5.5$-$9.7\times$ efficiency and throughput improvements compared to our previous PIM design, RAPID. The proposed RAPIDx is reconfigurable to serve as a co-processor integrated into existing genome analysis pipelines to boost sequence alignment or edit distance calculation. On short-read alignment, RAPIDx delivers $131.1\times$ and $46.8\times$ throughput improvements over state-of-the-art CPU and GPU libraries, respectively. Compared to ASIC accelerators for long-read alignment, the performance of RAPIDx is $1.8$-$2.9\times$ higher.
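
To illustrate why restricting dynamic programming to a band reduces work, here is a minimal sketch of a fixed-band edit distance in Python. It is not the adaptive banded parallelism algorithm of RAPIDx, whose details the abstract does not give. The function name `banded_edit_distance` and the parameter `band` are hypothetical, chosen only for this example. Only cells with |i - j| <= band are evaluated, so roughly 2*band + 1 cells per row are computed instead of a full row.

```python
from typing import Optional


def banded_edit_distance(a: str, b: str, band: int) -> Optional[int]:
    """Edit distance computed only on cells with |i - j| <= band.

    Illustrative sketch with a fixed band (an assumption, not the
    adaptive scheme from the paper). Returns None when the length
    difference exceeds the band, since the final cell is then outside it.
    The result is exact whenever it is <= band; otherwise it is an
    upper bound on the true edit distance.
    """
    n, m = len(a), len(b)
    if abs(n - m) > band:
        return None
    inf = n + m + 1  # larger than any reachable distance
    # Row 0: only the first `band` columns lie inside the band.
    prev = [j if j <= band else inf for j in range(m + 1)]
    for i in range(1, n + 1):
        # Full-width rows are allocated for clarity; only band cells are filled.
        cur = [inf] * (m + 1)
        lo = max(0, i - band)
        hi = min(m, i + band)
        if lo == 0:
            cur[0] = i  # deleting the first i characters of a
        for j in range(max(1, lo), hi + 1):
            substitute = prev[j - 1] + (a[i - 1] != b[j - 1])
            delete = prev[j] + 1
            insert = cur[j - 1] + 1
            cur[j] = min(substitute, delete, insert)
        prev = cur
    return prev[m]
```

A narrower band means fewer cells per row and smaller score values to store, which is consistent with the reductions in complexity, bit width, and memory footprint that the abstract claims for banded alignment; the trade-off is that alignments straying outside the band are missed.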
