Paper Title

TIMELY: Pushing Data Movements and Interfaces in PIM Accelerators Towards Local and in Time Domain

Authors

Weitao Li, Pengfei Xu, Yang Zhao, Haitong Li, Yuan Xie, Yingyan Lin

Abstract

Resistive-random-access-memory (ReRAM) based processing-in-memory (R$^2$PIM) accelerators show promise in bridging the gap between Internet of Things devices' constrained resources and Convolutional/Deep Neural Networks' (CNNs/DNNs') prohibitive energy cost. Specifically, R$^2$PIM accelerators enhance energy efficiency by eliminating the cost of weight movements and improving the computational density through ReRAM's high density. However, the energy efficiency is still limited by the dominant energy cost of input and partial sum (Psum) movements and the cost of digital-to-analog (D/A) and analog-to-digital (A/D) interfaces. In this work, we identify three energy-saving opportunities in R$^2$PIM accelerators: analog data locality, time-domain interfacing, and input access reduction, and propose an innovative R$^2$PIM accelerator called TIMELY, with three key contributions: (1) TIMELY adopts analog local buffers (ALBs) within ReRAM crossbars to greatly enhance the data locality, minimizing the energy overheads of both input and Psum movements; (2) TIMELY largely reduces the energy of each single D/A (and A/D) conversion and the total number of conversions by using time-domain interfaces (TDIs) and the employed ALBs, respectively; (3) we develop an only-once input read (O$^2$IR) mapping method to further decrease the energy of input accesses and the number of D/A conversions. The evaluation with more than 10 CNN/DNN models and various chip configurations shows that TIMELY outperforms the baseline R$^2$PIM accelerator, PRIME, by one order of magnitude in energy efficiency while maintaining better computational density (up to 31.2$\times$) and throughput (up to 736.6$\times$). Furthermore, comprehensive studies are performed to evaluate the effectiveness of the proposed ALB, TDI, and O$^2$IR innovations in terms of energy savings and area reduction.
