Paper Title

MAC-DO: An Efficient Output-Stationary GEMM Accelerator for CNNs Using DRAM Technology

Authors

Minki Jeong, Wanyeong Jung

Abstract

DRAM-based in-situ accelerators have shown their potential in addressing the memory wall challenge of the traditional von Neumann architecture. Such accelerators exploit charge sharing or logic circuits for simple logic operations at the DRAM subarray level. However, their throughput is limited by low array utilization, as only a few row cells in a DRAM array participate in operations while most rows remain deactivated. Moreover, they require many cycles for more complex operations such as a multi-bit multiply-accumulate (MAC) operation, resulting in significant data access and movement and potentially worsening power efficiency. To overcome these limitations, this paper presents MAC-DO, an efficient and low-power DRAM-based in-situ accelerator. Unlike previous DRAM-based in-situ accelerators, a MAC-DO cell, consisting of two 1T1C DRAM cells (two transistors and two capacitors), innately supports a multi-bit MAC operation within a single cycle, ensuring good linearity and compatibility with existing 1T1C DRAM cells and array structures. This is made possible by a novel analog computation method that uses charge steering. Additionally, MAC-DO enables concurrent individual MAC operations in each MAC-DO cell without idle cells, significantly improving throughput and energy efficiency. As a result, a MAC-DO array can efficiently accelerate matrix multiplications based on output-stationary mapping, supporting the majority of computations performed in deep neural networks (DNNs). Furthermore, a MAC-DO array efficiently reuses three types of data (input, weight, and output), minimizing data movement.
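The output-stationary mapping mentioned in the abstract can be illustrated with a small software sketch. This is only a functional model of the dataflow, not the authors' circuit or code: the function name `output_stationary_gemm` and the list-of-lists matrix layout are assumptions made for illustration, and the analog charge-steering computation is replaced by ordinary integer arithmetic. Each accumulator stands in for one cell that holds a single output element for the whole computation, while one input value per row and one weight value per column are shared across the array on every step.

```python
def output_stationary_gemm(a, b):
    """Compute C = A @ B with an output-stationary schedule (illustrative model).

    a: M x K matrix as a list of rows, b: K x N matrix as a list of rows.
    Both are assumed to be non-empty and rectangular.
    """
    m, k_dim, n = len(a), len(b), len(b[0])
    if any(len(row) != k_dim for row in a):
        raise ValueError("inner dimensions of A and B do not match")

    # One accumulator per output element; it never moves during the run.
    acc = [[0] * n for _ in range(m)]

    for k in range(k_dim):                 # one step per inner-dimension index
        col = [a[i][k] for i in range(m)]  # input values, each reused across a row of cells
        row = b[k]                         # weight values, each reused down a column of cells
        for i in range(m):
            for j in range(n):
                # Every cell performs exactly one MAC per step, so no cell idles.
                acc[i][j] += col[i] * row[j]
    return acc


if __name__ == "__main__":
    print(output_stationary_gemm([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

In this schedule each input value is used by N cells and each weight value by M cells per step, and partial sums are never written out until the end, which is the kind of three-way data reuse (input, weight, output) the abstract describes.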
