Paper Title

Transaction-level Model Simulator for Communication-Limited Accelerators

Authors

Sunwoo Kim, Jooho Wang, Youngho Seo, Sanghun Lee, Yeji Park, Sungkyung Park, Chester Sungchung Park

Abstract

Rapid design space exploration in the early design stage is critical to algorithm-architecture co-design for accelerators. In this work, a pre-RTL cycle-accurate accelerator simulator based on SystemC transaction-level modeling (TLM), AccTLMSim, is proposed for convolutional neural network (CNN) accelerators. The simulator keeps track of each bus transaction between the accelerator and DRAM, taking the communication bandwidth into account. The simulation results are validated against implementation results on the Xilinx Zynq. Using the proposed simulator, it is shown that the communication bandwidth is severely affected by DRAM latency and bus protocol overhead. In addition, the loop tiling is optimized to maximize performance under the constraint of on-chip SRAM size. Furthermore, a new performance estimation model is proposed to speed up design space exploration. With the proposed simulator and performance estimation model, a design space of millions of architectural options can be explored within a few tens of minutes.
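To illustrate why DRAM latency and per-transaction overhead can limit communication bandwidth, the following is a minimal sketch, not the model used in AccTLMSim. It assumes each burst pays a fixed latency (in cycles) before its data beats are transferred at one beat per cycle, with no overlap between bursts. The function name, parameters, and numbers are all hypothetical.

```python
def effective_bandwidth(bus_bytes, burst_len, latency_cycles, freq_hz):
    """Return bytes per second for back-to-back, non-overlapping bursts.

    Assumption (hypothetical, for illustration only): every burst costs
    `latency_cycles` of fixed overhead plus `burst_len` data beats,
    one beat per clock cycle.
    """
    cycles_per_burst = latency_cycles + burst_len
    bytes_per_burst = bus_bytes * burst_len
    return bytes_per_burst * freq_hz / cycles_per_burst


# Hypothetical 64-bit bus at 100 MHz with 16-beat bursts.
peak = effective_bandwidth(bus_bytes=8, burst_len=16, latency_cycles=0, freq_hz=100e6)
real = effective_bandwidth(bus_bytes=8, burst_len=16, latency_cycles=48, freq_hz=100e6)
ratio = real / peak  # fraction of peak bandwidth actually delivered
```

Under these assumptions, a latency of 48 cycles per 16-beat burst leaves only a quarter of the peak bandwidth, which is why longer bursts or overlapping outstanding transactions matter in such a simplified picture.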
