Paper Title

A Very Compact Embedded CNN Processor Design Based on Logarithmic Computing

Authors

Lu, Tsung-Ying; Chin, Hsu-Hsun; Wu, Hsin-I; Tsay, Ren-Song

Abstract

In this paper, we propose a very compact embedded CNN processor design based on a modified logarithmic computing method using a very low bit-width representation. Our high-quality CNN processor can easily fit into edge devices. For YOLOv2, our processing circuit takes only 0.15 mm² using the TSMC 40 nm cell library. The key idea is to constrain the activation and weight values of all layers uniformly to the range [-1, 1] and produce a low bit-width logarithmic representation. With the uniform representations, we devise a unified, reusable CNN computing kernel and significantly reduce computing resources. The proposed approach has been extensively evaluated on many popular image classification CNN models (AlexNet, VGG16, and ResNet-18/34) and object detection models (YOLOv2). The hardware-implemented results show that our design consumes only minimal computing and storage resources, yet attains very high accuracy. The design is thoroughly verified on FPGAs, and the SoC integration is underway with promising results. With extremely efficient resource and energy usage, our design is excellent for edge computing purposes.
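To make the key idea concrete, here is a minimal sketch of generic base-2 logarithmic quantization for values in [-1, 1]. It is not the authors' modified method, which the abstract does not spell out. The function names, the 4-bit code width, and the choice to reserve the largest code for zero are all assumptions made for illustration. The sketch shows why such a representation is cheap in hardware: multiplying two values reduces to adding their small integer exponent codes.

```python
import math

# Illustrative only: plain power-of-two quantization, NOT the paper's scheme.
# A value x in [-1, 1] is stored as (sign, code) with |x| ~= 2 ** (-code).
# Assumption: the largest code is reserved to represent exactly zero.

def log_quantize(x, bits=4):
    """Encode x in [-1, 1] as (sign, code); sign is -1, 0, or +1."""
    zero_code = (1 << bits) - 1
    if x == 0:
        return 0, zero_code
    sign = -1 if x < 0 else 1
    magnitude = min(abs(x), 1.0)          # clamp to the [-1, 1] range
    code = round(-math.log2(magnitude))   # nearest power-of-two exponent
    if code >= zero_code:                 # too small to represent: flush to zero
        return 0, zero_code
    return sign, code

def log_dequantize(sign, code):
    """Decode (sign, code) back to a float."""
    return sign * 2.0 ** (-code)

def log_multiply(a, b, bits=4):
    """Multiply two encoded values by adding their exponent codes."""
    zero_code = (1 << bits) - 1
    sign = a[0] * b[0]
    code = a[1] + b[1]
    if sign == 0 or code >= zero_code:    # zero operand or underflow
        return 0, zero_code
    return sign, code

if __name__ == "__main__":
    w = log_quantize(0.5)      # weight
    a = log_quantize(-0.25)    # activation
    p = log_multiply(w, a)
    print(w, a, p, log_dequantize(*p))
```

Because every magnitude is at most 1, the exponent codes are non-negative and products never exceed the range, which is one plausible reason a single uniform representation can serve all layers.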
