DeepNVM++: Cross-Layer Modeling and Optimization Framework of Non-Volatile Memories for Deep Learning

Paper title

DeepNVM++: Cross-Layer Modeling and Optimization Framework of Non-Volatile Memories for Deep Learning

Authors

Ahmet Inci, Mehmet Meric Isgenc, Diana Marculescu

Abstract

Non-volatile memory (NVM) technologies such as spin-transfer torque magnetic random access memory (STT-MRAM) and spin-orbit torque magnetic random access memory (SOT-MRAM) have significant advantages compared to conventional SRAM due to their non-volatility, higher cell density, and scalability features. While previous work has investigated several architectural implications of NVM for generic applications, in this work we present DeepNVM++, a framework to characterize, model, and analyze NVM-based caches in GPU architectures for deep learning (DL) applications by combining technology-specific circuit-level models and the actual memory behavior of various DL workloads. We present both iso-capacity and iso-area performance and energy analysis for systems whose last-level caches rely on conventional SRAM and emerging STT-MRAM and SOT-MRAM technologies. In the iso-capacity case, STT-MRAM and SOT-MRAM provide up to 3.8x and 4.7x energy-delay product (EDP) reduction and 2.4x and 2.8x area reduction compared to conventional SRAM, respectively. Under iso-area assumptions, STT-MRAM and SOT-MRAM provide up to 2x and 2.3x EDP reduction and accommodate 2.3x and 3.3x cache capacity when compared to SRAM, respectively. We also perform a scalability analysis and show that STT-MRAM and SOT-MRAM achieve orders of magnitude EDP reduction when compared to SRAM for large cache capacities. Our comprehensive cross-layer framework is demonstrated on STT-/SOT-MRAM technologies and can be used for the characterization, modeling, and analysis of any NVM technology for last-level caches in GPUs for DL applications.
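The abstract reports its results as energy-delay product (EDP) reductions. As a minimal sketch of how such a ratio is computed, assuming the standard definition EDP = energy × delay: the class name `CacheRun`, the function `edp_reduction`, and all numeric values below are hypothetical placeholders for illustration, not figures or code from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheRun:
    """Energy and delay of one workload run on a given cache technology."""
    energy_joules: float
    delay_seconds: float

    @property
    def edp(self) -> float:
        # Energy-delay product: lower is better.
        return self.energy_joules * self.delay_seconds


def edp_reduction(baseline: CacheRun, candidate: CacheRun) -> float:
    """Return how many times smaller the candidate's EDP is than the baseline's."""
    return baseline.edp / candidate.edp


# Placeholder values only (NOT measurements from the paper).
sram = CacheRun(energy_joules=2.0, delay_seconds=1.0)
nvm = CacheRun(energy_joules=0.5, delay_seconds=1.25)

print(f"EDP reduction: {edp_reduction(sram, nvm):.1f}x")
```

A ratio above 1 means the candidate technology has the lower EDP. Under this definition a candidate can be slower than the baseline and still come out ahead, provided its energy savings outweigh the added delay.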
