Paper Title
DeepNVM++: Cross-Layer Modeling and Optimization Framework of Non-Volatile Memories for Deep Learning
Paper Authors
Abstract
Non-volatile memory (NVM) technologies such as spin-transfer torque magnetic random access memory (STT-MRAM) and spin-orbit torque magnetic random access memory (SOT-MRAM) have significant advantages compared to conventional SRAM due to their non-volatility, higher cell density, and scalability features. While previous work has investigated several architectural implications of NVM for generic applications, in this work we present DeepNVM++, a framework to characterize, model, and analyze NVM-based caches in GPU architectures for deep learning (DL) applications by combining technology-specific circuit-level models and the actual memory behavior of various DL workloads. We present both iso-capacity and iso-area performance and energy analysis for systems whose last-level caches rely on conventional SRAM and emerging STT-MRAM and SOT-MRAM technologies. In the iso-capacity case, STT-MRAM and SOT-MRAM provide up to 3.8x and 4.7x energy-delay product (EDP) reduction and 2.4x and 2.8x area reduction compared to conventional SRAM, respectively. Under iso-area assumptions, STT-MRAM and SOT-MRAM provide up to 2x and 2.3x EDP reduction and accommodate 2.3x and 3.3x cache capacity when compared to SRAM, respectively. We also perform a scalability analysis and show that STT-MRAM and SOT-MRAM achieve orders of magnitude EDP reduction when compared to SRAM for large cache capacities. Our comprehensive cross-layer framework is demonstrated on STT-/SOT-MRAM technologies and can be used for the characterization, modeling, and analysis of any NVM technology for last-level caches in GPUs for DL applications.
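The energy-delay product (EDP) figures quoted in the abstract are ratios of a baseline's energy times delay to a candidate's energy times delay. The following is a minimal sketch of that arithmetic only. The function names and the numeric inputs are hypothetical placeholders chosen for illustration; they are not taken from the paper or from the DeepNVM++ framework.

```python
def edp(energy_joules: float, delay_seconds: float) -> float:
    """Energy-delay product: total energy multiplied by total delay."""
    return energy_joules * delay_seconds


def edp_reduction(baseline: tuple[float, float], candidate: tuple[float, float]) -> float:
    """Ratio of baseline EDP to candidate EDP (values above 1 favor the candidate)."""
    return edp(*baseline) / edp(*candidate)


# Placeholder (energy, delay) pairs, NOT measurements from the paper.
sram_cache = (2.0, 1.0)
nvm_cache = (0.5, 1.0)

print(edp_reduction(sram_cache, nvm_cache))
```

Under this definition, a result such as "3.8x EDP reduction" means the SRAM baseline's energy-delay product is 3.8 times that of the NVM-based cache for the same workload.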