Paper Title

Understanding Training Efficiency of Deep Learning Recommendation Models at Scale

Paper Authors

Bilge Acun, Matthew Murphy, Xiaodong Wang, Jade Nie, Carole-Jean Wu, Kim Hazelwood

Paper Abstract

The use of GPUs has proliferated for machine learning workflows and is now considered mainstream for many deep learning models. Meanwhile, when training state-of-the-art personalized recommendation models, which consume the highest number of compute cycles in our large-scale datacenters, the use of GPUs came with various challenges, since these models have both compute-intensive and memory-intensive components. The GPU performance and efficiency of these recommendation models are largely affected by model architecture configurations such as dense and sparse features and MLP dimensions. Furthermore, these models often contain large embedding tables that do not fit into limited GPU memory. The goal of this paper is to explain the intricacies of using GPUs for training recommendation models, the factors affecting hardware efficiency at scale, and learnings from a new scale-up GPU server design, Zion.
