Paper Title
Optimizing Mixture of Experts using Dynamic Recompilations
Paper Authors
Paper Abstract
The Mixture of Experts architecture allows for outrageously large neural networks by scaling model parameter size independently of computational demand (FLOPs). However, current DNN frameworks cannot effectively support the dynamic data flow in Mixture of Experts, and implementations on top of these frameworks need to use workarounds that introduce significant overheads. To address the limitations of these frameworks, we present DynaMoE, a DNN library that uses dynamic recompilations to optimize and adapt the use of computational resources to the dynamic needs of Mixture of Experts models. Our evaluation shows that DynaMoE achieves a 1.8x speedup and supports 2.3x larger model sizes when compared to existing MoE systems, even when not using recompilations. We then present further optimizations enabled by dynamic recompilations that yield an additional 1.7x speedup while simultaneously reducing memory pressure and improving model quality.
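The abstract does not show DynaMoE's API. As a rough illustration of the dynamic data flow that the abstract refers to, the sketch below implements a generic top-1 gated MoE layer in PyTorch (a standard formulation, not the paper's implementation; the class and parameter names are hypothetical). The per-expert batch size changes from one call to the next, which is the behavior static DNN frameworks handle poorly.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Minimal top-1 gated Mixture of Experts layer (illustrative sketch only)."""

    def __init__(self, d_model: int, d_hidden: int, num_experts: int):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts)  # router producing expert scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model); each token is routed to exactly one expert.
        scores = F.softmax(self.gate(x), dim=-1)      # (num_tokens, num_experts)
        weight, expert_idx = scores.max(dim=-1)       # top-1 gating decision
        out = torch.zeros_like(x)
        # The number of tokens assigned to each expert varies per batch --
        # this data-dependent shape is the "dynamic data flow" of MoE.
        for e, expert in enumerate(self.experts):
            mask = expert_idx == e
            if mask.any():
                out[mask] = weight[mask, None] * expert(x[mask])
        return out

# Example: 16 tokens routed across 4 experts; per-expert workloads differ each call.
layer = MoELayer(d_model=32, d_hidden=64, num_experts=4)
y = layer(torch.randn(16, 32))
```

Because the expert workloads only become known at runtime, a system like DynaMoE can benefit from recompiling or re-provisioning resources as the observed token-to-expert assignment shifts, rather than fixing a static computation graph up front.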