Paper Title

FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours

Paper Authors

Shenggan Cheng, Xuanlei Zhao, Guangyang Lu, Jiarui Fang, Zhongming Yu, Tian Zheng, Ruidong Wu, Xiwen Zhang, Jian Peng, Yang You

Paper Abstract

Protein structure prediction helps to understand gene translation and protein function, which is of growing interest and importance in structural biology. The AlphaFold model, which used transformer architecture to achieve atomic-level accuracy in protein structure prediction, was a significant breakthrough. However, training and inference of the AlphaFold model are challenging due to its high computation and memory cost. In this work, we present FastFold, an efficient implementation of AlphaFold for both training and inference. We propose Dynamic Axial Parallelism and Duality Async Operations to improve the scaling efficiency of model parallelism. Besides, AutoChunk is proposed to reduce memory cost by over 80% during inference by automatically determining the chunk strategy. Experimental results show that FastFold reduces overall training time from 11 days to 67 hours and achieves 7.5X - 9.5X speedup for long-sequence inference. Furthermore, we scale FastFold to 512 GPUs and achieve an aggregate throughput of 6.02 PetaFLOP/s with 90.1% parallel efficiency.
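The abstract names Dynamic Axial Parallelism as the key to model-parallel scaling efficiency. As a rough illustration of the idea (not FastFold's actual implementation or API), the single-process sketch below shards the 2D pair representation along one axis across simulated devices, runs a row-wise operation locally, then uses an all-to-all exchange to re-shard along the other axis so a column-wise operation can also run locally. All names here (`WORLD`, `all_to_all`, the softmax stand-ins) are illustrative assumptions.

```python
import torch

# Single-process simulation of the idea behind Dynamic Axial Parallelism:
# the 2D pair representation is sharded along one axis across "devices",
# and an all-to-all re-shards it along the other axis between row-wise
# and column-wise operations.
WORLD = 4                          # number of simulated devices
pair = torch.randn(8, 8, 16)       # (rows, cols, channels)

# Shard along rows: any op that mixes information only within a row
# (e.g. row-wise attention) then runs entirely locally on each shard.
row_shards = list(torch.chunk(pair, WORLD, dim=0))
row_out = [torch.softmax(s, dim=1) for s in row_shards]

def all_to_all(shards, split_dim, cat_dim):
    # Each "device" splits its shard and exchanges blocks with every
    # other device, flipping the partitioned axis from rows to columns.
    pieces = [torch.chunk(s, WORLD, dim=split_dim) for s in shards]
    return [torch.cat([pieces[src][dst] for src in range(WORLD)], dim=cat_dim)
            for dst in range(WORLD)]

# Re-shard along columns, then run a column-wise op locally.
col_shards = all_to_all(row_out, split_dim=1, cat_dim=0)
col_out = [torch.softmax(s, dim=0) for s in col_shards]

# The sharded pipeline matches the unsharded computation.
ref = torch.softmax(torch.softmax(pair, dim=1), dim=0)
assert torch.allclose(torch.cat(col_out, dim=1), ref, atol=1e-6)
```

In the real system the all-to-all would be a collective over GPUs (e.g. via torch.distributed), and the row-wise and column-wise ops are the Evoformer's attention kernels rather than softmax stand-ins.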
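The abstract also credits AutoChunk with cutting inference memory by over 80% by choosing a chunking strategy automatically. The mechanism it automates can be shown in a minimal sketch; `chunked_apply` and its parameters below are hypothetical names for illustration, not FastFold's API, and the trick only applies to operations that are independent along the chunked axis.

```python
import torch

def chunked_apply(fn, x, chunk_size=128, dim=0):
    # Run fn on slices of x along `dim` and stitch the results back
    # together. Only one slice's intermediate activations are alive at
    # a time, so peak memory scales with chunk_size, not the full tensor.
    parts = [fn(part) for part in torch.split(x, chunk_size, dim=dim)]
    return torch.cat(parts, dim=dim)

# Toy usage: a row-wise softmax over a large pair representation gives
# the same result whether or not it is evaluated chunk by chunk.
pair = torch.randn(2048, 2048)
out = chunked_apply(lambda rows: torch.softmax(rows, dim=-1), pair)
assert torch.allclose(out, torch.softmax(pair, dim=-1))
```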
