Paper Title
Betty: An Automatic Differentiation Library for Multilevel Optimization
Paper Authors
Paper Abstract
Gradient-based multilevel optimization (MLO) has gained attention as a framework for studying numerous problems, ranging from hyperparameter optimization and meta-learning to neural architecture search and reinforcement learning. However, gradients in MLO, which are obtained by composing best-response Jacobians via the chain rule, are notoriously difficult to implement and memory/compute intensive. We take an initial step towards closing this gap by introducing Betty, a software library for large-scale MLO. At its core, we devise a novel dataflow graph for MLO, which allows us to (1) develop efficient automatic differentiation for MLO that reduces the computational complexity from O(d^3) to O(d^2), (2) incorporate systems support such as mixed-precision and data-parallel training for scalability, and (3) facilitate implementation of MLO programs of arbitrary complexity while allowing a modular interface for diverse algorithmic and systems design choices. We empirically demonstrate that Betty can be used to implement an array of MLO programs, while also observing up to 11% increase in test accuracy, 14% decrease in GPU memory usage, and 20% decrease in training wall time over existing implementations on multiple benchmarks. We also showcase that Betty enables scaling MLO to models with hundreds of millions of parameters. We open-source the code at https://github.com/leopard-ai/betty.
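To make the chain-rule structure concrete, consider the simplest two-level instance of MLO, written in standard bilevel notation (illustrative, not taken from the paper itself): an upper-level objective f(\lambda, w^*(\lambda)) with lower-level best response w^*(\lambda) = \arg\min_w g(\lambda, w). The hypergradient composes the best-response Jacobian with the upper-level gradients,

\frac{df}{d\lambda} = \frac{\partial f}{\partial \lambda} + \Big(\frac{dw^*}{d\lambda}\Big)^{\top} \frac{\partial f}{\partial w},
\qquad
\frac{dw^*}{d\lambda} = -\Big[\frac{\partial^2 g}{\partial w^2}\Big]^{-1} \frac{\partial^2 g}{\partial w \, \partial \lambda},

where the second equality follows from the implicit function theorem at the lower-level optimum. Materializing and inverting the d-by-d lower-level Hessian is where the O(d^3) cost arises; the O(d^2) figure in the abstract corresponds to evaluating this composition with matrix-vector rather than matrix-matrix products. Deeper MLO hierarchies chain several such Jacobians together.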
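The modular interface the abstract describes pairs one Problem definition per level with an Engine that holds the dependency graph between levels. Below is a minimal two-level sketch (learned data reweighting) in that style; the class names, constructor arguments, and config fields shown (ImplicitProblem, Engine, Config, EngineConfig, unroll_steps, the u2l/l2u dependency dictionaries) are assumptions based on the paper's description and should be checked against the repository at https://github.com/leopard-ai/betty.

# Hedged sketch of a two-level MLO program (learned data reweighting)
# in the spirit of Betty's Problem/Engine abstraction. All betty.*
# names and argument names here are assumptions, not verified API.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

from betty.engine import Engine
from betty.problems import ImplicitProblem
from betty.configs import Config, EngineConfig

# Toy data: a noisy training split (lower level) and a clean
# validation split (upper level).
train_loader = DataLoader(
    TensorDataset(torch.randn(256, 16), torch.randint(0, 2, (256,))), batch_size=32
)
valid_loader = DataLoader(
    TensorDataset(torch.randn(64, 16), torch.randint(0, 2, (64,))), batch_size=32
)

class Classifier(ImplicitProblem):
    # Lower level: train on per-example losses reweighted by the
    # upper-level network, reachable as self.reweight through the
    # dependency graph declared below.
    def training_step(self, batch):
        x, y = batch
        loss = F.cross_entropy(self.module(x), y, reduction="none")
        weight = torch.sigmoid(self.reweight(loss.detach().unsqueeze(1))).squeeze(1)
        return (weight * loss).mean()

class Reweight(ImplicitProblem):
    # Upper level: update the reweighting network so the trained
    # classifier does well on clean validation data; this gradient
    # flows through the best-response Jacobian.
    def training_step(self, batch):
        x, y = batch
        return F.cross_entropy(self.classifier(x), y)

cls_module = nn.Linear(16, 2)
rw_module = nn.Linear(1, 1)

classifier = Classifier(
    name="classifier",
    module=cls_module,
    optimizer=torch.optim.SGD(cls_module.parameters(), lr=0.1),
    train_data_loader=train_loader,
    config=Config(unroll_steps=1),
)
reweight = Reweight(
    name="reweight",
    module=rw_module,
    optimizer=torch.optim.Adam(rw_module.parameters(), lr=1e-3),
    train_data_loader=valid_loader,
    config=Config(),
)

# Dependencies are declared in both directions: upper-to-lower (u2l)
# and lower-to-upper (l2u).
engine = Engine(
    config=EngineConfig(train_iters=500),
    problems=[reweight, classifier],
    dependencies={"u2l": {reweight: [classifier]}, "l2u": {classifier: [reweight]}},
)
engine.run()

Under this structure, extending to a deeper hierarchy would amount to adding Problem classes and edges to the dependency dictionaries rather than re-deriving the composed hypergradient by hand, which is the modularity the abstract claims.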