Paper Title

Git Re-Basin: Merging Models modulo Permutation Symmetries

Authors

Samuel K. Ainsworth, Jonathan Hayase, Siddhartha Srinivasa

Abstract

The success of deep learning is due in large part to our ability to solve certain massive non-convex optimization problems with relative ease. Though non-convex optimization is NP-hard, simple algorithms -- often variants of stochastic gradient descent -- exhibit surprising effectiveness in fitting large neural networks in practice. We argue that neural network loss landscapes often contain (nearly) a single basin after accounting for all possible permutation symmetries of hidden units a la Entezari et al. 2021. We introduce three algorithms to permute the units of one model to bring them into alignment with a reference model in order to merge the two models in weight space. This transformation produces a functionally equivalent set of weights that lie in an approximately convex basin near the reference model. Experimentally, we demonstrate the single basin phenomenon across a variety of model architectures and datasets, including the first (to our knowledge) demonstration of zero-barrier linear mode connectivity between independently trained ResNet models on CIFAR-10. Additionally, we identify intriguing phenomena relating model width and training time to mode connectivity. Finally, we discuss shortcomings of the linear mode connectivity hypothesis, including a counterexample to the single basin theory.
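To make the permutation symmetry in the abstract concrete, here is a minimal NumPy/SciPy sketch. It is not the paper's implementation: the layer sizes, the random weights, and the single-layer alignment step are illustrative assumptions. The first part verifies that reordering the hidden units of a two-layer MLP leaves its function unchanged; the second part aligns one model's units to a reference model via a linear assignment on weight similarity (in the spirit of weight matching) before averaging in weight space.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)

# A toy two-layer MLP: f(x) = W2 @ relu(W1 @ x + b1) + b2.
d_in, d_hidden, d_out = 4, 8, 3
W1 = rng.normal(size=(d_hidden, d_in))
b1 = rng.normal(size=d_hidden)
W2 = rng.normal(size=(d_out, d_hidden))
b2 = rng.normal(size=d_out)

def mlp(x, W1, b1, W2, b2):
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

# Permutation symmetry: reordering the hidden units (rows of W1 and b1,
# with the matching columns of W2) leaves the network's output unchanged.
perm = rng.permutation(d_hidden)
x = rng.normal(size=d_in)
assert np.allclose(
    mlp(x, W1, b1, W2, b2),
    mlp(x, W1[perm], b1[perm], W2[:, perm], b2),
)

# Alignment before merging (hypothetical single-layer version): choose the
# permutation of model B's hidden units that best matches model A's by
# solving a linear assignment problem on unit-wise weight similarity,
# then interpolate in weight space.
A1 = rng.normal(size=(d_hidden, d_in))   # first-layer weights of model A
B1 = rng.normal(size=(d_hidden, d_in))   # first-layer weights of model B
cost = -A1 @ B1.T                        # negative similarity as assignment cost
_, col = linear_sum_assignment(cost)     # col[i]: B unit assigned to A's unit i
merged = 0.5 * (A1 + B1[col])            # midpoint of the linear path
```

In the full setting the permutation must be chosen consistently across every layer of the network, since permuting one layer's outputs reorders the next layer's inputs; the paper's three algorithms differ in how that joint alignment is estimated.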
