Paper Title
Revisiting "Qualitatively Characterizing Neural Network Optimization Problems"
Paper Authors
Paper Abstract
We revisit and extend the experiments of Goodfellow et al. (2014), who showed that, for then state-of-the-art networks, "the objective function has a simple, approximately convex shape" along the linear path between initialization and the trained weights. We do not find this to be the case for modern networks on CIFAR-10 and ImageNet. Instead, although loss is roughly monotonically non-increasing along this path, it remains high until close to the optimum. In addition, training quickly becomes linearly separated from the optimum by loss barriers. We conclude that, although Goodfellow et al.'s findings describe the "relatively easy to optimize" MNIST setting, behavior is qualitatively different in modern settings.
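The interpolation experiment the abstract refers to is straightforward to reproduce. Below is a minimal sketch, not the authors' code, of evaluating the loss along the linear path theta(alpha) = (1 - alpha) * theta_init + alpha * theta_trained for alpha in [0, 1]. The names model, loss_fn, data_loader, and the helper loss_along_linear_path are illustrative assumptions; the two state dicts are assumed to come from the same architecture at initialization and after training.

```python
import copy
import torch

def loss_along_linear_path(model, theta_init, theta_trained, loss_fn,
                           data_loader, num_points=25, device="cpu"):
    """Average loss at evenly spaced points on the linear path between
    an initial and a trained state dict (hypothetical helper)."""
    losses = []
    for alpha in torch.linspace(0.0, 1.0, num_points):
        # Interpolate every floating-point tensor; keep integer buffers
        # (e.g. BatchNorm's num_batches_tracked) from the trained weights.
        interp = {}
        for name, w0 in theta_init.items():
            w1 = theta_trained[name]
            if torch.is_floating_point(w0):
                interp[name] = (1 - alpha) * w0 + alpha * w1
            else:
                interp[name] = w1

        # Load the interpolated weights into a throwaway copy of the model.
        probe = copy.deepcopy(model).to(device)
        probe.load_state_dict(interp)
        probe.eval()

        # Evaluate the mean loss over the dataset at this alpha.
        total, count = 0.0, 0
        with torch.no_grad():
            for inputs, targets in data_loader:
                inputs, targets = inputs.to(device), targets.to(device)
                total += loss_fn(probe(inputs), targets).item() * len(targets)
                count += len(targets)
        losses.append(total / count)
    return losses
```

In this framing, Goodfellow et al.'s observation corresponds to the resulting curve decreasing smoothly and roughly convexly toward alpha = 1, while the abstract's claim is that for modern networks the curve, though still roughly monotonically non-increasing, stays high until alpha is close to 1.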