Paper Title
Delta-STN: Efficient Bilevel Optimization for Neural Networks using Structured Response Jacobians
Paper Authors
Paper Abstract
Hyperparameter optimization of neural networks can be elegantly formulated as a bilevel optimization problem. While research on bilevel optimization of neural networks has been dominated by implicit differentiation and unrolling, hypernetworks such as Self-Tuning Networks (STNs) have recently gained traction due to their ability to amortize the optimization of the inner objective. In this paper, we diagnose several subtle pathologies in the training of STNs. Based on these observations, we propose the $Δ$-STN, an improved hypernetwork architecture which stabilizes training and optimizes hyperparameters much more efficiently than STNs. The key idea is to focus on accurately approximating the best-response Jacobian rather than the full best-response function; we achieve this by reparameterizing the hypernetwork and linearizing the network around the current parameters. We demonstrate empirically that our $Δ$-STN can tune regularization hyperparameters (e.g. weight decay, dropout, number of cutout holes) with higher accuracy, faster convergence, and improved stability compared to existing approaches.
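For readers unfamiliar with the setup, the bilevel formulation referenced above is the standard one for hyperparameter optimization; the notation below is an illustrative sketch, not taken verbatim from the paper. Hyperparameters $\lambda$ are chosen to minimize a validation loss evaluated at the weights that best fit the training loss for that $\lambda$:

$$
\lambda^{*} = \arg\min_{\lambda} \; \mathcal{L}_{\text{val}}\big(w^{*}(\lambda)\big)
\quad \text{subject to} \quad
w^{*}(\lambda) = \arg\min_{w} \; \mathcal{L}_{\text{train}}(w, \lambda).
$$

Under this reading, the key idea stated in the abstract amounts to approximating the best-response only locally: rather than learning the full map $w^{*}(\lambda)$, the hypernetwork models a correction around the current weights $w_{0}$, roughly $w^{*}(\lambda) \approx w_{0} + J\,(\lambda - \lambda_{0})$, where $J \approx \partial w^{*}/\partial \lambda$ is the best-response Jacobian at the current hyperparameters $\lambda_{0}$. The exact reparameterization and linearization used by the $Δ$-STN are given in the paper, not here.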