Paper Title

Towards NNGP-guided Neural Architecture Search

Paper Authors

Daniel S. Park, Jaehoon Lee, Daiyi Peng, Yuan Cao, Jascha Sohl-Dickstein

Paper Abstract

The predictions of wide Bayesian neural networks are described by a Gaussian process, known as the Neural Network Gaussian Process (NNGP). Analytic forms for NNGP kernels are known for many models, but computing the exact kernel for convolutional architectures is prohibitively expensive. One can obtain effective approximations of these kernels through Monte Carlo estimation using finite networks at initialization. Monte Carlo NNGP inference is orders of magnitude cheaper in FLOPs than gradient descent training when the dataset size is small. Since NNGP inference provides a cheap measure of the performance of a network architecture, we investigate its potential as a signal for neural architecture search (NAS). We compute the NNGP performance of approximately 423k networks in the NAS-Bench-101 dataset on CIFAR-10 and compare its utility against conventional performance measures obtained by shortened gradient-based training. We carry out a similar analysis on 10k randomly sampled networks in the mobile neural architecture search (MNAS) space for ImageNet. We find comparative advantages of NNGP-based metrics and discuss potential applications. In particular, we propose that NNGP performance is an inexpensive signal, independent of metrics obtained from training, that can be used either to reduce large search spaces or to improve training-based performance measures.
