Paper Title
BigNAS: Scaling Up Neural Architecture Search with Big Single-Stage Models
Paper Authors
Paper Abstract
Neural architecture search (NAS) has shown promising results discovering models that are both accurate and fast. For NAS, training a one-shot model has become a popular strategy to rank the relative quality of different architectures (child models) using a single set of shared weights. However, while one-shot model weights can effectively rank different network architectures, the absolute accuracies from these shared weights are typically far below those obtained from stand-alone training. To compensate, existing methods assume that the weights must be retrained, finetuned, or otherwise post-processed after the search is completed. These steps significantly increase the compute requirements and complexity of the architecture search and model deployment. In this work, we propose BigNAS, an approach that challenges the conventional wisdom that post-processing of the weights is necessary to get good prediction accuracies. Without extra retraining or post-processing steps, we are able to train a single set of shared weights on ImageNet and use these weights to obtain child models whose sizes range from 200 to 1000 MFLOPs. Our discovered model family, BigNASModels, achieves top-1 accuracies ranging from 76.5% to 80.9%, surpassing state-of-the-art models in this range, including EfficientNets and Once-for-All networks, without extra retraining or post-processing. We present ablation studies and analysis to further understand the proposed BigNASModels.
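As a rough illustration of the weight-sharing idea described in the abstract (a minimal sketch, not the authors' implementation), the PyTorch snippet below shows how a single set of convolution weights can be sliced to instantiate child models of different channel widths without retraining. The names `SharedConv2d` and `TinySuperNet` and all hyperparameters are hypothetical; the actual BigNAS single-stage model also varies depth, kernel size, and input resolution, and depends on specific training techniques not shown here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SharedConv2d(nn.Conv2d):
    """Convolution whose weight tensor can be sliced to a smaller child width.

    The largest weight tensor is stored once; a child model uses only the
    leading slice of input/output channels, so all child models share the
    same underlying parameters.
    """

    def forward(self, x, out_channels=None):
        out_channels = out_channels or self.out_channels
        in_channels = x.shape[1]
        weight = self.weight[:out_channels, :in_channels]
        bias = self.bias[:out_channels] if self.bias is not None else None
        return F.conv2d(x, weight, bias, self.stride, self.padding)


class TinySuperNet(nn.Module):
    """A two-layer super-network; each child model picks a width per layer."""

    def __init__(self, max_widths=(32, 64), num_classes=1000):
        super().__init__()
        self.conv1 = SharedConv2d(3, max_widths[0], 3, padding=1)
        self.conv2 = SharedConv2d(max_widths[0], max_widths[1], 3, padding=1)
        self.fc = nn.Linear(max_widths[1], num_classes)

    def forward(self, x, widths):
        x = F.relu(self.conv1(x, widths[0]))
        x = F.relu(self.conv2(x, widths[1]))
        x = F.adaptive_avg_pool2d(x, 1).flatten(1)
        # Slice the classifier's input dimension to match the child width.
        w = self.fc.weight[:, :widths[1]]
        return F.linear(x, w, self.fc.bias)


net = TinySuperNet()
x = torch.randn(2, 3, 32, 32)
logits_big = net(x, widths=(32, 64))    # largest child model
logits_small = net(x, widths=(16, 32))  # smaller child, same shared weights
print(logits_big.shape, logits_small.shape)
```

Slicing the leading channels is one common way to share weights across widths (as in slimmable networks); after training, each child architecture can be evaluated or deployed directly from the shared weights, which is the behavior the abstract highlights.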