Paper Title

Efficient Neural Architecture Search for End-to-end Speech Recognition via Straight-Through Gradients

Authors

Huahuan Zheng, Keyu An, Zhijian Ou

Abstract

Neural Architecture Search (NAS), the process of automating architecture engineering, is an appealing next step to advancing end-to-end Automatic Speech Recognition (ASR), replacing expert-designed networks with learned, task-specific architectures. In contrast to early computationally demanding NAS methods, recent gradient-based NAS methods, e.g., DARTS (Differentiable ARchiTecture Search), SNAS (Stochastic NAS) and ProxylessNAS, significantly improve the NAS efficiency. In this paper, we make two contributions. First, we rigorously develop an efficient NAS method via Straight-Through (ST) gradients, called ST-NAS. Basically, ST-NAS uses the loss from SNAS but uses ST to back-propagate gradients through discrete variables to optimize the loss, which is not revealed in ProxylessNAS. Using ST gradients to support sub-graph sampling is a core element to achieve efficient NAS beyond DARTS and SNAS. Second, we successfully apply ST-NAS to end-to-end ASR. Experiments over the widely benchmarked 80-hour WSJ and 300-hour Switchboard datasets show that the ST-NAS induced architectures significantly outperform the human-designed architecture across the two datasets. Strengths of ST-NAS such as architecture transferability and low computation cost in memory and time are also reported.
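To make the core idea concrete, below is a minimal sketch (assuming PyTorch) of how a straight-through gradient can flow through a discrete architecture choice while only the sampled sub-graph is actually executed. The names candidate_ops, arch_logits, and st_sample_and_apply are hypothetical, and this illustrates the generic straight-through trick for sub-graph sampling, not the authors' actual ST-NAS implementation.

```python
import torch
import torch.nn.functional as F

# Hypothetical search space: three candidate convolutions on one edge of
# an architecture cell (sizes and kernel choices are illustrative only).
candidate_ops = torch.nn.ModuleList(
    [torch.nn.Conv1d(64, 64, k, padding=k // 2) for k in (3, 5, 7)]
)
arch_logits = torch.nn.Parameter(torch.zeros(len(candidate_ops)))

def st_sample_and_apply(x):
    """Run only the sampled candidate op, but let gradients reach arch_logits."""
    probs = F.softmax(arch_logits, dim=-1)
    idx = int(torch.multinomial(probs, 1))            # sample one sub-graph
    one_hot = F.one_hot(torch.tensor(idx), len(candidate_ops)).float()
    # Straight-through trick: the forward pass sees the hard one-hot
    # selection, the backward pass differentiates through the soft probs.
    weight = (one_hot - probs).detach() + probs
    return weight[idx] * candidate_ops[idx](x)        # only one op is computed

# Example: a batch of 8 utterances, 64 feature channels, 100 frames.
out = st_sample_and_apply(torch.randn(8, 64, 100))
out.sum().backward()                                  # arch_logits.grad is populated
```

Because only the sampled operation runs in the forward pass, the memory and time cost per step is that of a single sub-graph, which is the efficiency advantage over DARTS- and SNAS-style methods that evaluate all candidate operations at once.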
