Paper Title

Is Batch Norm unique? An empirical investigation and prescription to emulate the best properties of common normalizers without batch dependence

Authors

Vinay Rao, Jascha Sohl-Dickstein

Abstract

We perform an extensive empirical study of the statistical properties of Batch Norm and other common normalizers. This includes an examination of the correlation between representations of minibatches, gradient norms, and Hessian spectra both at initialization and over the course of training. Through this analysis, we identify several statistical properties which appear linked to Batch Norm's superior performance. We propose two simple normalizers, PreLayerNorm and RegNorm, which better match these desirable properties without involving operations along the batch dimension. We show that PreLayerNorm and RegNorm achieve much of the performance of Batch Norm without requiring batch dependence, that they reliably outperform LayerNorm, and that they can be applied in situations where Batch Norm is ineffective.
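
To make "batch dependence" concrete, the following is a minimal sketch contrasting the two baseline normalizers named in the abstract. It is not the paper's code and does not implement PreLayerNorm or RegNorm, whose definitions are not given here. The helper names (`_standardize`, `batch_norm`, `layer_norm`) and the `EPS` value are illustrative assumptions, and learned scale and shift parameters are omitted.

```python
import math

EPS = 1e-5  # small constant for numerical stability (assumed value)


def _standardize(values):
    """Shift a list of floats to zero mean and scale it to unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + EPS) for v in values]


def batch_norm(batch):
    """Normalize each feature across the batch dimension (no learned scale/shift)."""
    columns = [_standardize(list(col)) for col in zip(*batch)]
    return [list(row) for row in zip(*columns)]


def layer_norm(batch):
    """Normalize each example across its own features (no learned scale/shift)."""
    return [_standardize(row) for row in batch]


# The same example x is placed in two different batches.
x = [1.0, 2.0, 3.0]
batch_a = [x, [0.0, 0.0, 0.0]]
batch_b = [x, [10.0, -5.0, 7.0]]

# Layer norm only looks at x itself, so its output for x does not change.
ln_same = layer_norm(batch_a)[0] == layer_norm(batch_b)[0]
# Batch norm uses statistics of the whole batch, so its output for x changes.
bn_same = batch_norm(batch_a)[0] == batch_norm(batch_b)[0]

print(ln_same, bn_same)  # prints "True False"
```

Because the layer-normalized output of `x` is computed from `x` alone, it is the same in both batches, whereas the batch-normalized output shifts with the other example in the batch. This is the dependence that the proposed normalizers are described as avoiding.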
