Title

Evaluation of Neural Architectures Trained with Square Loss vs Cross-Entropy in Classification Tasks

Authors

Like Hui, Mikhail Belkin

Abstract

Modern neural architectures for classification tasks are trained using the cross-entropy loss, which is widely believed to be empirically superior to the square loss. In this work we provide evidence indicating that this belief may not be well-founded. We explore several major neural architectures and a range of standard benchmark datasets for NLP, automatic speech recognition (ASR) and computer vision tasks to show that these architectures, with the same hyper-parameter settings as reported in the literature, perform comparably or better when trained with the square loss, even after equalizing computational resources. Indeed, we observe that the square loss produces better results in the dominant majority of NLP and ASR experiments. Cross-entropy appears to have a slight edge on computer vision tasks. We argue that there is little compelling empirical or theoretical evidence indicating a clear-cut advantage to the cross-entropy loss. Indeed, in our experiments, performance on nearly all non-vision tasks can be improved, sometimes significantly, by switching to the square loss. Furthermore, training with the square loss appears to be less sensitive to the randomness in initialization. We posit that training using the square loss for classification needs to be a part of best practices of modern deep learning on equal footing with cross-entropy.
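
To make the comparison concrete, below is a minimal PyTorch sketch of the single change the abstract describes: computing the square loss against one-hot encoded labels instead of cross-entropy against integer labels. The toy model, random batch, and hyper-parameters are illustrative placeholders, not the architectures or settings evaluated in the paper, and the sketch omits any loss rescaling the full paper may apply for tasks with many classes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative placeholder model; the paper evaluates major NLP, ASR,
# and vision architectures, not this toy MLP.
num_classes = 10
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 256),
    nn.ReLU(),
    nn.Linear(256, num_classes),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

def classification_loss(logits, labels, use_square_loss=True):
    """Square loss on one-hot targets, or standard cross-entropy."""
    if use_square_loss:
        # Treat classification as regression onto one-hot label vectors.
        targets = F.one_hot(labels, num_classes).float()
        return F.mse_loss(logits, targets)
    return F.cross_entropy(logits, labels)

# One training step on a random batch (stand-in for a real data loader).
x = torch.randn(32, 1, 28, 28)
y = torch.randint(0, num_classes, (32,))

optimizer.zero_grad()
loss = classification_loss(model(x), y, use_square_loss=True)
loss.backward()
optimizer.step()
```

In both cases prediction is the argmax over the network outputs, so only the training objective differs between the two setups.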
