Paper Title
GA-SAM: Gradient-Strength based Adaptive Sharpness-Aware Minimization for Improved Generalization
Paper Authors
Paper Abstract
Recently, the Sharpness-Aware Minimization (SAM) algorithm has shown state-of-the-art generalization ability in vision tasks, demonstrating that flat minima tend to imply better generalization. However, applying SAM to some natural language tasks is difficult, especially for models with drastic gradient changes, such as RNNs. In this work, we analyze the relation between the flatness of a local minimum and its generalization ability from a novel and straightforward theoretical perspective. We propose that the shift between the training and test distributions can be equivalently viewed as a virtual parameter corruption or perturbation, which explains why flat minima that are robust against parameter corruptions or perturbations generalize better. On this basis, we propose a Gradient-Strength based Adaptive Sharpness-Aware Minimization (GA-SAM) algorithm to help learning algorithms find flat minima that generalize better. Results on various language benchmarks validate the effectiveness of the proposed GA-SAM algorithm on natural language tasks.
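To make the mechanism the abstract refers to concrete, here is a minimal PyTorch sketch of the standard SAM two-pass update (gradient at the weights, perturb along the normalized gradient with radius `rho`, gradient at the perturbed weights, then step). The per-parameter `adaptive` scaling is a hypothetical stand-in for GA-SAM's gradient-strength adaptation, whose exact formula is not given in this abstract.

```python
# Minimal sketch of a SAM-style update. The `adaptive` branch is a
# hypothetical illustration of gradient-strength scaling, NOT the
# paper's exact GA-SAM rule.
import torch

def sam_step(model, loss_fn, inputs, targets, base_opt, rho=0.05, adaptive=True):
    base_opt.zero_grad()

    # First pass: gradients at the current weights w.
    loss_fn(model(inputs), targets).backward()

    # Build the perturbation epsilon = rho * g / ||g||.
    with torch.no_grad():
        grads = [p.grad for p in model.parameters() if p.grad is not None]
        grad_norm = torch.norm(torch.stack([g.norm() for g in grads]))
        eps = []
        for p in model.parameters():
            if p.grad is None:
                eps.append(None)
                continue
            scale = rho / (grad_norm + 1e-12)
            if adaptive:
                # Hypothetical: damp the perturbation where this parameter's
                # gradient is strong, so models with drastic gradient changes
                # (e.g. RNNs) are not pushed too far from the minimum.
                scale = scale / (1.0 + p.grad.norm())
            e = scale * p.grad
            p.add_(e)  # move to w + epsilon
            eps.append(e)

    # Second pass: gradients at the perturbed weights w + epsilon.
    model.zero_grad()
    loss_fn(model(inputs), targets).backward()

    # Restore w, then update it using the perturbed gradients.
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            if e is not None:
                p.sub_(e)
    base_opt.step()
```

Minimizing the loss at `w + epsilon` rather than at `w` is what biases training toward flat minima: a sharp minimum has a high loss in its perturbed neighborhood and is therefore penalized.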