Paper Title
Stochastic Adaptive Activation Function
Paper Authors
Paper Abstract
The simulation of human neurons and neurotransmission mechanisms has been realized in deep neural networks based on theoretical implementations of activation functions. However, recent studies have reported that the threshold potential of a neuron varies according to its location and type, and that existing activation functions are limited in representing this variability. Therefore, this study proposes a simple yet effective activation function that facilitates different thresholds and adaptive activations according to the positions of units and the contexts of inputs. Furthermore, the proposed activation function is mathematically a more generalized form of the Swish activation function, and thus we denote it as Adaptive SwisH (ASH). ASH highlights informative features whose values fall in the top percentiles of an input, while rectifying low values. Most importantly, unlike other activation functions, ASH is trainable, adaptive, and context-aware. Furthermore, ASH represents a general formula for previously studied activation functions and provides a sound mathematical basis for its superior performance. To validate the effectiveness and robustness of ASH, we implemented it in numerous deep learning models for various tasks, including classification, detection, segmentation, and image generation. Experimental analysis demonstrates that our activation function provides the benefits of more accurate predictions and earlier convergence in many deep learning applications.
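Based only on the abstract's description (a Swish-like gate that passes values in the top percentiles of an input and rectifies low values), the following is a minimal illustrative sketch of such an activation in PyTorch. The quantile-based soft threshold, the top-fraction parameter `k`, and the trainable sharpness `beta` are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn as nn


class ASHSketch(nn.Module):
    """Illustrative sketch of a percentile-thresholded, Swish-like activation.

    Units whose values fall in the top-k fraction of each input sample are
    passed through (Swish-style), while lower values are softly rectified
    toward zero. The exact formula in the paper may differ.
    """

    def __init__(self, k: float = 0.1):
        super().__init__()
        # Trainable gate sharpness, reflecting the "trainable" property
        # described in the abstract (an assumed parameterization).
        self.beta = nn.Parameter(torch.ones(1))
        self.k = k  # assumed fraction of top-percentile units to keep active

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Per-sample threshold at the (1 - k) quantile, so the activation
        # adapts to the context (statistics) of each individual input.
        flat = x.flatten(start_dim=1)
        thresh = torch.quantile(flat, 1.0 - self.k, dim=1, keepdim=True)
        thresh = thresh.view(x.shape[0], *([1] * (x.dim() - 1)))
        # Soft gate around the threshold: reduces to the Swish-like form
        # x * sigmoid(beta * (x - threshold)).
        return x * torch.sigmoid(self.beta * (x - thresh))
```

As a usage sketch, `ASHSketch()` could stand in for `nn.ReLU()` in an existing model (e.g., a small CNN for classification); because the threshold is computed per sample from the input's own quantiles, the same module activates different units for different inputs, which is the adaptive, context-aware behavior the abstract describes.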