Paper Title
Annealing Optimization for Progressive Learning with Stochastic Approximation
Paper Authors
Paper Abstract
In this work, we introduce a learning model designed to meet the needs of applications in which computational resources are limited while robustness and interpretability are prioritized. Learning problems can be formulated as constrained stochastic optimization problems, with the constraints originating mainly from model assumptions that define a trade-off between complexity and performance. This trade-off is closely related to over-fitting, generalization capacity, and robustness to noise and adversarial attacks, and depends on both the structure and complexity of the model and the properties of the optimization method used. We develop an online prototype-based learning algorithm based on annealing optimization, formulated as an online gradient-free stochastic approximation algorithm. The learning model can be viewed as an interpretable and progressively growing competitive-learning neural network model suitable for supervised, unsupervised, and reinforcement learning. The annealing nature of the algorithm contributes to minimal hyper-parameter tuning requirements, prevention of poor local minima, and robustness with respect to the initial conditions. At the same time, it provides online control over the performance-complexity trade-off by progressively increasing the complexity of the learning model as needed, through an intuitive bifurcation phenomenon. Finally, the use of stochastic approximation enables the study of the convergence of the learning algorithm through mathematical tools from dynamical systems and control, and allows for its integration with reinforcement learning algorithms, yielding an adaptive state-action aggregation scheme.
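The precise recursions and convergence analysis are given in the paper itself; the sketch below only illustrates the kind of online, gradient-free update the abstract describes, under our own simplifying assumptions: squared Euclidean distortion, a single step size instead of the paper's two-timescale scheme, a fixed cooling schedule, and a duplicate-and-perturb heuristic to expose the bifurcation. All names (oda_step, rho, sigma) are illustrative, not the authors' code.

import numpy as np

def oda_step(x, mu, rho, sigma, T, lr):
    """One gradient-free stochastic-approximation update at temperature T.

    x: observation (d,); mu: prototypes (k, d); rho: prototype weights (k,);
    sigma: auxiliary averages with mu = sigma / rho. Names are illustrative.
    """
    # Gibbs association probabilities: soft assignment of x to each prototype,
    # with the temperature T controlling how soft the competition is.
    d2 = np.sum((mu - x) ** 2, axis=1)        # squared Euclidean distortion (assumed)
    w = rho * np.exp(-(d2 - d2.min()) / T)    # shifted by d2.min() for numerical stability
    post = w / w.sum()
    # Robbins-Monro-style updates (single step size here; the paper
    # analyzes a two-timescale scheme).
    rho = rho + lr * (post - rho)
    sigma = sigma + lr * (post[:, None] * x - sigma)
    return sigma / rho[:, None], rho, sigma

# Hypothetical driver: lower the temperature and let prototypes split.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2)) + rng.choice([-3.0, 3.0], size=(2000, 1))

mu = X[:1].copy()                  # start with a single prototype
rho = np.ones(1)
sigma = rho[:, None] * mu
for T in (2.0, 1.0, 0.5, 0.25):    # assumed geometric cooling schedule
    # Duplicate-and-perturb: above a critical temperature the copies merge
    # back together; below it they separate, which is the bifurcation that
    # grows the model's complexity. (The paper also merges coincident
    # prototypes to keep the model minimal; omitted here.)
    mu = np.repeat(mu, 2, axis=0) + 1e-3 * rng.standard_normal((2 * len(mu), 2))
    rho = np.repeat(rho, 2) / 2.0
    sigma = rho[:, None] * mu
    for t, x in enumerate(X):
        mu, rho, sigma = oda_step(x, mu, rho, sigma, T, lr=1.0 / (t + 10))

Because each update consumes a single sample and needs no gradients of the distortion, the scheme runs online with constant memory per prototype, which is consistent with the resource-limited setting the abstract targets.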