Paper Title
Deep BSDE-ML Learning and Its Application to Model-Free Optimal Control
Authors
Abstract
This paper introduces a modified Deep BSDE (backward stochastic differential equation) learning method with a measurability loss, called the Deep BSDE-ML method, for solving a class of linear decoupled forward-backward stochastic differential equations (FBSDEs) that arise in the policy evaluation step of learning optimal feedback policies for a class of stochastic control problems. The measurability loss is characterized via the measurability of the BSDE's state at the forward initial time, in contrast to the loss of the known Deep BSDE method, which is tied to the terminal state. Although the minima of the two loss functions are shown to be equal, the measurability loss is additionally proved to equal the expected mean squared error between the true diffusion term of the BSDE and its approximation. This crucial observation extends the application of the Deep BSDE method to approximating the gradient of the solution of a partial differential equation (PDE) instead of the solution itself. In parallel, a learning-based framework is introduced to search for an optimal feedback control of a deterministic nonlinear system. Specifically, by introducing Gaussian exploration noise, we aim to learn a robust optimal controller in this stochastic setting. This reformulation sacrifices optimality to some extent, but, as suggested in reinforcement learning (RL), exploration noise is essential to enable model-free learning.
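The claim that the measurability loss equals the expected squared error of the diffusion term can be illustrated numerically on a toy problem. The sketch below is not the paper's algorithm or its exact loss definition, which the abstract does not give. It assumes a hypothetical one-dimensional example with forward process X_t = x0 + sigma * W_t, zero driver, and terminal condition g(x) = x^2, for which the true diffusion term is Z_t = 2 * sigma * X_t. The candidate initial value Y_0 = g(X_T) - sum(Z_hat * dW) should be deterministic (measurable at the initial time), so its sample variance is used here as a stand-in for the measurability loss. The function name `measurability_loss` and the scaling parameter `theta` are illustrative assumptions.

```python
import numpy as np

def measurability_loss(theta, x0=1.0, sigma=1.0, T=1.0,
                       n_steps=100, n_paths=20000, seed=0):
    """Toy check: variance of the candidate Y_0 vs. squared error in Z.

    Assumed setup (not from the paper): X_t = x0 + sigma * W_t, zero driver,
    g(x) = x**2, so the true diffusion term is Z_t = 2 * sigma * X_t.
    The approximation is Z_hat = theta * Z_true (theta = 1 is exact).
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    # Brownian increments, one row per simulated path.
    dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    # Forward states on the time grid, including the initial state.
    W = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dW, axis=1)], axis=1)
    X = x0 + sigma * W
    X_left, X_T = X[:, :-1], X[:, -1]

    Z_true = 2.0 * sigma * X_left
    Z_hat = theta * Z_true

    # Candidate initial value obtained by rolling the BSDE back from g(X_T).
    Y0_candidate = X_T ** 2 - np.sum(Z_hat * dW, axis=1)

    # Stand-in for the measurability loss: Y_0 should not vary across paths.
    loss = np.var(Y0_candidate)
    # Time-integrated expected squared error of the diffusion term.
    z_mse = np.mean(np.sum((Z_true - Z_hat) ** 2, axis=1) * dt)
    return loss, z_mse

if __name__ == "__main__":
    for theta in (1.0, 0.5):
        loss, z_mse = measurability_loss(theta)
        print(f"theta={theta}: loss={loss:.4f}, z_mse={z_mse:.4f}")
```

With the exact diffusion term (theta = 1) the variance is close to zero, up to time-discretization error, and for a misspecified Z_hat the variance should track the integrated squared error in Z, which is the relationship the abstract describes.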