Paper Title

Convex Programs and Lyapunov Functions for Reinforcement Learning: A Unified Perspective on the Analysis of Value-Based Methods

Authors

Xingang Guo, Bin Hu

Abstract

Value-based methods play a fundamental role in Markov decision processes (MDPs) and reinforcement learning (RL). In this paper, we present a unified control-theoretic framework for analyzing value-based methods such as value computation (VC), value iteration (VI), and temporal difference (TD) learning (with linear function approximation). Built upon an intrinsic connection between value-based methods and dynamical systems, we can directly use existing convex testing conditions in control theory to derive various convergence results for the aforementioned value-based methods. These testing conditions are convex programs in the form of either linear programming (LP) or semidefinite programming (SDP), and can be solved to construct Lyapunov functions in a straightforward manner. Our analysis reveals some intriguing connections between feedback control systems and RL algorithms. It is our hope that such connections can inspire more work at the intersection of system/control theory and RL.
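To make the abstract's central idea concrete, the sketch below views policy evaluation (value computation) as the affine dynamical system V_{k+1} = γ P_π V_k + r_π and checks two standard convex stability certificates for its error dynamics: an SDP (the textbook LMI AᵀMA ⪯ ρ²M, giving a quadratic Lyapunov function) and an LP (a Perron-type condition Ap ≤ ρp, giving a weighted sup-norm Lyapunov function). This is a minimal illustration assuming cvxpy and a randomly generated MDP instance; the specific programs here are generic stability tests in the spirit the abstract describes, not necessarily the exact conditions derived in the paper.

```python
import numpy as np
import cvxpy as cp

np.random.seed(0)
n, gamma = 4, 0.9

# Illustrative data (not from the paper): a random row-stochastic
# transition matrix P_pi for a fixed policy, and a reward vector r_pi.
P_pi = np.random.rand(n, n)
P_pi /= P_pi.sum(axis=1, keepdims=True)
r_pi = np.random.rand(n)

# Value computation is the affine dynamical system
#   V_{k+1} = gamma * P_pi @ V_k + r_pi,
# so the error e_k = V_k - V* evolves as e_{k+1} = A e_k with A = gamma * P_pi.
A = gamma * P_pi
rho = 0.95  # target convergence rate; feasible here since gamma = 0.9 < rho < 1

# SDP certificate: find M >= I with A^T M A - rho^2 M <= 0 (as matrix inequalities).
# Then V(e) = e^T M e is a Lyapunov function decaying at rate rho^2 per step.
M = cp.Variable((n, n), symmetric=True)
sdp = cp.Problem(cp.Minimize(cp.trace(M)),
                 [M >> np.eye(n), A.T @ M @ A - rho**2 * M << 0])
sdp.solve()
print("SDP status:", sdp.status)  # 'optimal' certifies geometric convergence

# LP certificate: since A is entrywise nonnegative, find p >= 1 with A p <= rho p.
# Then the weighted sup-norm max_i |e_i| / p_i contracts at rate rho per step.
p = cp.Variable(n)
lp = cp.Problem(cp.Minimize(cp.sum(p)), [A @ p <= rho * p, p >= 1])
lp.solve()
print("LP status:", lp.status, "weights p:", p.value)
```

When both programs return a feasible status, the quadratic form eᵀMe and the weighted sup-norm max_i |e_i|/p_i each certify that V_k converges geometrically to the fixed point V* at rate ρ, which is the kind of convergence result the abstract says these convex testing conditions deliver.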
