Title
A Reinforcement Learning Formulation of the Lyapunov Optimization: Application to Edge Computing Systems with Queue Stability
Authors
Abstract
In this paper, a deep reinforcement learning (DRL)-based approach to the Lyapunov optimization is considered to minimize the time-average penalty while maintaining queue stability. A proper construction of state and action spaces is provided to form a proper Markov decision process (MDP) for the Lyapunov optimization. A condition on the reinforcement learning (RL) reward function that ensures queue stability is derived. Based on the analysis and practical RL with reward discounting, a class of reward functions is proposed for the DRL-based approach to the Lyapunov optimization. The proposed DRL-based approach to the Lyapunov optimization does not require complicated optimization at each time step and operates with general non-convex and discontinuous penalty functions. Hence, it provides an alternative to the conventional drift-plus-penalty (DPP) algorithm for the Lyapunov optimization. The proposed DRL-based approach is applied to resource allocation in edge computing systems with queue stability, and numerical results demonstrate its successful operation.
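To illustrate the kind of reward shaping the abstract describes, the sketch below combines a negative penalty term with a negative quadratic Lyapunov drift term, so that maximizing the discounted RL return simultaneously discourages queue growth. This is an illustrative assumption based on standard Lyapunov (drift-plus-penalty) analysis, not the paper's exact reward class; the function names, the weight `beta`, and the quadratic Lyapunov function are all hypothetical choices.

```python
import numpy as np

def lyapunov_drift(q_now, q_next):
    # Quadratic Lyapunov function L(Q) = 0.5 * sum_i Q_i^2 (a common
    # choice in Lyapunov optimization); the drift at time t is
    # L(Q_{t+1}) - L(Q_t).
    return 0.5 * (np.sum(q_next ** 2) - np.sum(q_now ** 2))

def dpp_style_reward(penalty, q_now, q_next, beta=0.1):
    # Hypothetical per-step RL reward: negative penalty minus a
    # weighted Lyapunov drift. Larger queue growth lowers the reward,
    # nudging the learned policy toward queue stability, analogous to
    # how the DPP algorithm trades off penalty against drift.
    return -penalty - beta * lyapunov_drift(q_now, q_next)
```

With this shaping, a transition that lets the queues grow earns a strictly lower reward than one that holds them steady at the same penalty, which is the qualitative property the paper's stability condition formalizes.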