Paper Title
Optimization of the Model Predictive Control Update Interval Using Reinforcement Learning
Authors
Abstract
In control applications, a compromise must often be made between the complexity and performance of the controller and the computational resources that are available. For instance, the typical hardware platform in embedded control applications is a microcontroller with limited memory and processing power, and for battery-powered applications the control system can account for a significant portion of the energy consumption. We propose a controller architecture in which the computational cost is explicitly optimized along with the control objective. This is achieved by a three-part architecture where a high-level, computationally expensive controller generates plans, which a computationally simpler controller executes by compensating for prediction errors, while a recomputation policy decides when the plan should be recomputed. In this paper, we employ model predictive control (MPC) as the high-level plan-generating controller, a linear state feedback controller as the simpler compensating controller, and reinforcement learning (RL) to learn the recomputation policy. Simulation results for two examples showcase the architecture's ability to improve upon the MPC approach and to find reasonable compromises weighing the performance on the control objective against the computational resources expended.
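The three-part architecture described in the abstract can be sketched in code. The following is a minimal illustration, not the paper's implementation: a simple linear model and hand-tuned feedback gain stand in for the MPC plan generator, and a deviation-threshold rule stands in for the learned RL recomputation policy. All dynamics, gains, and thresholds below are illustrative assumptions.

```python
import numpy as np

# Assumed discrete-time double-integrator dynamics (illustrative only)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
K = np.array([[2.0, 1.5]])  # assumed stabilizing feedback gain

def plan(x0, horizon=20):
    """Stand-in for the expensive high-level (MPC) solve: simulate the
    nominal closed loop to produce a planned state/input trajectory."""
    xs, us = [x0], []
    x = x0
    for _ in range(horizon):
        u = -K @ x
        x = A @ x + B @ u
        us.append(u)
        xs.append(x)
    return xs, us

def recompute_policy(x, x_nominal, threshold=0.5):
    """Stand-in for the learned RL policy: request a replan when the
    deviation from the nominal trajectory exceeds a threshold."""
    return np.linalg.norm(x - x_nominal) > threshold

def run(x0, steps=50):
    """Execute the plan with a linear compensator correcting prediction
    errors; replan only when the recomputation policy fires or the plan
    is exhausted. Returns the final state and the number of plan solves."""
    xs_nom, us_nom = plan(x0)
    k, recomputations = 0, 1
    x = x0
    for _ in range(steps):
        if k >= len(us_nom) or recompute_policy(x, xs_nom[k]):
            xs_nom, us_nom = plan(x)
            k, recomputations = 0, recomputations + 1
        # Planned input plus linear feedback on the prediction error
        u = us_nom[k] - K @ (x - xs_nom[k])
        x = A @ x + B @ u + 0.01 * np.random.randn(2, 1)  # process noise
        k += 1
    return x, recomputations
```

The point of the sketch is the division of labor: the expensive solver runs only when the cheap policy decides the current plan is no longer good enough, so the computational cost is traded off against tracking performance.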