Paper Title
Towards Deployment-Efficient Reinforcement Learning: Lower Bound and Optimality
Paper Authors
Paper Abstract
Deployment efficiency is an important criterion for many real-world applications of reinforcement learning (RL). Despite the community's increasing interest, the problem still lacks a formal theoretical formulation. In this paper, we propose such a formulation for deployment-efficient RL (DE-RL) from an "optimization with constraints" perspective: we are interested in exploring an MDP and obtaining a near-optimal policy with minimal \emph{deployment complexity}, whereas in each deployment the policy can sample a large batch of data. Using finite-horizon linear MDPs as a concrete structural model, we reveal the fundamental limit in achieving deployment efficiency by establishing information-theoretic lower bounds, and provide algorithms that achieve the optimal deployment efficiency. Moreover, our formulation for DE-RL is flexible and can serve as a building block for other practically relevant settings; we give "Safe DE-RL" and "Sample-Efficient DE-RL" as two examples, which may be worth future investigation.
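
One hedged way to read the "optimization with constraints" view sketched in the abstract; the symbols $K$, $N$, $\pi_k$, $\hat{\pi}$, $\epsilon$, and $\delta$ are introduced here only for illustration and are not defined in the abstract itself:

% Illustrative sketch of DE-RL as constrained optimization:
% minimize the number of deployments K needed to return a near-optimal policy,
% where each deployment k runs a single policy \pi_k and may collect a large
% batch of N trajectories.
\begin{align*}
  \min \quad & K \\
  \text{s.t.} \quad & \Pr\big[\, V^{\hat{\pi}} \ge V^{\pi^{*}} - \epsilon \,\big] \ge 1 - \delta, \\
  & \text{deployment } k = 1, \dots, K \text{ executes policy } \pi_k \text{ and collects } N \text{ trajectories}, \\
  & \hat{\pi} \text{ is output after the } K\text{-th deployment}.
\end{align*}

In this reading, $K$ plays the role of the \emph{deployment complexity} the abstract refers to, while the per-deployment batch size $N$ is allowed to be large.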