Paper Title
Towards Deployment-Efficient Reinforcement Learning: Lower Bound and Optimality
Paper Authors
Paper Abstract
Deployment efficiency is an important criterion for many real-world applications of reinforcement learning (RL). Despite the community's increasing interest, the problem still lacks a formal theoretical formulation. In this paper, we propose such a formulation for deployment-efficient RL (DE-RL) from an "optimization with constraints" perspective: we are interested in exploring an MDP and obtaining a near-optimal policy with minimal \emph{deployment complexity}, whereas in each deployment the policy can sample a large batch of data. Using finite-horizon linear MDPs as a concrete structural model, we reveal the fundamental limit in achieving deployment efficiency by establishing information-theoretic lower bounds, and provide algorithms that achieve the optimal deployment efficiency. Moreover, our formulation for DE-RL is flexible and can serve as a building block for other practically relevant settings; we give "Safe DE-RL" and "Sample-Efficient DE-RL" as two examples, which may be worth future investigation.
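
One hedged way to read the "optimization with constraints" view sketched in the abstract; the symbols $K$, $N$, $\pi_k$, $\hat{\pi}$, $\epsilon$, and $\delta$ are introduced here only for illustration and are not defined in the abstract itself:

% Illustrative sketch of DE-RL as constrained optimization:
% minimize the number of deployments K needed to return a near-optimal policy,
% where each deployment k runs a single policy \pi_k and may collect a large
% batch of N trajectories.
\begin{align*}
  \min \quad & K \\
  \text{s.t.} \quad & \Pr\big[\, V^{\hat{\pi}} \ge V^{\pi^{*}} - \epsilon \,\big] \ge 1 - \delta, \\
  & \text{deployment } k = 1, \dots, K \text{ executes policy } \pi_k \text{ and collects } N \text{ trajectories}, \\
  & \hat{\pi} \text{ is output after the } K\text{-th deployment}.
\end{align*}

In this reading, $K$ plays the role of the \emph{deployment complexity} the abstract refers to, while the per-deployment batch size $N$ is allowed to be large.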