Paper Title
Cascaded Gaps: Towards Gap-Dependent Regret for Risk-Sensitive Reinforcement Learning
Paper Authors
Paper Abstract
In this paper, we study gap-dependent regret guarantees for risk-sensitive reinforcement learning based on the entropic risk measure. We propose a novel definition of sub-optimality gaps, which we call cascaded gaps, and we discuss their key components that adapt to the underlying structures of the problem. Based on the cascaded gaps, we derive non-asymptotic and logarithmic regret bounds for two model-free algorithms under episodic Markov decision processes. We show that, in appropriate settings, these bounds feature exponential improvement over existing ones that are independent of gaps. We also prove gap-dependent lower bounds, which certify the near optimality of the upper bounds.
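For readers unfamiliar with the entropic risk measure named in the abstract, a minimal sketch follows. It assumes the standard definition, (1/beta) * log E[exp(beta * R)], applied to a finite sample of returns. The function name `entropic_risk` and the sample values are hypothetical, and the abstract does not state the paper's sign convention for beta, so the risk-averse/risk-seeking labels in the comments are an assumption.

```python
import math


def entropic_risk(returns, beta):
    """Empirical entropic risk: (1/beta) * log(mean(exp(beta * r))).

    Assumed convention: beta < 0 is risk-averse, beta > 0 is risk-seeking,
    and beta == 0 is treated as the risk-neutral mean (the limiting case).
    """
    if beta == 0:
        return sum(returns) / len(returns)
    scaled = [beta * r for r in returns]
    # Shift by the maximum before exponentiating for numerical stability.
    m = max(scaled)
    log_mean_exp = m + math.log(sum(math.exp(s - m) for s in scaled) / len(scaled))
    return log_mean_exp / beta


# Hypothetical sample of episode returns.
sample = [0.0, 1.0]
print(entropic_risk(sample, -2.0))  # below the mean of 0.5
print(entropic_risk(sample, 0.0))   # equals the mean
print(entropic_risk(sample, 2.0))   # above the mean
```

Under this convention the measure falls below the sample mean for negative beta and rises above it for positive beta, while a deterministic return is left unchanged for any beta.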