Paper Title

On the Generalization of Representations in Reinforcement Learning

Paper Authors

Charline Le Lan, Stephen Tu, Adam Oberman, Rishabh Agarwal, Marc G. Bellemare

Abstract

In reinforcement learning, state representations are used to tractably deal with large problem spaces. State representations serve both to approximate the value function with few parameters and to generalize to newly encountered states. Their features may be learned implicitly (as part of a neural network) or explicitly (for example, the successor representation of \citet{dayan1993improving}). While the approximation properties of representations are reasonably well-understood, a precise characterization of how and when these representations generalize is lacking. In this work, we address this gap and provide an informative bound on the generalization error arising from a specific state representation. This bound is based on the notion of effective dimension which measures the degree to which knowing the value at one state informs the value at other states. Our bound applies to any state representation and quantifies the natural tension between representations that generalize well and those that approximate well. We complement our theoretical results with an empirical survey of classic representation learning methods from the literature and results on the Arcade Learning Environment, and find that the generalization behaviour of learned representations is well-explained by their effective dimension.
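
The sketch below is only a minimal illustration of the approximation/generalization tension the abstract describes, run on a small synthetic Markov reward process with randomly generated features. It is not the paper's method: the `effective_dimension` helper uses a generic ridge-style spectral measure as a stand-in (the paper defines its own notion of effective dimension), and the random-feature representations and all constants are assumptions for illustration only.

```python
# Illustrative sketch, NOT the paper's algorithm or definitions.
import numpy as np

rng = np.random.default_rng(0)

n_states = 50   # size of the synthetic problem (assumption)
gamma = 0.9     # discount factor (assumption)

# Random transition matrix (rows sum to 1) and reward vector for a
# hypothetical Markov reward process.
P = rng.random((n_states, n_states))
P /= P.sum(axis=1, keepdims=True)
r = rng.random(n_states)

# True value function: V = (I - gamma * P)^{-1} r.
V = np.linalg.solve(np.eye(n_states) - gamma * P, r)

def effective_dimension(phi, lam=1e-1):
    """Generic ridge-style effective dimension, sum_i s_i^2 / (s_i^2 + lam),
    where s_i are singular values of the feature matrix. A stand-in only."""
    s = np.linalg.svd(phi, compute_uv=False)
    return float(np.sum(s**2 / (s**2 + lam)))

def fit_and_generalize(phi, train_frac=0.5):
    """Fit a linear value function on a random subset of states; report the
    approximation error on seen states and the error on held-out states."""
    idx = rng.permutation(n_states)
    n_train = int(train_frac * n_states)
    train, test = idx[:n_train], idx[n_train:]
    w, *_ = np.linalg.lstsq(phi[train], V[train], rcond=None)
    approx_err = np.mean((phi[train] @ w - V[train]) ** 2)
    gen_err = np.mean((phi[test] @ w - V[test]) ** 2)
    return approx_err, gen_err

# Compare (hypothetical) random-feature representations of increasing width:
# wider representations fit the observed states better but tend to transfer
# their values to unseen states less reliably.
for d in (2, 5, 10, 25, 50):
    phi = rng.standard_normal((n_states, d))
    approx_err, gen_err = fit_and_generalize(phi)
    print(f"d={d:2d}  d_eff={effective_dimension(phi):5.1f}  "
          f"approx err={approx_err:.4f}  gen err={gen_err:.4f}")
```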
