Paper Title

Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient

Authors

Botao Hao, Yaqi Duan, Tor Lattimore, Csaba Szepesvári, Mengdi Wang

Abstract

This paper provides a statistical analysis of high-dimensional batch Reinforcement Learning (RL) using sparse linear function approximation. When there is a large number of candidate features, our result sheds light on the fact that sparsity-aware methods can make batch RL more sample efficient. We first consider the off-policy policy evaluation problem. To evaluate a new target policy, we analyze a Lasso fitted Q-evaluation method and establish a finite-sample error bound that has no polynomial dependence on the ambient dimension. To reduce the Lasso bias, we further propose a post model-selection estimator that applies fitted Q-evaluation to the features selected via group Lasso. Under an additional signal strength assumption, we derive a sharper instance-dependent error bound that depends on a divergence function measuring the distribution mismatch between the data distribution and occupancy measure of the target policy. Further, we study the Lasso fitted Q-iteration for batch policy optimization and establish a finite-sample error bound depending on the ratio between the number of relevant features and restricted minimal eigenvalue of the data's covariance. In the end, we complement the results with minimax lower bounds for batch-data policy evaluation/optimization that nearly match our upper bounds. The results suggest that having well-conditioned data is crucial for sparse batch policy learning.
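
To make the algorithmic scheme concrete, below is a minimal Python sketch of Lasso fitted Q-iteration, the batch policy optimization procedure the abstract describes. The feature map `phi`, the use of scikit-learn's `Lasso` solver, and all hyperparameter values are illustrative assumptions, not the authors' implementation; the paper's contribution is the statistical analysis of this style of procedure, and fitted Q-evaluation of a fixed target policy follows the same loop with the max over actions replaced by the target policy's action.

```python
# A minimal sketch of Lasso fitted Q-iteration, assuming a linear model
# Q(s, a) = phi(s, a) @ w. Function names, the feature map `phi`, and all
# hyperparameter values are hypothetical illustrations of the general
# scheme, not the paper's implementation.
import numpy as np
from sklearn.linear_model import Lasso

def lasso_fqi(states, actions, rewards, next_states, phi, n_actions,
              gamma=0.99, alpha=0.01, n_iters=50):
    """Batch policy optimization via fitted Q-iteration with a Lasso step.

    states/actions/next_states: sequences of n batch transitions.
    rewards: NumPy array of shape (n,).
    phi(s, a): feature map returning a d-dimensional NumPy vector.
    alpha: Lasso regularization strength (larger -> sparser weights).
    """
    X = np.array([phi(s, a) for s, a in zip(states, actions)])  # (n, d)
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        # Bellman targets under the current weights:
        #   y_i = r_i + gamma * max_{a'} phi(s'_i, a') @ w.
        # (For fitted Q-evaluation of a fixed target policy, replace the
        # max with the action that policy would take in s'_i.)
        q_next = np.array([max(phi(sp, ap) @ w for ap in range(n_actions))
                           for sp in next_states])
        y = rewards + gamma * q_next
        # Sparsity-aware regression step: Lasso instead of least squares,
        # so irrelevant candidate features receive zero weight.
        w = Lasso(alpha=alpha, fit_intercept=False).fit(X, y).coef_
    return w  # sparse weights; the greedy policy is argmax_a phi(s, a) @ w
```

For a quick toy run, `phi` could be a one-hot encoding of state-action pairs, e.g. `phi = lambda s, a: np.eye(d)[s * n_actions + a]` with `d = n_states * n_actions`, though the sparse regime the paper studies is the one where `d` is large and only a few features matter.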
