Paper Title
Finite time analysis of temporal difference learning with linear function approximation: Tail averaging and regularisation
Paper Authors
Paper Abstract
We study the finite-time behaviour of the popular temporal difference (TD) learning algorithm when combined with tail-averaging. We derive finite-time bounds on the parameter error of the tail-averaged TD iterate under a step-size choice that does not require information about the eigenvalues of the matrix underlying the projected TD fixed point. Our analysis shows that tail-averaged TD converges at the optimal $O\left(1/t\right)$ rate, both in expectation and with high probability. In addition, our bounds exhibit a sharper rate of decay for the initial error (bias), which is an improvement over averaging all iterates. We also propose and analyse a variant of TD that incorporates regularisation. From our analysis, we conclude that the regularised version of TD is useful for problems with ill-conditioned features.
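The abstract describes tail-averaged TD with linear function approximation only at a high level; the following is a minimal sketch of how such a procedure could look, not the paper's actual algorithm. The transition sampler `env_step`, the feature map `phi`, the constant step size, the tail fraction, and the ridge-style regularisation coefficient `reg` are all illustrative assumptions, and the paper's step-size choice and regulariser may differ.

```python
import numpy as np

def tail_averaged_td(env_step, phi, theta0, num_steps, step_size,
                     gamma=0.99, tail_fraction=0.5, reg=0.0):
    """Sketch of tail-averaged TD(0) with linear function approximation.

    env_step()    -> (s, r, s_next): samples one transition (assumed interface)
    phi(s)        -> feature vector of state s (assumed interface)
    theta0        : initial parameter vector
    step_size     : constant step size (no eigenvalue information needed)
    tail_fraction : fraction of the final iterates that are averaged
    reg           : optional regularisation coefficient (hypothetical ridge-style term)
    """
    theta = theta0.copy()
    tail_start = int((1.0 - tail_fraction) * num_steps)
    tail_sum = np.zeros_like(theta)
    tail_count = 0

    for t in range(num_steps):
        s, r, s_next = env_step()
        f, f_next = phi(s), phi(s_next)
        # TD(0) error and semi-gradient update, with an optional regularisation term
        delta = r + gamma * f_next @ theta - f @ theta
        theta = theta + step_size * (delta * f - reg * theta)
        # Average only the tail iterates to form the returned estimate
        if t >= tail_start:
            tail_sum += theta
            tail_count += 1

    return tail_sum / tail_count
```

Averaging only the tail of the iterate sequence, rather than all iterates, is what the abstract credits with the sharper decay of the initial error (bias) while retaining the $O\left(1/t\right)$ rate.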