在线控制的几何探索

论文标题

在线控制的几何探索

Geometric Exploration for Online Control

论文作者

Plevrakis, Orestis, Hazan, Elad

论文摘要

我们研究\ emph {未知}线性动力系统的控制。目的是最大程度地减少遗憾与扰动反馈控制者的类别，该机构涵盖了所有稳定的线性动力控制器。在这项工作中，我们首先考虑已知成本函数的情况，为此，我们使用$ n^3 \ sqrt {t} $设计了第一个多项式时间算法 - 遗憾的是，$ n $是状态的维度以及控制输入的维度。 $ \ sqrt {t} $ - horizon依赖性是最佳的，并且在先前最著名的$ t^{2/3} $的界面上有所改善。我们算法的主要组成部分是一种新颖的几何探索策略：我们在策略空间中自适应地构建了一系列barycentric跨度。其次，我们考虑了匪徒反馈的情况，为此，我们将第一个多项式时间算法使用$ poly（n）\ sqrt {t} $ - 遗憾的是，基于随机强盗凸出优化。

We study the control of an \emph{unknown} linear dynamical system under general convex costs. The objective is minimizing regret vs. the class of disturbance-feedback-controllers, which encompasses all stabilizing linear-dynamical-controllers. In this work, we first consider the case of known cost functions, for which we design the first polynomial-time algorithm with $n^3\sqrt{T}$-regret, where $n$ is the dimension of the state plus the dimension of control input. The $\sqrt{T}$-horizon dependence is optimal, and improves upon the previous best known bound of $T^{2/3}$. The main component of our algorithm is a novel geometric exploration strategy: we adaptively construct a sequence of barycentric spanners in the policy space. Second, we consider the case of bandit feedback, for which we give the first polynomial-time algorithm with $poly(n)\sqrt{T}$-regret, building on Stochastic Bandit Convex Optimization.

下载PDF全文

下载文献需遵守相关版权规定

论文标题